CN106778576B - Motion recognition method based on SEHM feature map sequence

Info

Publication number: CN106778576B
Authority: CN (China)
Application number: CN201611110573.5A
Other versions: CN106778576A
Other languages: Chinese (zh)
Prior art keywords: sehm, sequence, frame, action, diagram
Inventors: 吴贺俊, 李嘉豪
Original assignee: Sun Yat-sen University
Filing date: 2016-12-06
Publication date: 2020-05-26
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items


Abstract

The motion recognition method provided by the invention uses the SEHM (segment energy history maps) feature map proposed herein as the low-level feature for recognition. By choosing parameters such as the time-slice length appropriately, computing the corresponding SEHM feature map sequence, and feeding the sequence to a neural network for prediction, the method supports both offline and online recognition. Because the constructed SEHM feature map reflects the change of the overall posture over the course of an action, the information contained in the action's evolution is fully exploited, improving recognition accuracy. Moreover, the raw data is compressed while computing the SEHM feature maps, so the method has low complexity and modest hardware requirements, enabling online real-time motion recognition.

Description

Motion recognition method based on SEHM feature map sequence
Technical Field
The invention relates to the field of image recognition, and in particular to a motion recognition method based on an SEHM feature map sequence.
Background
With the development of camera sensor technology, camera resolution has improved across the board, and cameras now appear in far more scenes and in far greater numbers. In the modern internet era, vast amounts of image and video data are generated in daily life, which in turn drives the development of image processing technology. As one branch of image processing, motion recognition has been widely applied in many scenarios, including video surveillance, motion-sensing games, health care, and social assistance. For example, in 2010 Microsoft introduced the Kinect, a motion-sensing peripheral for the Xbox 360 that serves as a depth camera in console games, capturing the player's body movements for interaction with the game; developers can also build their own applications, such as virtual dressing rooms, on the Windows platform using its development kit.
Despite its wide range of applications, the development of motion recognition faces many technical difficulties and constraints.
The first is the constraint of objective conditions. In a video image sequence, unavoidable complications arise from the actual shooting conditions: a person in view may be blocked by other objects (object occlusion); the camera is not always fixed, so the picture may shake (viewpoint jitter); the same person's appearance changes under different light and shadow (lighting conditions); and different cameras differ greatly in picture sharpness depending on lens quality (resolution). These problems must be considered in motion recognition, and indeed throughout the field of image processing.
The second is the influence of subjective conditions. As the subject of motion recognition, different people have their own definitions and understandings of the same action, and even the same person may perform it with slight variations. Concretely, when different people perform the same action, differences in duration, amplitude, and pauses lead to many differences across the whole image sequence. Beyond the performer's execution, people also differ in body shape and build with age and sex, and the distance from the camera and the angle toward it cause large differences between recorded actions. Each of these factors increases the diversity of the data. Meanwhile, to implement a motion recognition algorithm and provide concrete interfaces and applications for different industries and scenarios, one must consider not only accuracy but also other constraints, such as cost and real-time performance.
A motion recognition algorithm generally takes sensor output as raw input data and performs classification and judgment through stages such as preprocessing, feature computation, and a classification model. Traditional methods usually use a conventional RGB camera as input, but as new sensors have appeared, more and more kinds are applied to motion recognition, such as depth cameras, infrared cameras, and acceleration sensors. New sensors bring new input data to motion recognition methods, and many model fusion approaches have emerged. The depth map, as a new data form beyond the conventional RGB image, records for each pixel not a color value but the distance to the camera. Because of this distance information, research and algorithms based on depth maps have attracted increasing attention.
Reference one discloses a motion recognition method that takes a depth map as input and, using its distance information, projects it onto the three planes of an orthogonal coordinate system: front view, side view, and top view. It proposes a new feature map, the depth energy map; HOG features are then computed from the depth energy maps under the different views and fed into an SVM classifier for prediction. This method merges the whole depth video sequence directly into a single depth energy map, so it fully considers neither the overlapping and redundant information between the earlier and later parts of the motion nor the change of the human posture over the whole motion. In a video where several different actions occur one after another, energy maps of the individual actions cannot be accurately segmented and generated, so the multiple actions cannot be recognized (multi-action video recognition); likewise, in online recognition no end frame can be chosen and thus no depth energy map can be synthesized, i.e., the real-time requirement cannot be met.
Reference two discloses a motion recognition method that likewise projects the depth map onto three coordinate planes and computes the corresponding depth energy maps, and then introduces another feature operator, LBP, as a high-level feature. After the LBP features of the depth energy maps are computed, an improved extreme learning machine model performs the recognition. This method also compresses the whole video sequence into one depth energy map, ignores the internal relationship between the postures before and after the action, and cannot meet the requirements of multi-action video recognition, online recognition, real-time operation, and the like.
Reference three discloses a motion recognition method that projects the depth map onto three different view maps and, unlike references one and two, computes not only a depth energy map representing the distance changes over the whole video but also a historical trajectory map of the depth-distance active region, taking the order in which postures appear into account; it further provides a static posture map and an average energy map, enriching the feature input. However, although the order of appearance of postures is considered, this method does not fully address the problem that earlier poses in the video sequence are covered by later ones, so the first half of some actions is covered by the second half and much information is lost. Although the relationship between earlier and later postures is considered to some extent, the interference of redundant motions is not. And although the computation over the static region is added, only the absolute value of the motion energy is considered, not its positive and negative directions. Like references one and two, reference three also cannot meet the requirements of multi-action video recognition, online recognition, and real-time operation.
Reference one: Yang, Xiaodong, C. Zhang, and Y. L. Tian. "Recognizing Actions Using Depth Motion Maps-based Histograms of Oriented Gradients." ACM International Conference on Multimedia, 2012: 1057-1060.
Reference two: Chen, Chen, R. Jafari, and N. Kehtarnavaz. "Action Recognition from Depth Sequences Using Depth Motion Maps-Based Local Binary Patterns." IEEE Winter Conference on Applications of Computer Vision, 2015: 1092-1099.
Reference three: Liang, Bin, and L. Zheng. "3D Motion Trail Model Based Pyramid Histograms of Oriented Gradient for Action Recognition." International Conference on Pattern Recognition, IEEE Computer Society, 2014: 1952-1957.
Disclosure of Invention
The invention provides a motion recognition method based on an SEHM feature map sequence to solve the above problems in the prior art; the method supports both offline and online recognition and has good real-time performance.
To achieve this, the technical solution is as follows:
A motion recognition method based on an SEHM feature map sequence comprises the following steps:
S1, for a depth map sequence covering a selected time period of N frames in a video, project the depth map of each frame onto the three planes of an orthogonal coordinate system to obtain three orthogonal view maps: front view, side view, and top view (a projection sketch is given after this list);
S2, under each view map, compute the difference of every two adjacent frames of the depth map sequence as an energy map, each frame's energy map representing the distance change between the previous and the next frame; then divide each energy map into three state maps according to its values and a set threshold: a forward-state binary map, a backward-state binary map, and a static binary map. Specifically:
$$
EM_t^{v,i} = \begin{cases}
\mathbb{1}\!\left(E_t^v > \varepsilon\right), & i = 1 \text{ (forward)} \\
\mathbb{1}\!\left(E_t^v < -\varepsilon\right), & i = 2 \text{ (backward)} \\
\mathbb{1}\!\left(\lvert E_t^v \rvert \le \varepsilon\right), & i = 3 \text{ (static)}
\end{cases}
$$

where $E_t^v = \mathrm{Map}_{t+1}^v - \mathrm{Map}_t^v$ is the energy map of the t-th frame under view map v; $\varepsilon$ is the set threshold; $\mathbb{1}(\cdot)$ is the element-wise indicator function; $\lvert E_t^v \rvert$ denotes the absolute value of the difference of the next frame minus the previous frame; i = 1, 2, 3 denote the forward-state binary map, the backward-state binary map, and the static binary map respectively; the three state maps of the t-th frame are represented together by a three-channel matrix $EM_t$;
s3, after the step S2 is executed, state diagram sequences under the three view angle diagrams are obtained respectively; respectively averagely dividing N frame state diagram sequences of the three view diagrams into S time slices according to the front and back orders, wherein S is N/K, and K represents the length of each time slice; for the state diagram sequence under each view angle diagram, sequentially selecting the state diagram sequence of a time slice from front to back to calculate the SEHM characteristic diagram:
s31, assuming that the state diagram sequence of the time slice selected for calculation for the p-th time is started from the (p-1) × K +1 frame of the state diagram sequence of the N frames and ended at the p × K frame, the SEHM feature map of the time slice is calculated by the following formula and step S32:
$$\mathrm{SEHM}_p = \max\!\left(\mathrm{SEHM}_p,\; k \cdot EM_{(p-1)K+k}\right)$$

where k has initial value 1 and $\mathrm{SEHM}_p$ is a three-channel matrix whose initial value is set to zero;

S32, let k = k + 1 and execute the formula of step S31 again, until k > K; finally output the normalized $\mathrm{SEHM}_p$ as the SEHM feature map of the p-th selected time slice;
s4, obtaining SEHM characteristic diagrams of each time slice under the three view angle diagrams through steps S31 and S32;
s5, fusing the SEHM characteristic diagrams of the time slices corresponding to each other under the three view angle diagrams to obtain a fused SEHM characteristic diagram taking the time slices as units;
s6, the fused SEHM feature maps of the time slices form an SEHM feature map sequence, the SEHM feature map sequence is input into a neural network, the neural network outputs a list of probability vectors P representing the possibility of each action, and the action recognition result of the current N-frame depth map sequence is determined according to the output probability vectors P.
With the above scheme, the motion recognition method performs recognition based on the SEHM feature map. By choosing parameters such as the time-slice length appropriately, computing the corresponding SEHM feature map sequence, and feeding it to a neural network for prediction, both offline and online recognition can be realized. Because the constructed SEHM feature map reflects the change of the overall posture over the course of the action, the information in the action's evolution is fully exploited, improving recognition accuracy. Moreover, the raw data is compressed when computing the SEHM feature maps, so the method's complexity and hardware requirements are low, and online real-time recognition can be achieved.
Preferably, SEHM feature maps are also computed for the whole N-frame state map sequences under the three view maps, and the computed SEHM feature maps of the three view maps are fused to obtain a global SEHM feature map; in step S6, the global SEHM feature map and the SEHM feature maps of the time slices together form the SEHM feature map sequence, which is input into the neural network for motion recognition. With this arrangement, the action features over the whole time period are also taken into account, which further improves recognition accuracy.
Preferably, in step S1, the N-frame depth map sequence is selected through a sliding window, whose stride m denotes the time length from the starting frame of one selected depth map sequence to the starting frame of the next. In this way a video segment can yield several time periods of N frames for recognition, and the model gives a prediction for each period.
Preferably, ε = 30.
Preferably, K = 10.
Preferably, N = 80.
Preferably, in step S5, the SEHM feature maps of corresponding time slices under the front view, side view, and top view are fused in the ratio 2:1:1.
Preferably, the neural network comprises convolutional layers, pooling layers, an LSTM layer, a fully-connected layer, and a Softmax layer;
wherein the convolutional and pooling layers extract high-level features from the SEHM feature map sequence;
the LSTM layer performs context processing on the extracted high-level features of the feature map sequence and outputs high-level features that carry temporal information and discriminate better;
the fully-connected layer and the Softmax layer receive the high-level features output by the LSTM layer or by the convolutional and pooling layers and output a prediction probability vector P.
Preferably, the probability vector P comprises probabilities p_i, where p_i denotes the probability that the motion is recognized as action i;
the process of determining the motion recognition result in step S6 is as follows:
set a threshold ρ between 0 and 1; if no action's probability in the probability vector P is greater than ρ, the action in the N-frame depth map sequence is regarded as a meaningless action; otherwise, the action with the largest recognition probability is output as the recognition result.
Preferably, ρ = 0.5.
Compared with the prior art, the invention has the following beneficial effects:
The motion recognition method provided by the invention performs recognition based on the SEHM feature map. By choosing parameters such as the time-slice length appropriately, computing the corresponding SEHM feature map sequence, and feeding it to a neural network for prediction, both offline and online recognition can be realized. Because the constructed SEHM feature map reflects the change of the overall posture over the course of the action, the information in the action's evolution is fully exploited, improving recognition accuracy. Moreover, the raw data is compressed when computing the SEHM feature maps, so the method's complexity and hardware requirements are low, and online real-time recognition can be achieved.
Drawings
Fig. 1 is a decomposition of the SEHM feature map sequence of a hand-waving action.
Fig. 2 is a block diagram of the overall neural network, including the LSTM layer, used in the embodiment.
Detailed Description
The drawings are for illustration only and are not to be construed as limiting this patent;
the invention is further described below with reference to the drawings and an embodiment.
Example 1
Different people have their own definitions and understandings of the same action, and one of the most obvious manifestations is the difference in action length; for subjective reasons, even the same person performs the same action differently at different times. Most existing methods simply merge the whole depth video sequence into one new feature map. This loses a large part of the spatio-temporal information in the sequence; in particular, poses with large overlap, such as a single hand crossing in front of the body, easily lose a great deal of information. To reduce this loss, the invention proposes the SEHM feature map (segment energy history map).
For a depth video sequence over a time period of N frames, the depth map of each frame is projected onto three orthogonal view maps (Map_f, Map_s, Map_t): front view, side view, top view. Energy maps are then computed for the depth map sequence of each view: the difference of two adjacent frames (the next frame minus the previous frame) is taken as one energy map, each representing the distance change between the two frames. According to the values of each energy map, it is divided by a threshold into binary maps of three states: forward, backward, and static. Specifically:
$$
EM_t^{v,i} = \begin{cases}
\mathbb{1}\!\left(E_t^v > \varepsilon\right), & i = 1 \text{ (forward)} \\
\mathbb{1}\!\left(E_t^v < -\varepsilon\right), & i = 2 \text{ (backward)} \\
\mathbb{1}\!\left(\lvert E_t^v \rvert \le \varepsilon\right), & i = 3 \text{ (static)}
\end{cases}
$$

where $E_t^v = \mathrm{Map}_{t+1}^v - \mathrm{Map}_t^v$ is the energy map of the t-th frame under view map v; $\varepsilon$ is the set threshold; $\mathbb{1}(\cdot)$ is the element-wise indicator function; $\lvert E_t^v \rvert$ denotes the absolute value of the difference of the next frame minus the previous frame; i = 1, 2, 3 denote the forward-state binary map, the backward-state binary map, and the static binary map respectively; the three state maps of the t-th frame are represented together by a three-channel matrix $EM_t$;
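The three-state split is straightforward to realize; the following is a minimal sketch, assuming NumPy arrays and the preferred threshold ε = 30 (the function name compute_state_map is an assumption):

    import numpy as np

    def compute_state_map(prev_frame, next_frame, eps=30.0):
        """Split the energy map (next frame minus previous frame) of one view
        into the three binary state maps described above, stacked as a
        three-channel matrix EM_t: channel 0 = forward (change > eps),
        channel 1 = backward (change < -eps), channel 2 = static (|change| <= eps)."""
        energy = next_frame.astype(np.float64) - prev_frame.astype(np.float64)
        forward = (energy > eps).astype(np.float64)
        backward = (energy < -eps).astype(np.float64)
        static = (np.abs(energy) <= eps).astype(np.float64)
        return np.stack([forward, backward, static], axis=0)  # shape (3, H, W)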
through calculation, energy map sequences under different viewing angles can be obtained. But the energy map cannot be applied directly to the neural network as input data because:
1. millions of data sets are often used for image recognition to obtain a good effect; the length of some simple actions generally has tens to hundreds of frames of pictures, and each main body is a person with similar appearance; compared with image recognition, motion recognition requires far more data sets to achieve similar effects than the former. Therefore, if each frame of data is used as input, a larger data set is required to obtain a more appreciable result when training the model.
2. Because the LSTM layer in the neural network needs to consider the context of all input sequences, if each frame in the video is taken as an input unit, the selected time period is appropriate but the calculation amount is large and the requirement on hardware is high; or the selected time period is too short to influence the training result of the model.
In summary, the invention compresses and merges the original depth sequence appropriately; this is the SEHM feature map computation.
Once all the energy map sequences of each view in the current time period have been computed, the per-frame energy maps can be synthesized into SEHM feature maps. To account for the real-time behavior of the algorithm, the characteristics of the motion data must be considered comprehensively and suitable values of N and K chosen for the SEHM computation. Meanwhile, to support multi-action video recognition, online recognition, and real-time operation, the video is divided into several time periods by a sliding window and each is recognized separately. For example, if a video is 120 frames long, each time period is 80 frames, and the sliding window is 40 frames, then SEHM feature map sequences are computed for the depth map sequences of frames 1-80 and frames 41-120, giving SEHM feature map sequences for two time periods; the neural network model then yields a recognition result for each period, which realizes functions such as online recognition.
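Under the numbers of this example (80-frame periods, 40-frame sliding window), the segmentation can be sketched as follows; the generator name sliding_windows is an assumption:

    def sliding_windows(num_frames, period=80, stride=40):
        """Yield (start, end) frame indices (1-based, inclusive) of each
        time period selected by the sliding window."""
        start = 1
        while start + period - 1 <= num_frames:
            yield start, start + period - 1
            start += stride

    # For a 120-frame video this yields (1, 80) and (41, 120),
    # matching the example above.
    print(list(sliding_windows(120)))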
For an energy map sequence over a time period of N frames, the N-frame state map sequence of each of the three views is divided evenly into S time slices in temporal order, where S = N/K and K is the length of each time slice; for the state map sequence under each view map, the state map sequence of one time slice is selected at a time, from front to back, and its SEHM feature map is computed:
S31, suppose the state map sequence of the time slice selected for the p-th computation starts at frame (p-1)·K+1 of the N-frame state map sequence and ends at frame p·K; the SEHM feature map of this time slice is computed by the following formula together with step S32:
$$\mathrm{SEHM}_p = \max\!\left(\mathrm{SEHM}_p,\; k \cdot EM_{(p-1)K+k}\right)$$

where k has initial value 1 and $\mathrm{SEHM}_p$ is a three-channel matrix whose initial value is set to zero;

S32, let k = k + 1 and execute the formula of step S31 again, until k > K; finally output the normalized $\mathrm{SEHM}_p$ as the SEHM feature map of the p-th selected time slice;
after the feature maps for the time slices are computed, a global SEHM feature map for the entire time slice is similarly computed. The global SEHM feature map starts at the first frame of the time period and ends at the last frame. Through the above operations, the SEHM feature map is compressed and retains key pose information in the video. For general speed and complexity actions, N80 and K10 may be considered.
To obtain the final recognition result, the SEHM feature maps of the three views must be fused. Considering the neural network's ability to handle local and global relationships within a picture, the invention combines the SEHM feature maps of each corresponding time slice (or the global SEHM feature maps) across the views into one final SEHM feature map in the ratio 2:1:1. The final SEHM feature maps are then passed to the neural network for feature extraction. Fig. 1 shows the composition of the combined final SEHM feature map sequence.
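The patent states the 2:1:1 ratio without spelling out the fusion operation; one plausible reading, sketched below, is a weighted combination of the per-view SEHM maps after they have been resized to a common shape. Both the weighted-sum interpretation and the function name fuse_views are assumptions:

    import numpy as np

    def fuse_views(sehm_front, sehm_side, sehm_top, weights=(2.0, 1.0, 1.0)):
        """Fuse the three views' SEHM maps with weights in the 2:1:1 ratio.
        Assumes the three maps have already been resized to a common
        (3, H, W) shape; the weighted average itself is an assumed reading
        of the ratio, which the patent does not define precisely."""
        w = np.asarray(weights) / np.sum(weights)
        return w[0] * sehm_front + w[1] * sehm_side + w[2] * sehm_top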
For a pattern recognition method, besides the features extracted from the raw data, the algorithm model is the most important part. Because the SEHM feature map sequence has been compressed and is ordered in time, a model that handles ordered input, such as an LSTM (long short-term memory) layer, works well. LSTM has achieved great success in the natural language and speech domains and in recent years has begun to be carried over to the image domain.
A deep neural network can be chosen for pretraining the model, since it performs better as the dataset grows. The AlexNet network model is an image recognition model for RGB images, and "person" is one of the recognition classes of its task. Since the SEHM feature map is a three-channel feature map that clearly carries the contours of the human body, retraining on SEHM feature maps with the parameters of AlexNet's convolutional and pooling layers as initial values can be expected to give better results. Using the convolutional and pooling structure of the AlexNet network as the front section of the neural network and attaching the LSTM layer at the rear accelerates the training of the front section and improves accuracy. The overall structure of the neural network model is shown in Fig. 2.
As can be seen from Fig. 2, both the global SEHM feature map and the SEHM feature map sequence pass through the convolutional and pooling layers to extract high-level features. The difference is that the SEHM feature map sequence, carrying contextual information, is further processed by the LSTM layer to yield better high-level features, whereas the global SEHM feature map, whose information covers the entire time period, does not need to pass through the LSTM layer. Finally the high-level features are fed into the fully-connected layer and the Softmax layer to obtain a probability vector P (where each entry p_i of the vector represents the probability of being judged as class i).
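A compact PyTorch sketch of this two-branch structure follows. The layer sizes, the single conv/pool stage standing in for AlexNet's stack, and the class name SEHMNet are illustrative assumptions, not the patent's exact configuration:

    import torch
    import torch.nn as nn

    class SEHMNet(nn.Module):
        """Two-branch model: the SEHM feature map sequence goes through
        conv/pool layers and an LSTM; the global SEHM map goes through the
        same conv/pool layers only. Both branches feed one FC + Softmax head."""
        def __init__(self, num_actions, feat_dim=256):
            super().__init__()
            self.features = nn.Sequential(          # stand-in for AlexNet's conv/pool stack
                nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.MaxPool2d(3, stride=2),
                nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
                nn.Linear(32 * 4 * 4, feat_dim), nn.ReLU(),
            )
            self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
            self.head = nn.Linear(2 * feat_dim, num_actions)

        def forward(self, sehm_seq, sehm_global):
            # sehm_seq: (B, S, 3, H, W) per-slice maps; sehm_global: (B, 3, H, W)
            b, s = sehm_seq.shape[:2]
            f = self.features(sehm_seq.flatten(0, 1)).view(b, s, -1)
            _, (h, _) = self.lstm(f)                # last hidden state summarizes the sequence
            g = self.features(sehm_global)          # global branch skips the LSTM
            logits = self.head(torch.cat([h[-1], g], dim=1))
            return torch.softmax(logits, dim=1)     # probability vector P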
For the probability vector P of a certain time period, a threshold ρ between 0 and 1 can be defined; if no class probability p_i in the vector is greater than ρ, the action in that time period is regarded as a meaningless action; otherwise the class with the largest probability p_i is taken as the predicted action.
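This decision rule is a one-liner; a sketch with the preferred ρ = 0.5 follows (the function name decide_action is an assumption):

    import numpy as np

    def decide_action(probs, rho=0.5):
        """Return the predicted action index, or None if every class
        probability is at most rho (a meaningless action)."""
        probs = np.asarray(probs)
        return None if probs.max() <= rho else int(probs.argmax())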
It should be understood that the above embodiment is merely an example given to illustrate the invention clearly and is not a limitation on its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims of the invention.

Claims (8)

1. A motion recognition method based on an SEHM feature map sequence, characterized by comprising the following steps:
S1, for a depth map sequence covering a selected time period of N frames in a video, project the depth map of each frame onto the three planes of an orthogonal coordinate system to obtain three orthogonal view maps: front view, side view, and top view;
S2, under each view map, compute the difference of every two adjacent frames of the depth map sequence as an energy map, each frame's energy map representing the distance change between the previous and the next frame; then divide each energy map into three state maps according to its values and a set threshold: a forward-state binary map, a backward-state binary map, and a static binary map; specifically:
$$
EM_t^{v,i} = \begin{cases}
\mathbb{1}\!\left(E_t^v > \varepsilon\right), & i = 1 \text{ (forward)} \\
\mathbb{1}\!\left(E_t^v < -\varepsilon\right), & i = 2 \text{ (backward)} \\
\mathbb{1}\!\left(\lvert E_t^v \rvert \le \varepsilon\right), & i = 3 \text{ (static)}
\end{cases}
$$

where $E_t^v = \mathrm{Map}_{t+1}^v - \mathrm{Map}_t^v$ is the energy map of the t-th frame under view map v; $\varepsilon$ is the set threshold; $\mathbb{1}(\cdot)$ is the element-wise indicator function; $\lvert E_t^v \rvert$ denotes the absolute value of the difference of the next frame minus the previous frame; i = 1, 2, 3 denote the forward-state binary map, the backward-state binary map, and the static binary map respectively; the three state maps of the t-th frame are represented together by a three-channel matrix $EM_t$;
s3, after the step S2 is executed, state diagram sequences under the three view angle diagrams are obtained respectively; respectively averagely dividing N frame state diagram sequences of the three view diagrams into S time slices according to the front and back orders, wherein S is N/K, and K represents the length of each time slice; for the state diagram sequence under each view angle diagram, sequentially selecting the state diagram sequence of a time slice from front to back to calculate the SEHM characteristic diagram:
s31, assuming that the state diagram sequence of the time slice selected for calculation for the p-th time is started from the (p-1) × K +1 frame of the state diagram sequence of the N frames and ended at the p × K frame, the SEHM feature map of the time slice is calculated by the following formula and step S32:
$$\mathrm{SEHM}_p = \max\!\left(\mathrm{SEHM}_p,\; k \cdot EM_{(p-1)K+k}\right)$$

where k has initial value 1 and $\mathrm{SEHM}_p$ is a three-channel matrix whose initial value is set to zero;

S32, let k = k + 1 and execute the formula of step S31 again, until k > K; finally output the normalized $\mathrm{SEHM}_p$ as the SEHM feature map of the p-th selected time slice;
s4, obtaining SEHM characteristic diagrams of each time slice under the three view angle diagrams through steps S31 and S32;
s5, fusing the SEHM characteristic diagrams of the time slices corresponding to each other under the three view angle diagrams to obtain a fused SEHM characteristic diagram taking the time slices as units;
s6, the fused SEHM characteristic maps of the time slices form an SEHM characteristic map sequence, the SEHM characteristic map sequence is input into a neural network, the neural network outputs a list of probability vectors P representing the possibility of each action, and the action recognition result of the current N-frame depth map sequence is determined according to the output probability vectors P;
respectively calculating the SEHM characteristic diagrams of the N frame state diagram sequences under the three view angle diagrams, and then fusing the calculated SEHM characteristic diagrams under the three view angle diagrams to obtain a global SEHM characteristic diagram; in step S6, the global SEHM feature map and the SEHM feature maps of the time slices form an SEHM feature map sequence, and the SEHM feature map sequence is input to the neural network for motion recognition.
2. The motion recognition method based on an SEHM feature map sequence of claim 1, wherein: in step S1, the N-frame depth map sequence is selected through a sliding window, whose stride m denotes the time length from the starting frame of one selected depth map sequence to the starting frame of the next.
3. The motion recognition method based on an SEHM feature map sequence of claim 1, wherein: ε = 30.
4. The motion recognition method based on an SEHM feature map sequence of claim 1, wherein: K = 10.
5. The motion recognition method based on an SEHM feature map sequence of claim 1, wherein: N = 80.
6. The motion recognition method based on an SEHM feature map sequence of claim 1, wherein: in step S5, the SEHM feature maps of corresponding time slices under the front view, side view, and top view are fused in the ratio 2:1:1.
7. The motion recognition method based on an SEHM feature map sequence of claim 1, wherein: the neural network comprises convolutional layers, pooling layers, an LSTM layer, a fully-connected layer, and a Softmax layer;
wherein the convolutional and pooling layers extract high-level features from the SEHM feature map sequence;
the LSTM layer performs context processing on the extracted high-level features of the feature map sequence and outputs high-level features that carry temporal information and discriminate better;
the fully-connected layer and the Softmax layer receive the high-level features output by the LSTM layer or by the convolutional and pooling layers and output a prediction probability vector P.
8. The motion recognition method based on an SEHM feature map sequence of claim 1, wherein: the probability vector P comprises probabilities p_i, where p_i denotes the probability that the motion is recognized as action i;
the process of determining the motion recognition result in step S6 is as follows:
set a threshold ρ between 0 and 1; if no action's probability in the probability vector P is greater than ρ, the action in the N-frame depth map sequence is regarded as a meaningless action; otherwise, the action with the largest recognition probability is output as the recognition result.
CN201611110573.5A 2016-12-06 2016-12-06 Motion recognition method based on SEHM feature map sequence Active CN106778576B (en)

Priority Applications (1)

Application Number: CN201611110573.5A; Priority Date: 2016-12-06; Filing Date: 2016-12-06; Title: Motion recognition method based on SEHM feature map sequence

Publications (2)

Publication Number: CN106778576A (en), Publication Date: 2017-05-31
Publication Number: CN106778576B (en), Publication Date: 2020-05-26

Family

ID: 58874488

Family Applications (1)

Application Number: CN201611110573.5A; Title: Motion recognition method based on SEHM feature map sequence; Priority Date: 2016-12-06; Filing Date: 2016-12-06

Country Status (1)

Country: CN; Link: CN106778576B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944376A (en) * 2017-11-20 2018-04-20 北京奇虎科技有限公司 The recognition methods of video data real-time attitude and device, computing device
CN110633004B (en) * 2018-06-21 2023-05-26 杭州海康威视数字技术股份有限公司 Interaction method, device and system based on human body posture estimation
CN109002780B (en) * 2018-07-02 2020-12-18 深圳码隆科技有限公司 Shopping flow control method and device and user terminal
CN110138681B (en) * 2019-04-19 2021-01-22 上海交通大学 Network flow identification method and device based on TCP message characteristics


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886293A (en) * 2014-03-21 2014-06-25 浙江大学 Human body behavior recognition method based on history motion graph and R transformation
CN104636725A (en) * 2015-02-04 2015-05-20 华中科技大学 Gesture recognition method based on depth image and gesture recognition system based on depth images
CN105608421A (en) * 2015-12-18 2016-05-25 中国科学院深圳先进技术研究院 Human movement recognition method and device
CN105631415A (en) * 2015-12-25 2016-06-01 中通服公众信息产业股份有限公司 Video pedestrian recognition method based on convolution neural network
CN105740773A (en) * 2016-01-25 2016-07-06 重庆理工大学 Deep learning and multi-scale information based behavior identification method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
3D Motion Trail Model based Pyramid Histograms of Oriented Gradient for Action Recognition; Bin Liang et al.; 2014 22nd International Conference on Pattern Recognition; 2014-08-28; 1952-1957 *
Binarized-BLSTM-RNN based Human Activity Recognition; Marcus Edel et al.; 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN); 2016-10-07; 1-7 *
DMM-Pyramid Based Deep Architectures for Action Recognition with Depth Cameras; Rui Yang et al.; ACCV 2014; 2015-04-17; 37-49 *
Recognizing Actions Using Depth Motion Maps-based Histograms of Oriented Gradients; Xiaodong Yang et al.; MM '12; 2012-11-02; 1-4 *

Also Published As

Publication number Publication date
CN106778576A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
Luo et al. 3d human motion estimation via motion compression and refinement
Gu et al. Dynamic facial analysis: From bayesian filtering to recurrent neural network
Wang et al. Hidden‐Markov‐models‐based dynamic hand gesture recognition
Del Rincón et al. Tracking human position and lower body parts using Kalman and particle filters constrained by human biomechanics
CN112530019B (en) Three-dimensional human body reconstruction method and device, computer equipment and storage medium
CN114220176A (en) Human behavior recognition method based on deep learning
CN109685037B (en) Real-time action recognition method and device and electronic equipment
US20160086017A1 (en) Face pose rectification method and apparatus
CN106778576B (en) Motion recognition method based on SEHM characteristic diagram sequence
US11282257B2 (en) Pose selection and animation of characters using video data and training techniques
Slama et al. Grassmannian representation of motion depth for 3D human gesture and action recognition
Kumar et al. Indian sign language recognition using graph matching on 3D motion captured signs
EP3198522A1 (en) A face pose rectification method and apparatus
CN113312973A (en) Method and system for extracting features of gesture recognition key points
CN114241379A (en) Passenger abnormal behavior identification method, device and equipment and passenger monitoring system
Bhuyan et al. Trajectory guided recognition of hand gestures having only global motions
Neverova Deep learning for human motion analysis
Kakumanu et al. A local-global graph approach for facial expression recognition
Ling et al. Human object inpainting using manifold learning-based posture sequence estimation
US11361467B2 (en) Pose selection and animation of characters using video data and training techniques
Wenkai et al. Continuous gesture trajectory recognition system based on computer vision
Funes Mora et al. Eyediap database: Data description and gaze tracking evaluation benchmarks
Otberdout et al. Hand pose estimation based on deep learning depth map for hand gesture recognition
Flores et al. Person re-identification on a mobile robot using a depth camera
CN115220574A (en) Pose determination method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant