CN110135249A - Human behavior recognition method based on temporal attention mechanism and LSTM - Google Patents
Human behavior recognition method based on temporal attention mechanism and LSTM
- Publication number: CN110135249A
- Application number: CN201910271178.2A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
Abstract
The present invention provides a human behavior recognition method based on a temporal attention mechanism and LSTM, comprising the steps of: 1) acquiring video data from an RGB monocular vision sensor; 2) extracting 2D skeleton joint point data; 3) extracting combined structural features of the joint points; 4) constructing an LSTM long short-term memory network; 5) adding a temporal attention mechanism to the LSTM network; 6) performing human behavior recognition with a softmax classifier. The invention improves the generality and real-time performance of vision-based behavior recognition systems, as well as the recognition accuracy for complex actions.
Description
Technical field
The present invention relates to the technical field of human behavior recognition, and in particular to a human behavior recognition method based on a temporal attention mechanism and LSTM.
Background art
In recent years, human behavior recognition technology has found wide application in industry and daily life. On the one hand, the development of smart homes places higher demands on robots' recognition and understanding of human actions; on the other hand, the transition of industry toward intelligent manufacturing has made human behavior recognition widely used in fields such as human-computer interaction and human-robot collaboration with industrial robots. In addition, with the development of video media and the spread of visual sensors, human behavior recognition plays an important role in telemedicine, home monitoring, urban security surveillance, and similar areas. RGB+D video, because of the rich information it contains, has become a hot spot of current behavior recognition research.
At present, research on human behavior recognition mainly uses vision-based sensors and deep neural network methods, but it still faces the following problems:
1. Poor generality of depth vision sensors: although behavior recognition methods based on RGB+D video achieve high accuracy in laboratory settings, depth vision sensors suffer from poor real-time performance, low resolution, high cost, and short recognition range, making them difficult to popularize in real life.
2. Poor real-time performance of RGB video behavior recognition systems: video contains a large amount of information, which supplies sufficient usable cues for behavior recognition but also brings a large amount of redundancy, reducing the system's operating speed and causing long delays in practical applications.
3. Low recognition accuracy for complex backgrounds and complex actions: most current behavior recognition methods feed the video sequence into a deep neural network for feature extraction, but they ignore the different contributions that individual frames make to action classification. Lacking attention to key information, such systems see their recognition accuracy drop on complex actions.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing a human behavior recognition method based on a temporal attention mechanism and LSTM, with higher recognition accuracy and stronger generality. It builds a deep neural network model based on an RGB monocular vision sensor to improve the generality of vision-based behavior recognition systems; it extracts 2D skeleton joint points from the RGB video stream and proposes a structural feature extraction method based on skeleton joint points, improving the processing speed, and thus the real-time performance, of the behavior recognition system by reducing video redundancy; and it proposes an LSTM (long short-term memory) model combined with a temporal attention mechanism to improve the accuracy of behavior recognition.
To achieve the above object, the technical solution provided by the present invention is a human behavior recognition method based on a temporal attention mechanism and LSTM, comprising the following steps:
1) acquire video data from an RGB monocular vision sensor;
2) extract 2D skeleton joint point data;
3) extract combined structural features of the joint points;
4) construct an LSTM long short-term memory network;
5) add a temporal attention mechanism to the LSTM network;
6) perform human behavior recognition with a softmax classifier.
In step 1), the video data of the RGB monocular vision sensor is acquired as follows:
1.1) install the RGB monocular vision sensor in the monitored area and acquire data in real time;
1.2) connect the server to the front-end codec and download the real-time video data through a streaming media protocol;
1.3) transfer the acquired video to the server's storage device over an iSCSI IP connection;
1.4) preprocess the acquired video data and send it to the joint point extraction module for processing.
In step 2), the 2D skeleton joint point data is extracted as follows:
2.1) split the video into segments of 10 seconds each;
2.2) after reading each image, resize it to 368*368;
2.3) call the OpenPose framework and feed the resized images into its CNN to extract part confidence maps and part affinity fields;
2.4) build a list storing the 18 joint points detected in each image;
2.5) use bipartite matching to find the part associations and connect the joint points into a complete human skeleton.
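The fixed-duration segmentation of step 2.1) can be sketched as follows; the frame rate of 30 fps is an assumption for illustration, since the patent only fixes the 10-second duration:

```python
def segment_frames(frames, fps=30, seconds=10):
    """Split a decoded frame sequence into fixed-length clips (step 2.1).
    fps=30 is an assumption; the patent only fixes the 10-second duration."""
    clip_len = fps * seconds
    n_clips = len(frames) // clip_len          # drop a trailing partial clip
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

# e.g. 650 frames at 30 fps yield two complete 10-second clips of 300 frames
clips = segment_frames(list(range(650)))
```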
In step 3), the combined structural features of the joint points are extracted as follows:
3.1) define the acquired 2D skeleton joint point coordinates as:
p_i(x, y)
3.2) define the set of extracted 2D skeleton joint points as the vector J, expressed as:
J = {p_1, p_2, ..., p_18}
3.3) normalize the bone vector between two joint points; the normalized vector is computed as:
B_i,j = (p_i - p_j) / ||p_i - p_j||
where p_i and p_j denote two adjacent joint points, and ||p_i - p_j|| is the Euclidean distance between them, computed as:
||p_i - p_j|| = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)
3.4) compute the bone vector features, i.e., the bone vectors formed by connecting adjacent joint points; four upper-limb and four lower-limb bone vectors are selected for this embodiment, and according to the joint point definition rule, the bone vector feature set S is defined as:
S = {B_2,3, B_3,4, B_5,6, B_6,7, B_8,9, B_9,10, B_11,12, B_12,13}
3.5) compute the bone angle features, using the angle between the left wrist, left shoulder and left hip and the angle between the right wrist, right shoulder and right hip as skeletal space angles; θ_i,j is defined as the angle of joint points p_i and p_j projected onto the XY plane. The bone angle feature set θ is defined as:
θ = (θ_4,8, θ_2,8, θ_5,11, θ_7,11)
3.6) compute the bone length feature; bone length is selected as a bias term describing the global scale of the human skeleton, using the spine vectors, i.e., the distances from the neck node to the left-hip and right-hip nodes, as the bone length feature D:
D = D_1,8 + D_1,11
where, if joint point i is connected to joint point j,
D_i,j = ||p_i - p_j||
3.7) compute the combined structural feature of the skeleton joint points by linearly concatenating the bone vector features, bone angle features and bone length feature:
Feature = {S, θ, D}.
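The feature construction of steps 3.1) to 3.7) can be sketched in NumPy as follows. This is an illustrative reading, not the patent's code: the joint indices assume the 18-point OpenPose layout referenced in Fig. 2, and taking arctan2 of the projected bone is one plausible interpretation of the angle θ_i,j, whose exact formula the patent leaves to the drawings:

```python
import numpy as np

# Assumed joint-index pairs mirroring the sets S, theta and D in the text
BONE_PAIRS = [(2, 3), (3, 4), (5, 6), (6, 7),        # four upper-limb bones
              (8, 9), (9, 10), (11, 12), (12, 13)]   # four lower-limb bones
ANGLE_PAIRS = [(4, 8), (2, 8), (5, 11), (7, 11)]     # wrist/shoulder vs. hip
LENGTH_PAIRS = [(1, 8), (1, 11)]                     # neck to left/right hip

def structural_feature(joints: np.ndarray) -> np.ndarray:
    """joints: (18, 2) array of 2D joint coordinates p_i(x, y)."""
    # Normalized bone vectors B_ij = (p_i - p_j) / ||p_i - p_j||
    bones = []
    for i, j in BONE_PAIRS:
        v = joints[i] - joints[j]
        bones.append(v / (np.linalg.norm(v) + 1e-8))
    # Bone angles theta_ij: angle of the bone projected onto the XY plane
    angles = []
    for i, j in ANGLE_PAIRS:
        dx, dy = joints[j] - joints[i]
        angles.append(np.arctan2(dy, dx))
    # Bone length D = D_{1,8} + D_{1,11}: spine length as a global scale bias
    length = sum(np.linalg.norm(joints[i] - joints[j]) for i, j in LENGTH_PAIRS)
    # Linear concatenation Feature = {S, theta, D}
    return np.concatenate([np.ravel(bones), angles, [length]])
```

The result is a flat 21-dimensional vector (eight 2D bone vectors, four angles, one length) per frame, which is what the LSTM of step 4) would consume.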
In step 4), the LSTM long short-term memory network is constructed as follows:
Inside a hidden-layer unit, the topmost horizontal state line passes the hidden unit state from one time step to the next, involving only a few linear transformations;
The LSTM contains three "gate" structures: the input gate i_t, the forget gate f_t and the output gate o_t. Each gate consists of a sigmoid function and an element-wise multiplication, so that the hidden unit remembers as much useful information as possible and discards useless information;
The computation inside an LSTM hidden unit is as follows. In the forget gate, W_f denotes the forget weight of the input vector and b_f the forget bias:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
In the input gate, W_i denotes the update weight of the input vector and b_i the update bias:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C is the cell state of the hidden unit, updated as:
C_t = f_t * C_{t-1} + i_t * tanh(W_C · [h_{t-1}, x_t] + b_C)
In the output gate, W_o is the output weight of the input vector and b_o the output bias:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
Finally the output h_t is computed:
h_t = o_t * tanh(C_t)
where x is the input, h is the output, h_{t-1} is the output at time t-1, and x_t is the input at time t.
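The gate equations above can be exercised with a minimal NumPy sketch of one LSTM step; the weight shapes and random initialization are illustrative assumptions, not the patent's trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step implementing the gate equations of step 4).
    W maps gate name to a weight matrix over the concatenation [h_{t-1}, x_t];
    b maps gate name to a bias vector."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W['f'] @ z + b['f'])                        # forget gate
    i_t = sigmoid(W['i'] @ z + b['i'])                        # input gate
    C_t = f_t * C_prev + i_t * np.tanh(W['C'] @ z + b['C'])   # cell state
    o_t = sigmoid(W['o'] @ z + b['o'])                        # output gate
    h_t = o_t * np.tanh(C_t)                                  # hidden output
    return h_t, C_t

# Usage with random weights: hidden size 4, input size 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(4, 7)) for k in 'fiCo'}
b = {k: np.zeros(4) for k in 'fiCo'}
h, C = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)
```

Because o_t lies in (0, 1) and tanh(C_t) in (-1, 1), each component of the output h_t is bounded in magnitude by 1, which is what keeps the recurrence numerically stable across long sequences.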
In step 5), the temporal attention mechanism is added to the LSTM network as follows:
5.1) input the context information c and the representation y_i of each part of the current data;
5.2) use a tanh layer to compute m_1, m_2, ..., m_n, aggregating each y_i with c; letting W_cm be the weight of c and W_ym the weight of y_i, m_i is computed as:
m_i = tanh(W_cm·c + W_ym·y_i)
5.3) compute the weight of each aggregated value with the softmax function:
s_i = exp(m_i) / Σ_j exp(m_j)
where s_i is the softmax value of m_i projected onto the learned direction, so the softmax can be regarded as the relevance obtained from the context c;
5.4) compute the weighted average of all y_i as the output value z, the weights expressing the relevance of each variable to the context c:
z = Σ_i s_i·y_i.
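Steps 5.1) to 5.4) amount to a small attention head over the per-frame representations. The sketch below is one plausible reading of the text (the dimensions and the learned projection direction w used to turn each vector m_i into a scalar score are assumptions):

```python
import numpy as np

def temporal_attention(Y, c, W_ym, W_cm, w):
    """Y: (n, d) frame representations y_i; c: (k,) context vector.
    m_i = tanh(W_cm c + W_ym y_i), projected onto a learned direction w,
    then s = softmax(scores) and z = sum_i s_i y_i (steps 5.2 to 5.4)."""
    M = np.tanh(Y @ W_ym.T + c @ W_cm.T)   # (n, a) aggregated vectors m_i
    scores = M @ w                          # project onto learned direction
    e = np.exp(scores - scores.max())       # numerically stable softmax
    s = e / e.sum()                         # attention weights s_i
    z = s @ Y                               # weighted average of the y_i
    return z, s

# Usage with random parameters: 6 frames, d=3, context k=2, hidden a=4
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 3))
z, s = temporal_attention(Y, rng.normal(size=2),
                          rng.normal(size=(4, 3)),
                          rng.normal(size=(4, 2)),
                          rng.normal(size=4))
```

The weights s_i sum to 1, so frames judged more relevant to the context c contribute more to the output z, which is the mechanism by which the network favors key frames.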
In step 6), classification is performed with a softmax regression classifier as follows:
6.1) construct the training dataset, using the Berkeley MHAD and UTD-MHAD multimodal human behavior recognition public datasets;
6.2) add a softmax classifier after the last layer of the temporal-attention LSTM model; the output of the last LSTM layer serves as the classifier's input, and the final classification model is obtained by training the classifier;
6.3) use the combined structural features of the 2D joint points extracted from the RGB video as input and classify them with the trained softmax classifier.
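The final classification layer of step 6) reduces to a linear map followed by a softmax over action classes. A minimal sketch, with hypothetical action labels and random weights standing in for the parameters that would be learned on Berkeley MHAD / UTD-MHAD:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical action labels; the real label set comes from the datasets
LABELS = ["wave", "jump", "sit"]

def classify(feature, W, b):
    """Apply the softmax layer to the last LSTM output and pick the action."""
    probs = softmax(W @ feature + b)
    return LABELS[int(np.argmax(probs))], probs

# Usage with an untrained layer: 8-D LSTM output, 3 classes
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 8)), np.zeros(3)
label, probs = classify(rng.normal(size=8), W, b)
```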
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The behavior recognition method based on an RGB monocular vision sensor uses a behavior representation built on global features, which not only obtains rich motion information for recognizing complex actions, but also has lower cost and better generality than the RGB+D depth cameras commonly used in current behavior recognition research. Compared with wearable inertial sensors it obtains more complete information and is free of their restrictions on wearing position and captured motion, enabling behavior recognition technology to be popularized in real scenarios.
2. Extracting skeleton joint points from RGB video data not only extracts the skeleton motion information most useful for behavior classification but also removes a large amount of redundant information, reducing the storage space of the data and increasing the speed of behavior recognition. In addition, a combined structural feature extraction method for skeleton joint points is proposed, which removes the negative interference of complex backgrounds on human behavior recognition while providing a more effective feature representation of the raw skeleton joint points, thus improving recognition accuracy under complex backgrounds.
3. Performing human behavior recognition with a temporal attention mechanism and an LSTM model effectively solves the problem that deep neural networks, when extracting features automatically, assign equal importance to all time steps. The LSTM network extracts the relationships between video frames, and the temporal attention mechanism makes the network focus on the key frames that contribute most to behavior recognition, improving the recognition accuracy for complex actions.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the definition rule of the skeleton joint points extracted from the RGB image.
Fig. 3 is a schematic diagram of the LSTM neuron structure.
Fig. 4 is a schematic diagram of the attention mechanism model.
Detailed description of the embodiments
The present invention will be described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
As shown in Figures 1 to 4, the human behavior recognition method based on a temporal attention mechanism and LSTM provided by this embodiment comprises the following steps:
1) Establish a video monitoring platform and acquire RGB video data with a low-cost monocular vision sensor, comprising the following steps:
1.1) install the RGB monocular vision sensor in the monitored area and acquire data in real time;
1.2) connect the server to the front-end codec and download the real-time video data through a streaming media protocol;
1.3) transfer the acquired video to the server's storage device over an iSCSI IP connection;
1.4) preprocess the acquired video data and send it to the joint point extraction module for processing.
2) Extract the 2D skeleton joint point data from the RGB video with the OpenPose model, comprising the following steps:
2.1) in this embodiment, to facilitate the extraction of skeleton joint points, split the video into segments of 10 seconds each;
2.2) in this embodiment, specify the input image size as 368*368;
2.3) call the OpenPose framework and feed the images into its CNN to extract part confidence maps and part affinity fields;
2.4) build a list storing the 18 joint points detected in each image;
2.5) use bipartite matching to find the part associations and connect the joint points into a complete human skeleton.
3) Normalize the joint points and compute their combined structural features, comprising the following steps:
3.1) in this embodiment, the definition rule of the 18 skeleton joint points is shown in Fig. 2; define the acquired 2D skeleton joint point coordinates as:
p_i(x, y)
3.2) the vector J contains the extracted 2D skeleton joint point set and is defined as:
J = {p_1, p_2, ..., p_18}
3.3) the bone vector between two joint points is normalized; the normalized vector is computed as:
B_i,j = (p_i - p_j) / ||p_i - p_j||
where p_i and p_j denote two adjacent joint points, and ||p_i - p_j|| is the Euclidean distance between them, computed as:
||p_i - p_j|| = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)
3.4) in this embodiment, the bone vector features follow the principles of human anatomy: adjacent joint points are connected to form bone vectors, and four upper-limb and four lower-limb bone vectors are selected. Following the joint point definition rule shown in Fig. 2, the bone vector feature set S is defined as:
S = {B_2,3, B_3,4, B_5,6, B_6,7, B_8,9, B_9,10, B_11,12, B_12,13}
3.5) in this embodiment, the bone angle features use the angle between the left wrist, left shoulder and left hip and the angle between the right wrist, right shoulder and right hip as skeletal space angles; θ_i,j is defined as the angle of joint points p_i and p_j projected onto the XY plane. The bone angle feature set θ is defined as:
θ = (θ_4,8, θ_2,8, θ_5,11, θ_7,11)
3.6) in this embodiment, because human bodies differ individually, bone length is selected as a bias term describing the global scale of the human skeleton. The spine vectors are used, i.e., the distances from the neck node to the left-hip and right-hip nodes, as the bone length feature D:
D = D_1,8 + D_1,11
where, if joint point i is connected to joint point j,
D_i,j = ||p_i - p_j||
3.7) in this embodiment, the bone vector features, bone angle features and bone length feature are linearly concatenated to form the combined structural feature of the skeleton joint points:
Feature = {S, θ, D}
4) Construct the long short-term memory network LSTM, implemented as follows:
4.1) inside a hidden-layer unit, the topmost horizontal state line passes the hidden unit state from one time step to the next, involving only a few linear transformations, which helps keep the hidden unit state stable;
4.2) the LSTM contains three special "gate" structures: the input gate i_t, the forget gate f_t and the output gate o_t. Each gate consists of a sigmoid function and an element-wise multiplication, so that the hidden unit remembers as much useful information as possible and discards useless information, thereby solving the long-term dependency problem;
4.3) the computation inside an LSTM hidden unit is as follows. In the forget gate, W_f denotes the forget weight of the input vector and b_f the forget bias:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
In the input gate, W_i denotes the update weight of the input vector and b_i the update bias:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C is the cell state of the hidden unit, updated as:
C_t = f_t * C_{t-1} + i_t * tanh(W_C · [h_{t-1}, x_t] + b_C)
In the output gate, W_o is the output weight of the input vector and b_o the output bias:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
Finally the output h_t is computed:
h_t = o_t * tanh(C_t)
where x is the input, h is the output, h_{t-1} is the output at time t-1, and x_t is the input at time t.
5) Add the temporal attention mechanism to the LSTM network to extract temporal features, implemented as follows:
5.1) input the context information c and the representation y_i of each part of the current data;
5.2) use a tanh layer to compute m_1, m_2, ..., m_n, aggregating each y_i with c; letting W_cm be the weight of c and W_ym the weight of y_i, m_i is computed as:
m_i = tanh(W_cm·c + W_ym·y_i)
5.3) compute the weight of each aggregated value with the softmax function:
s_i = exp(m_i) / Σ_j exp(m_j)
where s_i is the softmax value of m_i projected onto the learned direction, so the softmax can be regarded as the relevance obtained from the context c;
5.4) compute the weighted average of all y_i as the output value z, the weights expressing the relevance of each variable to the context c:
z = Σ_i s_i·y_i
6) Perform human behavior recognition with the softmax classifier, implemented as follows:
6.1) construct the training dataset, using the Berkeley MHAD and UTD-MHAD multimodal human behavior recognition public datasets;
6.2) add a softmax classifier after the last layer of the temporal-attention LSTM model; the output of the last LSTM layer serves as the classifier's input, and the final classification model is obtained by training the classifier;
6.3) use the combined structural features of the 2D joint points extracted from the RGB video in step 3) as input and classify them with the trained softmax classifier.
In conclusion the Human bodys' response method provided by the present invention based on time attention mechanism and LSTM, structure
The deep neural network model based on RGB monocular vision sensor is built, can be improved the Activity recognition system in view-based access control model
Universality;2D skeleton joint point is extracted using OpenPose Open Framework in rgb video, is proposed a kind of based on skeletal joint point
Structure feature extracting method, processing speed and the raising of Activity recognition system can be improved by reducing Video Redundancy information
Real-time;The LSTM model for proposing a kind of binding time attention mechanism, can be improved the accuracy rate of the identification to complex behavior.
In addition, technical method provided by the invention can also be extended to human body exception monitoring, video monitoring, smart home, identity authentication with
And the various fields such as motion analysis, there is extensive research significance, be worthy to be popularized.
In above-described embodiment, included modules are that function logic according to the invention is divided, but simultaneously
It is not limited to above-mentioned division, as long as corresponding functions can be realized, the protection scope that is not intended to restrict the invention.
The above is the preferable embodiment of the present invention, but embodiments of the present invention are not by the limit of above-described embodiment
System, other any changes, modifications, substitutions, combinations, simplifications made without departing from the spirit and principles of the present invention,
It should be equivalent substitute mode, be included within the scope of the present invention.
Claims (7)
1. A human behavior recognition method based on a temporal attention mechanism and LSTM, characterized by comprising the following steps:
1) acquiring video data from an RGB monocular vision sensor;
2) extracting 2D skeleton joint point data;
3) extracting combined structural features of the joint points;
4) constructing an LSTM long short-term memory network;
5) adding a temporal attention mechanism to the LSTM network;
6) performing human behavior recognition with a softmax classifier.
2. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that in step 1), acquiring the video data of the RGB monocular vision sensor comprises the following steps:
1.1) installing the RGB monocular vision sensor in the monitored area and acquiring data in real time;
1.2) connecting the server to the front-end codec and downloading the real-time video data through a streaming media protocol;
1.3) transferring the acquired video to the server's storage device over an iSCSI IP connection;
1.4) preprocessing the acquired video data and sending it to the joint point extraction module for processing.
3. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that in step 2), extracting the 2D skeleton joint point data comprises the following steps:
2.1) splitting the video into segments of 10 seconds each;
2.2) after reading each image, resizing it to 368*368;
2.3) calling the OpenPose framework and feeding the resized images into its CNN to extract part confidence maps and part affinity fields;
2.4) building a list storing the 18 joint points detected in each image;
2.5) using bipartite matching to find the part associations and connecting the joint points into a complete human skeleton.
4. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that in step 3), extracting the combined structural features of the joint points comprises the following steps:
3.1) defining the acquired 2D skeleton joint point coordinates as:
p_i(x, y)
3.2) defining the set of extracted 2D skeleton joint points as the vector J, expressed as:
J = {p_1, p_2, ..., p_18}
3.3) normalizing the bone vector between two joint points, the normalized vector being computed as:
B_i,j = (p_i - p_j) / ||p_i - p_j||
where p_i and p_j denote two adjacent joint points, and ||p_i - p_j|| is the Euclidean distance between them, computed as:
||p_i - p_j|| = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)
3.4) computing the bone vector features, i.e., the bone vectors formed by connecting adjacent joint points; four upper-limb and four lower-limb bone vectors are selected, and according to the joint point definition rule, the bone vector feature set S is defined as:
S = {B_2,3, B_3,4, B_5,6, B_6,7, B_8,9, B_9,10, B_11,12, B_12,13}
3.5) computing the bone angle features, using the angle between the left wrist, left shoulder and left hip and the angle between the right wrist, right shoulder and right hip as skeletal space angles; θ_i,j is defined as the angle of joint points p_i and p_j projected onto the XY plane; the bone angle feature set θ is defined as:
θ = (θ_4,8, θ_2,8, θ_5,11, θ_7,11)
3.6) computing the bone length feature; bone length is selected as a bias term describing the global scale of the human skeleton, using the spine vectors, i.e., the distances from the neck node to the left-hip and right-hip nodes, as the bone length feature D:
D = D_1,8 + D_1,11
where, if joint point i is connected to joint point j,
D_i,j = ||p_i - p_j||
3.7) computing the combined structural feature of the skeleton joint points by linearly concatenating the bone vector features, bone angle features and bone length feature:
Feature = {S, θ, D}.
5. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that in step 4), the LSTM long short-term memory network is constructed as follows:
inside a hidden-layer unit, the topmost horizontal state line passes the hidden unit state from one time step to the next, involving only a few linear transformations;
the LSTM contains three "gate" structures: the input gate i_t, the forget gate f_t and the output gate o_t; each gate consists of a sigmoid function and an element-wise multiplication, so that the hidden unit remembers as much useful information as possible and discards useless information;
the computation inside an LSTM hidden unit is as follows: in the forget gate, W_f denotes the forget weight of the input vector and b_f the forget bias:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
in the input gate, W_i denotes the update weight of the input vector and b_i the update bias:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C is the cell state of the hidden unit, updated as:
C_t = f_t * C_{t-1} + i_t * tanh(W_C · [h_{t-1}, x_t] + b_C)
in the output gate, W_o is the output weight of the input vector and b_o the output bias:
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
finally the output h_t is computed:
h_t = o_t * tanh(C_t)
where x is the input, h is the output, h_{t-1} is the output at time t-1, and x_t is the input at time t.
6. The human behavior recognition method based on the temporal attention mechanism and LSTM according to claim 1, characterized in that, in step 5), a temporal attention mechanism is added to the LSTM network, comprising the following steps:
5.1) Input the context information c and the representation y_i of some part of the current data;
5.2) Use a tanh layer to compute m_1, m_2, ..., m_n, aggregating each y_i with c. Let W_cm be the weight of c and W_ym the weight of y_i; then m_i is computed as:
m_i = tanh(W_cm·c + W_ym·y_i)
5.3) Compute each aggregated weight with the softmax function:
s_i = softmax(w^T·m_i)
where s_i is the softmax value of m_i projected onto the learned direction w, so the softmax can be regarded as the relevance obtained according to the context c;
5.4) Compute the weighted average of all y_i as the output value z; each weight indicates the relevance of the corresponding variable to the context c. z is computed as:
z = Σ_i s_i·y_i.
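Steps 5.1)-5.4) can be sketched as follows (a minimal NumPy sketch; the projection vector w that turns each m_i into a scalar score, and all dimensions and random weights, are illustrative assumptions):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def temporal_attention(c, ys, W_cm, W_ym, w):
    # m_i = tanh(W_cm c + W_ym y_i): aggregate each y_i with the context c
    ms = [np.tanh(W_cm @ c + W_ym @ y) for y in ys]
    # s_i: softmax over each m_i projected onto the learned direction w
    s = softmax(np.array([w @ m for m in ms]))
    # z = sum_i s_i y_i: weighted average of the y_i
    z = sum(s_i * y for s_i, y in zip(s, ys))
    return z, s

rng = np.random.default_rng(1)
d = 4
c = rng.standard_normal(d)
ys = [rng.standard_normal(d) for _ in range(5)]
W_cm = rng.standard_normal((d, d))
W_ym = rng.standard_normal((d, d))
w = rng.standard_normal(d)
z, s = temporal_attention(c, ys, W_cm, W_ym, w)
print(round(s.sum(), 6))  # 1.0
```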
7. The human behavior recognition method based on the temporal attention mechanism and LSTM according to claim 1, characterized in that, in step 6), classification is performed with a softmax regression classifier, comprising the following steps:
6.1) Construct the training dataset, using the Berkeley MHAD and UTD-MHAD multimodal public datasets for human behavior recognition;
6.2) Add a softmax classifier after the last layer of the LSTM model with the temporal attention mechanism; the output of the last LSTM layer serves as the classifier input, and the final classification model is obtained by training the classifier;
6.3) Use the coordinated structural features of the 2D joints extracted from RGB video as input, and classify them with the trained softmax classifier.
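The classification step can be sketched as follows (a minimal NumPy sketch; the random weights are placeholders for a trained classifier, and the class count of 11 is only an illustrative assumption based on the Berkeley MHAD action set):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def softmax_classify(z, W, b):
    # Class probabilities from the attention-LSTM output vector z
    probs = softmax(W @ z + b)
    return int(np.argmax(probs)), probs

rng = np.random.default_rng(2)
n_classes, d = 11, 8  # illustrative: 11 action classes, feature size 8
W = rng.standard_normal((n_classes, d))
b = np.zeros(n_classes)
label, probs = softmax_classify(rng.standard_normal(d), W, b)
print(0 <= label < n_classes, round(probs.sum(), 6))  # True 1.0
```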
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910271178.2A CN110135249B (en) | 2019-04-04 | 2019-04-04 | Human behavior identification method based on time attention mechanism and LSTM |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110135249A true CN110135249A (en) | 2019-08-16 |
CN110135249B CN110135249B (en) | 2021-07-20 |
Family
ID=67569411
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910271178.2A Expired - Fee Related CN110135249B (en) | 2019-04-04 | 2019-04-04 | Human behavior identification method based on time attention mechanism and LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135249B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104615983A (en) * | 2015-01-28 | 2015-05-13 | 中国科学院自动化研究所 | Behavior identification method based on recurrent neural network and human skeleton movement sequences |
CN108600701A (en) * | 2018-05-02 | 2018-09-28 | 广州飞宇智能科技有限公司 | A kind of monitoring system and method judging video behavior based on deep learning |
CN108764066A (en) * | 2018-05-08 | 2018-11-06 | 南京邮电大学 | A kind of express delivery sorting working specification detection method based on deep learning |
CN108776796A (en) * | 2018-06-26 | 2018-11-09 | 内江师范学院 | A kind of action identification method based on global spatio-temporal attention model |
CN108846332A (en) * | 2018-05-30 | 2018-11-20 | 西南交通大学 | A kind of railway drivers Activity recognition method based on CLSTA |
CN108875708A (en) * | 2018-07-18 | 2018-11-23 | 广东工业大学 | Behavior analysis method, device, equipment, system and storage medium based on video |
CN109508688A (en) * | 2018-11-26 | 2019-03-22 | 平安科技(深圳)有限公司 | Behavioral value method, terminal device and computer storage medium based on skeleton |
Non-Patent Citations (1)
Title |
---|
SIJIE SONG et al.: "An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data", arXiv:1611.06067v1 [cs.CV] * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021051579A1 (en) * | 2019-09-17 | 2021-03-25 | 平安科技(深圳)有限公司 | Body pose recognition method, system, and apparatus, and storage medium |
CN110705390A (en) * | 2019-09-17 | 2020-01-17 | 平安科技(深圳)有限公司 | Body posture recognition method and device based on LSTM and storage medium |
CN110781771A (en) * | 2019-10-08 | 2020-02-11 | 北京邮电大学 | Abnormal behavior real-time monitoring method based on deep learning |
CN111310655A (en) * | 2020-02-13 | 2020-06-19 | 蒋营国 | Human body action recognition method and system based on key frame and combined attention model |
CN111553229A (en) * | 2020-04-21 | 2020-08-18 | 清华大学 | Worker action identification method and device based on three-dimensional skeleton and LSTM |
CN111723667A (en) * | 2020-05-20 | 2020-09-29 | 同济大学 | Human body joint point coordinate-based intelligent lamp pole crowd behavior identification method and device |
CN111368810A (en) * | 2020-05-26 | 2020-07-03 | 西南交通大学 | Sit-up detection system and method based on human body and skeleton key point identification |
CN111368810B (en) * | 2020-05-26 | 2020-08-25 | 西南交通大学 | Sit-up detection system and method based on human body and skeleton key point identification |
CN111860267A (en) * | 2020-07-13 | 2020-10-30 | 浙大城市学院 | Multichannel body-building movement identification method based on human body bone joint point positions |
CN111860267B (en) * | 2020-07-13 | 2022-06-14 | 浙大城市学院 | Multichannel body-building exercise identification method based on human body skeleton joint point positions |
CN112149613A (en) * | 2020-10-12 | 2020-12-29 | 萱闱(北京)生物科技有限公司 | Motion estimation evaluation method based on improved LSTM model |
CN112257845A (en) * | 2020-10-12 | 2021-01-22 | 萱闱(北京)生物科技有限公司 | Press action recognition method based on improved LSTM model |
CN112149613B (en) * | 2020-10-12 | 2024-01-05 | 萱闱(北京)生物科技有限公司 | Action pre-estimation evaluation method based on improved LSTM model |
CN112560582A (en) * | 2020-11-24 | 2021-03-26 | 超越科技股份有限公司 | Real-time abnormal behavior monitoring method based on LSTM |
CN112528891A (en) * | 2020-12-16 | 2021-03-19 | 重庆邮电大学 | Bidirectional LSTM-CNN video behavior identification method based on skeleton information |
CN114973403A (en) * | 2022-05-06 | 2022-08-30 | 广州紫为云科技有限公司 | Efficient behavior prediction method based on space-time dual-dimension feature depth network |
CN114973403B (en) * | 2022-05-06 | 2023-11-03 | 广州紫为云科技有限公司 | Behavior prediction method based on space-time double-dimension feature depth network |
Also Published As
Publication number | Publication date |
---|---|
CN110135249B (en) | 2021-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135249A (en) | Human behavior recognition method based on time attention mechanism and LSTM | |
CN110135375B (en) | Multi-person attitude estimation method based on global information integration | |
CN109829436B (en) | Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network | |
He et al. | Visual recognition of traffic police gestures with convolutional pose machine and handcrafted features | |
CN110222580A (en) | A kind of manpower 3 d pose estimation method and device based on three-dimensional point cloud | |
CN111160294B (en) | Gait recognition method based on graph convolution network | |
CN113128424B (en) | Method for identifying action of graph convolution neural network based on attention mechanism | |
CN111444488A (en) | Identity authentication method based on dynamic gesture | |
CN110135277B (en) | Human behavior recognition method based on convolutional neural network | |
CN112906520A (en) | Gesture coding-based action recognition method and device | |
Neverova | Deep learning for human motion analysis | |
Sheu et al. | Improvement of human pose estimation and processing with the intensive feature consistency network | |
CN111611869B (en) | End-to-end monocular vision obstacle avoidance method based on serial deep neural network | |
CN117115911A (en) | Hypergraph learning action recognition system based on attention mechanism | |
CN117711066A (en) | Three-dimensional human body posture estimation method, device, equipment and medium | |
CN117576149A (en) | Single-target tracking method based on attention mechanism | |
Yang et al. | Human action recognition based on skeleton and convolutional neural network | |
Gadhiya et al. | Analysis of deep learning based pose estimation techniques for locating landmarks on human body parts | |
CN113469018B (en) | Multi-modal interactive behavior recognition method based on RGB and three-dimensional skeleton | |
Usman et al. | Skeleton-based motion prediction: A survey | |
Huang et al. | View-independent behavior analysis | |
Ramanathan et al. | Combining pose-invariant kinematic features and object context features for rgb-d action recognition | |
CN111178141B (en) | LSTM human body behavior identification method based on attention mechanism | |
CN115482481A (en) | Single-view three-dimensional human skeleton key point detection method, device, equipment and medium | |
Liang | Face recognition technology analysis based on deep learning algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
CF01 | Termination of patent right due to non-payment of annual fee ||
Granted publication date: 2021-07-20