CN110135249A - Human behavior recognition method based on temporal attention mechanism and LSTM - Google Patents

Human behavior recognition method based on temporal attention mechanism and LSTM

Info

Publication number
CN110135249A
CN110135249A (application CN201910271178.2A; granted as CN110135249B)
Authority
CN
China
Prior art keywords
lstm
bone
joint point
attention mechanism
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910271178.2A
Other languages
Chinese (zh)
Other versions
CN110135249B (en)
Inventor
毕盛 (Bi Sheng)
谢澈澈 (Xie Cheche)
董敏 (Dong Min)
李永发 (Li Yongfa)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN201910271178.2A
Publication of CN110135249A
Application granted
Publication of CN110135249B
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training

Abstract

The present invention provides a human behavior recognition method based on a temporal attention mechanism and LSTM, comprising the steps of: 1) acquiring video data from an RGB monocular vision sensor; 2) extracting 2D skeleton joint point data; 3) extracting joint point structural features; 4) constructing an LSTM long short-term memory network; 5) adding a temporal attention mechanism to the LSTM network; 6) performing human behavior recognition with a softmax classifier. The invention improves the universality and real-time performance of vision-based behavior recognition systems and their recognition accuracy for complex actions.

Description

Human behavior recognition method based on temporal attention mechanism and LSTM
Technical field
The present invention relates to the technical field of human behavior recognition, and in particular to a human behavior recognition method based on a temporal attention mechanism and LSTM.
Background technique
In recent years, human behavior recognition technology has found wide application in production and daily life. On the one hand, the development of smart homes places higher demands on a robot's ability to recognize and understand human actions; on the other hand, the transformation of industry pushes it toward intelligent development, and human behavior recognition is widely used in fields such as human-computer interaction and human-robot collaboration with industrial robots. In addition, with the development of video media and the spread of visual sensors, human behavior recognition plays an important role in telemedicine, home monitoring, and urban security surveillance. RGB+D video, because of the rich information it contains, has become a hot spot of current behavior recognition research.
At present, research on human behavior recognition technology mainly uses vision-based sensors and deep neural network methods, but it still faces the following problems:
1. Poor universality of depth vision sensors: although behavior recognition methods based on RGB+D video achieve high precision in laboratory settings, depth vision sensors suffer from poor real-time performance, low resolution, high cost, and short recognition range, making them difficult to popularize in real life.
2. Poor real-time performance of RGB video behavior recognition systems: video carries a large amount of information; while this provides sufficient usable information for behavior recognition, it also introduces a large amount of redundancy, which slows the system down and causes long delays in practical applications.
3. Low recognition accuracy for complex backgrounds and complex actions: for complex actions, most current behavior recognition methods feed the video sequence into a deep neural network for feature extraction but ignore the different contributions of individual frames to action classification. Lacking attention to key information, such human behavior recognition systems lose accuracy on complex actions.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing a human behavior recognition method based on a temporal attention mechanism and LSTM, with higher recognition accuracy and stronger universality. It aims to: build a deep neural network model based on an RGB monocular vision sensor to improve the universality of vision-based behavior recognition systems; extract 2D skeleton joint points from the RGB video stream and propose a structural feature extraction method based on skeleton joint points, reducing video redundancy to raise the processing speed of the behavior recognition system and thus improve real-time performance; and propose an LSTM (long short-term memory network) model combined with a temporal attention mechanism to improve the accuracy of behavior recognition.
To achieve the above object, the technical solution provided by the present invention is a human behavior recognition method based on a temporal attention mechanism and LSTM, comprising the following steps:
1) acquiring video data from an RGB monocular vision sensor;
2) extracting 2D skeleton joint point data;
3) extracting joint point structural features;
4) constructing an LSTM long short-term memory network;
5) adding a temporal attention mechanism to the LSTM network;
6) performing human behavior recognition with a softmax classifier.
In step 1), acquiring the video data of the RGB monocular vision sensor comprises the following steps:
1.1) installing an RGB monocular vision sensor in the monitoring area to acquire data in real time;
1.2) connecting the server to the front-end codec and downloading real-time video data through a streaming media protocol;
1.3) using the iSCSI IP connection mode to transfer the acquired video to the server's storage device;
1.4) preprocessing the acquired video data and sending it to the joint point extraction module for processing.
In step 2), extracting the 2D skeleton joint point data comprises the following steps:
2.1) splitting the video into segments of 10 seconds each;
2.2) after reading each input picture, resizing it to the specified size of 368*368;
2.3) calling the OpenPose framework: the resized picture is fed into a CNN to extract part confidence maps and part affinity fields;
2.4) establishing a list to store the 18 joint points detected in the picture;
2.5) using bipartite matching to find the part associations, and connecting the joint points to form the whole skeleton of the human joints.
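As an illustration of steps 2.1) and 2.2), the segmentation can be sketched as below. This is a hypothetical helper, not code from the patent: the name segment_frames and the clip parameters (750 frames at 25 fps) are illustrative assumptions.

```python
# Hypothetical sketch of steps 2.1)-2.2): split a frame stream into
# 10-second segments; OpenPose here expects 368x368 input pictures.
INPUT_SIZE = (368, 368)
SEGMENT_SECONDS = 10

def segment_frames(n_frames: int, fps: float) -> list[range]:
    """Return one range of frame indices per 10-second segment."""
    per_seg = int(round(fps * SEGMENT_SECONDS))
    return [range(s, min(s + per_seg, n_frames))
            for s in range(0, n_frames, per_seg)]

segs = segment_frames(n_frames=750, fps=25.0)  # a 30-second clip at 25 fps
print(len(segs), len(segs[0]))  # 3 250
```

Each range can then be resized frame-by-frame to INPUT_SIZE before being passed to the pose extractor.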
In step 3), extracting the joint point structural features comprises the following steps:
3.1) defining the acquired 2-dimensional skeleton joint point coordinates as:
pi(x,y)
3.2) defining the extracted set of two-dimensional skeleton joint points as the vector J, expressed as:
J={p1,p2,...,p18}
3.3) normalizing the bone vector between two adjacent joint points; the normalized vector is computed as:
Bi,j=(pi-pj)/||pi-pj||
where pi and pj denote two adjacent joint points, and ||pi-pj|| is the Euclidean distance between the two points, computed as:
||pi-pj||=sqrt((xi-xj)^2+(yi-yj)^2)
3.4) computing the bone vector features, i.e. the bone vectors formed by connecting adjacent joint points; four bone vectors of the upper limbs and four of the lower limbs are selected for the present embodiment, and, following the joint point definition rule, the bone vector feature set S is defined as:
S={B2,3,B3,4,B5,6,B6,7,B8,9,B9,10,B11,12,B12,13}
3.5) computing the bone angle features, taking the angle formed by the left wrist, left shoulder, and left hip, and the angle formed by the right wrist, right shoulder, and right hip, as bone space angles; the angle θ between joint points pi and pj, projected from three-dimensional space onto the XY plane, is defined as:
θi,j=arctan((yi-yj)/(xi-xj))
The bone angle feature set θ is defined as:
θ=(θ4,8,θ2,8,θ5,11,θ7,11)
3.6) computing the bone length feature: bone length is selected as a bias to describe the overall difference between human skeletons, using the spine, i.e. the distances between the neck node and the two hip nodes (left hip and right hip), as the bone length feature; the bone length feature D is defined as:
D=D1,8+D1,11
where, if joint point i is connected with joint point j,
Dij=||pi-pj||
3.7) computing the skeleton joint point structural features: the bone vector features, bone angle features, and bone length feature are linearly concatenated to form the structural feature of the skeleton joint points, expressed as:
Feature={S,θ,D}.
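The feature assembly in step 3) can be sketched as below. This is an illustrative reading, not the patent's code: the joint indexing follows the COCO 18-keypoint layout assumed here (1 = neck, 8 = right hip, 11 = left hip), and arctan2 is one plausible choice for the planar angle.

```python
# Illustrative sketch of step 3): build the combined structural feature
# {S, theta, D} from 18 OpenPose-style 2D joint points (COCO-18 indexing
# assumed: 1=neck, 8=right hip, 11=left hip, ...).
import numpy as np

BONES = [(2, 3), (3, 4), (5, 6), (6, 7), (8, 9), (9, 10), (11, 12), (12, 13)]
ANGLE_PAIRS = [(4, 8), (2, 8), (5, 11), (7, 11)]
SPINE_PAIRS = [(1, 8), (1, 11)]

def joint_structural_feature(p: np.ndarray) -> np.ndarray:
    """p: (18, 2) array of joint coordinates -> 1-D feature vector."""
    def vec(i, j):
        return p[i] - p[j]
    # S: normalized bone vectors B_ij = (p_i - p_j) / ||p_i - p_j||
    S = np.concatenate([vec(i, j) / np.linalg.norm(vec(i, j)) for i, j in BONES])
    # theta: planar angle of each selected joint pair (arctan2 assumed)
    theta = np.array([np.arctan2(*vec(i, j)[::-1]) for i, j in ANGLE_PAIRS])
    # D: bone length bias, sum of neck-to-hip distances D_1,8 + D_1,11
    D = sum(np.linalg.norm(vec(i, j)) for i, j in SPINE_PAIRS)
    return np.concatenate([S, theta, [D]])

feat = joint_structural_feature(np.random.rand(18, 2))
print(feat.shape)  # (21,)
```

The resulting 21-dimensional vector (16 bone-vector components, 4 angles, 1 length bias) would be the per-frame input to the LSTM.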
In step 4), the LSTM long short-term memory network is constructed as follows:
Inside the hidden-layer unit, the horizontal state line at the top of the hidden unit passes the hidden-unit state from one moment to the next, involving only a few linear transformation operations;
The LSTM contains three "gate" structures: the input gate it, the forget gate ft, and the output gate ot. Each gate consists of a sigmoid function and an element-wise multiplication, so that the hidden unit remembers only useful information as far as possible and discards useless information;
The computation inside the LSTM hidden unit is as follows. In the forget gate, Wf denotes the forget weight of the input vector and bf the forget bias; the forget gate is computed as:
ft=σ(Wf·[ht-1,xt]+bf)
In the input gate, Wi denotes the update weight of the input vector and bi the update bias; the input gate is computed as:
it=σ(Wi·[ht-1,xt]+bi)
C is the state of the hidden unit, computed as:
Ct=ft*Ct-1+it*tanh(WC·[ht-1,xt]+bC)
In the output gate, Wo is the output weight of the input vector and bo the output bias; the output gate is computed as:
ot=σ(Wo·[ht-1,xt]+bo)
Finally, the output h is computed:
ht=ot*tanh(Ct)
where x is the input, h is the output, ht-1 is the output at time t-1, and xt is the input at time t.
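The gate equations above can be sketched as a single numpy time step. This is a minimal illustrative implementation, not the patent's code; the weight shapes and the 21-dimensional input (matching the joint feature) are assumptions.

```python
# Minimal numpy sketch of one LSTM cell step: gates f, i, o and cell
# state C, with [h_{t-1}, x_t] concatenated as in the equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One time step. W maps gate name -> (hidden+input, hidden) matrix."""
    hx = np.concatenate([h_prev, x_t])                   # [h_{t-1}, x_t]
    f = sigmoid(hx @ W["f"] + b["f"])                    # forget gate
    i = sigmoid(hx @ W["i"] + b["i"])                    # input gate
    o = sigmoid(hx @ W["o"] + b["o"])                    # output gate
    C = f * C_prev + i * np.tanh(hx @ W["C"] + b["C"])   # new cell state
    h = o * np.tanh(C)                                   # new output
    return h, C

rng = np.random.default_rng(0)
n_in, n_hid = 21, 8                 # e.g. the 21-D joint feature per frame
W = {k: rng.standard_normal((n_hid + n_in, n_hid)) * 0.1 for k in "fioC"}
b = {k: np.zeros(n_hid) for k in "fioC"}
h, C = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, C.shape)  # (8,) (8,)
```

Since h = o*tanh(C) with both factors bounded by 1 in magnitude, every component of h stays in (-1, 1), which keeps the recurrence numerically stable.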
In step 5), adding the temporal attention mechanism to the LSTM network comprises the following steps:
5.1) inputting the context information c and the representation yi of some part of the current data;
5.2) using a tanh layer to compute m1, m2, ..., mn, aggregating yi with c; if the weight of c is Wcm and the weight of yi is Wym, then mi is computed as:
mi=tanh(Wcm·c+Wym·yi)
5.3) computing each weight after aggregation by the softmax function:
si=exp(mi)/Σj exp(mj)
where si is the softmax value of mi projected on the learning direction, so the softmax can be regarded as the degree of relevance obtained from the context c;
5.4) computing the weighted average of all yi as the output value z; the weights express the relevance of each variable to the context c, and z is computed as:
z=Σi si·yi
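A numpy sketch of this temporal-attention pooling follows. The names W_cm and W_ym come from the text; the dimensions, and the reduction of each mi to a scalar score by summation, are illustrative assumptions rather than the patent's exact formulation.

```python
# Sketch of step 5): score each per-frame LSTM output y_i against a
# context vector c, softmax-normalize over time, and average.
import numpy as np

def temporal_attention(Y, c, W_cm, W_ym):
    """Y: (T, d) per-frame features; c: (d,) context -> (d,) pooled z."""
    m = np.tanh(c @ W_cm + Y @ W_ym)     # m_i = tanh(W_cm c + W_ym y_i)
    scores = m.sum(axis=1)               # one scalar per frame (one simple choice)
    e = np.exp(scores - scores.max())    # numerically stable softmax over time
    s = e / e.sum()                      # attention weights s_i, sum to 1
    return s @ Y                         # z = sum_i s_i y_i

T, d, k = 5, 4, 3
rng = np.random.default_rng(1)
z = temporal_attention(rng.standard_normal((T, d)), rng.standard_normal(d),
                       rng.standard_normal((d, k)), rng.standard_normal((d, k)))
print(z.shape)  # (4,)
```

Frames whose scores dominate the softmax contribute most to z, which is how the mechanism focuses the network on key frames.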
In step 6), classification with the softmax regression classifier comprises the following steps:
6.1) constructing the training dataset, using the Berkeley MHAD and UTD-MHAD multimodal human behavior recognition public datasets;
6.2) adding a softmax classifier after the last layer of the LSTM model based on the temporal attention mechanism; the output of the last LSTM layer serves as the input of the classifier, and the final classification model is obtained by training the classifier;
6.3) using the structural features of the 2D joint points extracted from the RGB video as input, classifying with the trained softmax classifier.
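The classification head of step 6) can be sketched as below. The action names, class count, and weight shapes are hypothetical placeholders, not values from the patent or its datasets.

```python
# Minimal numpy sketch of the softmax classification head in step 6).
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def classify(z, W, b, actions):
    """z: pooled attention output; returns predicted label and probabilities."""
    probs = softmax(z @ W + b)
    return actions[int(np.argmax(probs))], probs

actions = ["wave", "jump", "sit"]        # hypothetical action classes
rng = np.random.default_rng(2)
label, probs = classify(rng.standard_normal(4),
                        rng.standard_normal((4, 3)), np.zeros(3), actions)
print(label in actions)  # True
```

In training, W and b would be fitted with cross-entropy loss on the labeled dataset; at inference the argmax over probs gives the recognized action.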
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The behavior recognition method based on an RGB monocular vision sensor uses a behavior representation based on global features, which not only obtains rich motion information with which to recognize complex actions, but also has lower cost and better universality than the RGB+D depth cameras commonly used in current behavior recognition research. Compared with wearable inertial sensors, it obtains more complete information and is free of their limitations of wearing position and motion information, enabling behavior recognition technology to be popularized in real scenes.
2. Extracting skeleton joint points from the RGB video data not only extracts the skeleton joint motion information most useful for behavior classification, but also removes a large amount of redundant information, reducing the storage space of the data and increasing the speed of behavior recognition. In addition, a structural feature extraction method for skeleton joint points is proposed, which removes the negative interference of complex backgrounds on human behavior recognition while giving a more effective feature representation of the raw skeleton joint points, thereby improving the recognition accuracy for behaviors under complex backgrounds.
3. Using the temporal attention mechanism and the LSTM model for human behavior recognition effectively solves the problem that deep neural networks, when extracting features automatically, assign equal importance to all elements of the time series. The LSTM network extracts the inter-frame relationships of the video, and the temporal attention mechanism makes the network pay more attention to the key frames that contribute most to behavior recognition, improving the recognition accuracy for complex actions.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the definition rule of the skeleton joint points extracted from RGB images in the present invention.
Fig. 3 is a schematic diagram of the LSTM neuron structure.
Fig. 4 is a schematic diagram of the attention mechanism model.
Specific embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
As shown in Figs. 1 to 4, the human behavior recognition method based on a temporal attention mechanism and LSTM provided by this embodiment comprises the following steps:
1) Establish a video monitoring platform and acquire RGB video data with a low-cost monocular vision sensor, comprising the following steps:
1.1) installing an RGB monocular vision sensor in the monitoring area to acquire data in real time;
1.2) connecting the server to the front-end codec and downloading real-time video data through a streaming media protocol;
1.3) using the iSCSI IP connection mode to transfer the acquired video to the server's storage device;
1.4) preprocessing the acquired video data and sending it to the joint point extraction module for processing.
2) Extract the 2-dimensional skeleton joint point data from the RGB video with the OpenPose model, comprising the following steps:
2.1) in this embodiment, to facilitate the extraction of skeleton joint points, splitting the video into segments of 10 seconds each;
2.2) in this embodiment, specifying the image input size as 368*368;
2.3) calling the OpenPose framework: the picture is fed into a CNN to extract part confidence maps and part affinity fields;
2.4) establishing a list to store the 18 joint points detected in the picture;
2.5) using bipartite matching to find the part associations, and connecting the joint points to form the whole skeleton of the human joints.
3) Normalize the joint points and compute their structural features, comprising the following steps:
3.1) in this embodiment, the definition rule of the 18 skeleton joint points is shown in Fig. 2; the acquired 2-dimensional skeleton joint point coordinates are defined as:
pi(x,y)
3.2) the vector J contains the extracted set of two-dimensional skeleton joint points, defined as:
J={p1,p2,...,p18}
3.3) the bone vector between two adjacent joint points is normalized; the normalized vector is computed as:
Bi,j=(pi-pj)/||pi-pj||
where pi and pj denote two adjacent joint points, and ||pi-pj|| is the Euclidean distance between the two points, computed as:
||pi-pj||=sqrt((xi-xj)^2+(yi-yj)^2)
3.4) in this embodiment, the bone vector features are, following the principles of human anatomy, the bone vectors formed by connecting adjacent joint points; four bone vectors of the upper limbs and four of the lower limbs are selected, and, following the joint point definition rule shown in Fig. 2, the bone vector feature set S is defined as:
S={B2,3,B3,4,B5,6,B6,7,B8,9,B9,10,B11,12,B12,13}
3.5) in this embodiment, the bone angle features take the angle formed by the left wrist, left shoulder, and left hip, and the angle formed by the right wrist, right shoulder, and right hip, as bone space angles; the angle θ between joint points pi and pj projected onto the XY plane is defined as:
θi,j=arctan((yi-yj)/(xi-xj))
and the bone angle feature set θ is defined as:
θ=(θ4,8,θ2,8,θ5,11,θ7,11)
3.6) in this embodiment, because of individual differences between human bodies, bone length is selected as a bias to describe the overall difference between human skeletons, using the spine, i.e. the distances between the neck node and the two hip nodes (left hip and right hip), as the bone length feature; the bone length feature D is defined as:
D=D1,8+D1,11
where, if joint point i is connected with joint point j,
Dij=||pi-pj||
3.7) in this embodiment, the bone vector features, bone angle features, and bone length feature are linearly concatenated to form the structural feature of the skeleton joint points, expressed as:
Feature={S,θ,D}
4) Construct the long short-term memory network LSTM, implemented as follows:
4.1) inside the hidden-layer unit, the horizontal state line at the top of the hidden unit passes the hidden-unit state from one moment to the next, involving only a few linear transformation operations, which helps keep the hidden-unit state stable;
4.2) the LSTM contains three special "gate" structures: the input gate it, the forget gate ft, and the output gate ot. Each gate consists of a sigmoid function and an element-wise multiplication, so that the hidden unit remembers only useful information as far as possible and discards useless information, thereby solving the long-term dependency problem;
4.3) the computation inside the LSTM hidden unit is as follows. In the forget gate, Wf denotes the forget weight of the input vector and bf the forget bias; the forget gate is computed as:
ft=σ(Wf·[ht-1,xt]+bf)
In the input gate, Wi denotes the update weight of the input vector and bi the update bias; the input gate is computed as:
it=σ(Wi·[ht-1,xt]+bi)
C is the state of the hidden unit, computed as:
Ct=ft*Ct-1+it*tanh(WC·[ht-1,xt]+bC)
In the output gate, Wo is the output weight of the input vector and bo the output bias; the output gate is computed as:
ot=σ(Wo·[ht-1,xt]+bo)
Finally, the output h is computed:
ht=ot*tanh(Ct)
where x is the input, h is the output, ht-1 is the output at time t-1, and xt is the input at time t.
5) Add the temporal attention mechanism to the LSTM network and extract the temporal features, implemented as follows:
5.1) inputting the context information c and the representation yi of some part of the current data;
5.2) using a tanh layer to compute m1, m2, ..., mn, aggregating yi with c; if the weight of c is Wcm and the weight of yi is Wym, then mi is computed as:
mi=tanh(Wcm·c+Wym·yi)
5.3) computing each weight after aggregation by the softmax function:
si=exp(mi)/Σj exp(mj)
where si is the softmax value of mi projected on the learning direction, so the softmax can be regarded as the degree of relevance obtained from the context c;
5.4) computing the weighted average of all yi as the output value z; the weights express the relevance of each variable to the context c, and z is computed as:
z=Σi si·yi
6) Perform human behavior recognition with the softmax classifier; the specific implementation steps are as follows:
6.1) constructing the training dataset, using the Berkeley MHAD and UTD-MHAD multimodal human behavior recognition public datasets;
6.2) adding a softmax classifier after the last layer of the LSTM model based on the temporal attention mechanism; the output of the last LSTM layer serves as the input of the classifier, and the final classification model is obtained by training the classifier;
6.3) using the structural features of the 2D joint points extracted from the RGB video in step 3) as input, classifying with the trained softmax classifier.
In summary, the human behavior recognition method based on a temporal attention mechanism and LSTM provided by the present invention constructs a deep neural network model based on an RGB monocular vision sensor, which improves the universality of vision-based behavior recognition systems; extracts 2D skeleton joint points from RGB video using the open OpenPose framework and proposes a structural feature extraction method based on skeleton joint points, which reduces video redundancy to raise the processing speed of the behavior recognition system and improve its real-time performance; and proposes an LSTM model combined with a temporal attention mechanism, which improves the recognition accuracy for complex behaviors. In addition, the technical method provided by the invention can be extended to various fields such as human anomaly monitoring, video surveillance, smart homes, identity authentication, and motion analysis; it has broad research significance and is worth popularizing.
In the above embodiment, the included modules are divided according to the functional logic of the invention, but the division is not limited to the above; as long as the corresponding functions can be realized, it does not restrict the protection scope of the invention.
The above is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited thereto; any other changes, modifications, substitutions, combinations, and simplifications made without departing from the spirit and principles of the present invention are equivalent substitutions and are included within the protection scope of the present invention.

Claims (7)

1. A human behavior recognition method based on a temporal attention mechanism and LSTM, characterized by comprising the following steps:
1) acquiring video data from an RGB monocular vision sensor;
2) extracting 2D skeleton joint point data;
3) extracting joint point structural features;
4) constructing an LSTM long short-term memory network;
5) adding a temporal attention mechanism to the LSTM network;
6) performing human behavior recognition with a softmax classifier.
2. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that, in step 1), acquiring the video data of the RGB monocular vision sensor comprises the following steps:
1.1) installing an RGB monocular vision sensor in the monitoring area to acquire data in real time;
1.2) connecting the server to the front-end codec and downloading real-time video data through a streaming media protocol;
1.3) using the iSCSI IP connection mode to transfer the acquired video to the server's storage device;
1.4) preprocessing the acquired video data and sending it to the joint point extraction module for processing.
3. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that, in step 2), extracting the 2D skeleton joint point data comprises the following steps:
2.1) splitting the video into segments of 10 seconds each;
2.2) after reading each input picture, resizing it to the specified size of 368*368;
2.3) calling the OpenPose framework: the resized picture is fed into a CNN to extract part confidence maps and part affinity fields;
2.4) establishing a list to store the 18 joint points detected in the picture;
2.5) using bipartite matching to find the part associations, and connecting the joint points to form the whole skeleton of the human joints.
4. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that, in step 3), extracting the joint point structural features comprises the following steps:
3.1) defining the acquired 2-dimensional skeleton joint point coordinates as:
pi(x,y)
3.2) defining the extracted set of two-dimensional skeleton joint points as the vector J, expressed as:
J={p1,p2,...,p18}
3.3) normalizing the bone vector between two adjacent joint points; the normalized vector is computed as:
Bi,j=(pi-pj)/||pi-pj||
where pi and pj denote two adjacent joint points, and ||pi-pj|| is the Euclidean distance between the two points, computed as:
||pi-pj||=sqrt((xi-xj)^2+(yi-yj)^2)
3.4) computing the bone vector features, i.e. the bone vectors formed by connecting adjacent joint points; four bone vectors of the upper limbs and four of the lower limbs are selected, and, following the joint point definition rule, the bone vector feature set S is defined as:
S={B2,3,B3,4,B5,6,B6,7,B8,9,B9,10,B11,12,B12,13}
3.5) computing the bone angle features, taking the angle formed by the left wrist, left shoulder, and left hip, and the angle formed by the right wrist, right shoulder, and right hip, as bone space angles; the angle θ between joint points pi and pj, projected onto the XY plane, is defined as:
θi,j=arctan((yi-yj)/(xi-xj))
The bone angle feature set θ is defined as:
θ=(θ4,8,θ2,8,θ5,11,θ7,11)
3.6) computing the bone length feature: bone length is selected as a bias to describe the overall difference between human skeletons, using the spine, i.e. the distances between the neck node and the two hip nodes (left hip and right hip), as the bone length feature; the bone length feature D is defined as:
D=D1,8+D1,11
where, if joint point i is connected with joint point j,
Dij=||pi-pj||
3.7) computing the skeleton joint point structural features: the bone vector features, bone angle features, and bone length feature are linearly concatenated to form the structural feature of the skeleton joint points, expressed as:
Feature={S,θ,D}.
5. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that, in step 4), the LSTM long short-term memory network is constructed as follows:
Inside the hidden-layer unit, the horizontal state line at the top of the hidden unit passes the hidden-unit state from one moment to the next, involving only a few linear transformation operations;
The LSTM contains three "gate" structures: the input gate it, the forget gate ft, and the output gate ot. Each gate consists of a sigmoid function and an element-wise multiplication, so that the hidden unit remembers only useful information as far as possible and discards useless information;
The computation inside the LSTM hidden unit is as follows. In the forget gate, Wf denotes the forget weight of the input vector and bf the forget bias; the forget gate is computed as:
ft=σ(Wf·[ht-1,xt]+bf)
In the input gate, Wi denotes the update weight of the input vector and bi the update bias; the input gate is computed as:
it=σ(Wi·[ht-1,xt]+bi)
C is the state of the hidden unit, computed as:
Ct=ft*Ct-1+it*tanh(WC·[ht-1,xt]+bC)
In the output gate, Wo is the output weight of the input vector and bo the output bias; the output gate is computed as:
ot=σ(Wo·[ht-1,xt]+bo)
Finally, the output h is computed:
ht=ot*tanh(Ct)
where x is the input, h is the output, ht-1 is the output at time t-1, and xt is the input at time t.
6. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that, in step 5), adding the temporal attention mechanism to the LSTM network comprises the following steps:
5.1) inputting the context information c and the representation yi of some part of the current data;
5.2) using a tanh layer to compute m1, m2, ..., mn, aggregating yi with c; if the weight of c is Wcm and the weight of yi is Wym, then mi is computed as:
mi=tanh(Wcm·c+Wym·yi)
5.3) computing each weight after aggregation by the softmax function:
si=exp(mi)/Σj exp(mj)
where si is the softmax value of mi projected on the learning direction, so the softmax can be regarded as the degree of relevance obtained from the context c;
5.4) computing the weighted average of all yi as the output value z; the weights express the relevance of each variable to the context c, and z is computed as:
z=Σi si·yi
7. The human behavior recognition method based on a temporal attention mechanism and LSTM according to claim 1, characterized in that, in step 6), classification with the softmax regression classifier comprises the following steps:
6.1) constructing the training dataset, using the Berkeley MHAD and UTD-MHAD multimodal human behavior recognition public datasets;
6.2) adding a softmax classifier after the last layer of the LSTM model based on the temporal attention mechanism; the output of the last LSTM layer serves as the input of the classifier, and the final classification model is obtained by training the classifier;
6.3) using the structural features of the 2D joint points extracted from the RGB video as input, classifying with the trained softmax classifier.
CN201910271178.2A 2019-04-04 2019-04-04 Human behavior recognition method based on temporal attention mechanism and LSTM Expired - Fee Related CN110135249B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910271178.2A CN110135249B (en) 2019-04-04 2019-04-04 Human behavior recognition method based on temporal attention mechanism and LSTM

Publications (2)

Publication Number Publication Date
CN110135249A true CN110135249A (en) 2019-08-16
CN110135249B CN110135249B (en) 2021-07-20

Family

ID=67569411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910271178.2A Expired - Fee Related CN110135249B (en) 2019-04-04 2019-04-04 Human behavior recognition method based on temporal attention mechanism and LSTM (long short-term memory)

Country Status (1)

Country Link
CN (1) CN110135249B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615983A (en) * 2015-01-28 2015-05-13 中国科学院自动化研究所 Behavior recognition method based on recurrent neural network and human skeleton movement sequences
CN108600701A (en) * 2018-05-02 2018-09-28 广州飞宇智能科技有限公司 Monitoring system and method for judging video behavior based on deep learning
CN108764066A (en) * 2018-05-08 2018-11-06 南京邮电大学 Express delivery sorting work specification detection method based on deep learning
CN108776796A (en) * 2018-06-26 2018-11-09 内江师范学院 Action recognition method based on a global spatio-temporal attention model
CN108846332A (en) * 2018-05-30 2018-11-20 西南交通大学 Railway driver behavior recognition method based on CLSTA
CN108875708A (en) * 2018-07-18 2018-11-23 广东工业大学 Video-based behavior analysis method, device, equipment, system and storage medium
CN109508688A (en) * 2018-11-26 2019-03-22 平安科技(深圳)有限公司 Skeleton-based behavior detection method, terminal device and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SIJIE SONG et al.: "An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data", arXiv:1611.06067v1 [cs.CV] *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021051579A1 (en) * 2019-09-17 2021-03-25 平安科技(深圳)有限公司 Body pose recognition method, system, and apparatus, and storage medium
CN110705390A (en) * 2019-09-17 2020-01-17 平安科技(深圳)有限公司 Body posture recognition method and device based on LSTM and storage medium
CN110781771A (en) * 2019-10-08 2020-02-11 北京邮电大学 Abnormal behavior real-time monitoring method based on deep learning
CN111310655A (en) * 2020-02-13 2020-06-19 蒋营国 Human body action recognition method and system based on key frame and combined attention model
CN111553229A (en) * 2020-04-21 2020-08-18 清华大学 Worker action identification method and device based on three-dimensional skeleton and LSTM
CN111723667A (en) * 2020-05-20 2020-09-29 同济大学 Human body joint point coordinate-based intelligent lamp pole crowd behavior identification method and device
CN111368810A (en) * 2020-05-26 2020-07-03 西南交通大学 Sit-up detection system and method based on human body and skeleton key point identification
CN111368810B (en) * 2020-05-26 2020-08-25 西南交通大学 Sit-up detection system and method based on human body and skeleton key point identification
CN111860267A (en) * 2020-07-13 2020-10-30 浙大城市学院 Multichannel body-building movement identification method based on human body bone joint point positions
CN111860267B (en) * 2020-07-13 2022-06-14 浙大城市学院 Multichannel body-building exercise identification method based on human body skeleton joint point positions
CN112149613A (en) * 2020-10-12 2020-12-29 萱闱(北京)生物科技有限公司 Motion estimation evaluation method based on improved LSTM model
CN112257845A (en) * 2020-10-12 2021-01-22 萱闱(北京)生物科技有限公司 Press action recognition method based on improved LSTM model
CN112149613B (en) * 2020-10-12 2024-01-05 萱闱(北京)生物科技有限公司 Action pre-estimation evaluation method based on improved LSTM model
CN112560582A (en) * 2020-11-24 2021-03-26 超越科技股份有限公司 Real-time abnormal behavior monitoring method based on LSTM
CN112528891A (en) * 2020-12-16 2021-03-19 重庆邮电大学 Bidirectional LSTM-CNN video behavior identification method based on skeleton information
CN114973403A (en) * 2022-05-06 2022-08-30 广州紫为云科技有限公司 Efficient behavior prediction method based on space-time dual-dimension feature depth network
CN114973403B (en) * 2022-05-06 2023-11-03 广州紫为云科技有限公司 Behavior prediction method based on space-time double-dimension feature depth network

Also Published As

Publication number Publication date
CN110135249B (en) 2021-07-20

Similar Documents

Publication Publication Date Title
CN110135249A (en) Human behavior recognition method based on temporal attention mechanism and LSTM
CN110135375B (en) Multi-person attitude estimation method based on global information integration
CN109829436B (en) Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
He et al. Visual recognition of traffic police gestures with convolutional pose machine and handcrafted features
CN110222580A (en) Hand 3D pose estimation method and device based on three-dimensional point cloud
CN111160294B (en) Gait recognition method based on graph convolution network
CN113128424B (en) Method for identifying action of graph convolution neural network based on attention mechanism
CN111444488A (en) Identity authentication method based on dynamic gesture
CN110135277B (en) Human behavior recognition method based on convolutional neural network
CN112906520A (en) Gesture coding-based action recognition method and device
Neverova Deep learning for human motion analysis
Sheu et al. Improvement of human pose estimation and processing with the intensive feature consistency network
CN111611869B (en) End-to-end monocular vision obstacle avoidance method based on serial deep neural network
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
CN117711066A (en) Three-dimensional human body posture estimation method, device, equipment and medium
CN117576149A (en) Single-target tracking method based on attention mechanism
Yang et al. Human action recognition based on skeleton and convolutional neural network
Gadhiya et al. Analysis of deep learning based pose estimation techniques for locating landmarks on human body parts
CN113469018B (en) Multi-modal interactive behavior recognition method based on RGB and three-dimensional skeleton
Usman et al. Skeleton-based motion prediction: A survey
Huang et al. View-independent behavior analysis
Ramanathan et al. Combining pose-invariant kinematic features and object context features for rgb-d action recognition
CN111178141B (en) LSTM human body behavior identification method based on attention mechanism
CN115482481A (en) Single-view three-dimensional human skeleton key point detection method, device, equipment and medium
Liang Face recognition technology analysis based on deep learning algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210720