CN107886061A - Human body behavior recognition method and system based on multi-modal deep Boltzmann machine - Google Patents

Human body behavior recognition method and system based on multi-modal deep Boltzmann machine Download PDF

Info

Publication number
CN107886061A
CN107886061A (application CN201711061490.6A)
Authority
CN
China
Prior art keywords
data
depth
boltzmann machine
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711061490.6A
Other languages
Chinese (zh)
Other versions
CN107886061B (en)
Inventor
毕盛
谢澈澈
董敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201711061490.6A priority Critical patent/CN107886061B/en
Publication of CN107886061A publication Critical patent/CN107886061A/en
Application granted granted Critical
Publication of CN107886061B publication Critical patent/CN107886061B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Abstract

The invention discloses a human body behavior recognition method and system based on a multi-modal deep Boltzmann machine. The method comprises the steps of: 1) acquiring the vision and wearable-sensor data; 2) establishing a multi-modal fusion model of the vision data and wearable-sensor data; 3) using a deep neural network to perform heterogeneous transfer learning and reconstruct the missing data; 4) classifying with a softmax regression classifier; 5) adaptively adjusting the deep network model produced from public sample data according to the user's individual characteristics. The invention can improve the accuracy of human body behavior recognition under complex scenes and missing data.

Description

Human body behavior recognition method and system based on multi-modal deep Boltzmann machine
Technical field
The present invention relates to the fields of artificial intelligence and behavior recognition technology, and in particular to a human body behavior recognition method and system based on a multi-modal deep Boltzmann machine.
Background technology
In recent years, the robot industry has grown explosively, and the era of robots "in full application" has arrived. On the one hand, robots have appeared in homes and daily life; on the other hand, with the development of industrial robots, robots are widely used in industries such as automobile manufacturing and metal fabrication, realizing human-robot collaboration. Human body behavior recognition is widely applied in fields such as human-computer interaction and human-robot collaboration; robots need to understand and identify human behavior at every level of abstraction, and the accuracy of this recognition greatly affects the application and development of robot technology. A robot's recognition of human behavior is a highly important link in its perception of people and the external environment, and how to reduce the influence of noise factors such as scene diversity and background complexity on recognition performance has always been a focus of human body behavior recognition research.
At present, research on human body behavior recognition technology mainly follows two lines, vision-based and wearable-sensor-based, but it still faces the following problems:
1. Robots need to improve the accuracy of human body behavior recognition in complex scenes: human body behavior recognition is currently realized mainly with single-vision methods, single-wearable-sensor methods, or traditional data fusion of vision and wearable sensors, none of which effectively solves the problem of low recognition accuracy in complex scenes.
2. The challenge to recognition accuracy when multi-modal data are missing: little current research touches this problem, yet in real life, for reasons such as privacy and occlusion, the visual signal is often missing, which greatly affects the accuracy of a robot's recognition of human behavior.
3. The problem of commonality versus individuality faced by a robot when recognizing human behavior: little current research addresses how to add a person's customized information to a common model so that the model acquires personalized characteristics, which also affects the robot's recognition of human behavior.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing a human body behavior recognition method and system based on a multi-modal deep Boltzmann machine with higher recognition accuracy and stronger usability. The invention aims to build a multi-modal deep neural network model based on vision and wearable sensors so as to improve behavior recognition accuracy in complex scenes; to use a deep Boltzmann machine network in the multi-modal deep learning model so as to reduce the influence of missing data on recognition accuracy; and to propose a method of adjusting the network structure with personalized features to establish an adaptive common model, so as to improve the robot's recognition accuracy for its specific owner's behavior.
To achieve the above object, the technical scheme proposed by the present invention is as follows:
A human body behavior recognition method based on a multi-modal deep Boltzmann machine comprises the following steps:
1) acquiring the vision and wearable-sensor data;
2) establishing a multi-modal fusion model of the vision data and wearable-sensor data;
3) using a deep neural network to perform heterogeneous transfer learning and reconstruct the missing data;
4) classifying with a softmax regression classifier;
5) adaptively adjusting the deep network model produced from public sample data according to the user's individual characteristics.
In step 1), acquiring the vision and wearable-sensor data comprises the following steps:
1.1) using the maximum acquisition frequency of the Kinect vision sensor as the common acquisition frequency of the vision and wearable sensors (a sketch of this synchronization follows the list);
1.2) using the Kinect vision sensor for the video input features, mounting it on the robot, and transferring the data to a notebook computer through a USB interface;
1.3) selecting the wrist and waist attitude data of the wearable sensors as the input features, and sending the data stored over a period of time to the notebook computer through wireless Bluetooth communication;
1.4) the notebook computer preprocessing the collected data and sending the processed data to a back-end graphics workstation for deep learning.
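A minimal sketch of how the common acquisition frequency of step 1.1 might be realized: wearable-sensor samples are aligned to the Kinect frame clock by nearest-timestamp matching. The 30 Hz Kinect rate, the 100 Hz IMU rate and all names here are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

# Hypothetical sketch: align wearable-sensor samples to the Kinect frame
# clock so both modalities share one acquisition frequency (step 1.1).
KINECT_FPS = 30.0  # assumed maximum Kinect acquisition frequency

def resample_to_frames(ts, values, frame_ts):
    """For each Kinect frame timestamp, pick the nearest sensor sample."""
    idx = np.clip(np.searchsorted(ts, frame_ts), 1, len(ts) - 1)
    closer_left = (frame_ts - ts[idx - 1]) < (ts[idx] - frame_ts)
    return values[idx - closer_left.astype(int)]

imu_ts = np.arange(0.0, 2.0, 0.01)               # 100 Hz IMU timestamps
imu = np.random.randn(len(imu_ts), 6)            # 3-axis accel + 3-axis gyro
frame_ts = np.arange(0.0, 2.0, 1.0 / KINECT_FPS)
print(resample_to_frames(imu_ts, imu, frame_ts).shape)  # (60, 6)
```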
In step 2), establishing the multi-modal fusion model of the vision data and wearable-sensor data comprises the following steps:
2.1) adding a start frame, an end frame and frame numbers to the data in each acquisition window of the vision and wearable sensors, then extracting data according to frame number as the deep neural network input;
2.2) using a method of dynamically variable acquisition window length, dynamically partitioning out each action cycle as the time span of the sliding window;
2.3) constructing the color (RGB) and depth (D) information of all pixels of the Kinect camera within one acquisition time window into a single visual feature vector as input;
2.4) forming the wearable-sensor feature vector as input from the data of the wrist and waist 6-axis attitude sensors (3-axis acceleration and 3-axis angular rate) within one acquisition time window (see the sketch after this list);
2.5) applying deep learning directly to the raw data and obtaining the features through training.
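The feature construction of steps 2.3 and 2.4 can be illustrated with a short sketch. The window length, image resolution and data below are toy assumptions, since the patent only specifies that the per-window RGB-D pixels and the two 6-axis streams are each flattened into one vector.

```python
import numpy as np

# Hedged sketch of steps 2.3-2.4: one visual feature vector from a window
# of RGB-D frames, one wearable feature vector from wrist + waist IMU data.
def visual_feature(rgb, depth):
    # rgb: (T, H, W, 3), depth: (T, H, W); flattened per window
    return np.concatenate([rgb.reshape(-1), depth.reshape(-1)]).astype(np.float32)

def wearable_feature(wrist, waist):
    # wrist, waist: (T, 6) = 3-axis acceleration + 3-axis angular rate
    return np.concatenate([wrist.reshape(-1), waist.reshape(-1)]).astype(np.float32)

T, H, W = 60, 32, 32  # toy window length and downsampled resolution
v_m = visual_feature(np.random.rand(T, H, W, 3), np.random.rand(T, H, W))
v_t = wearable_feature(np.random.randn(T, 6), np.random.randn(T, 6))
print(v_m.shape, v_t.shape)  # (245760,) (720,)
```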
In step 3), using a deep neural network to perform heterogeneous transfer learning and reconstruct the missing data comprises the following steps:
3.1) building a vision deep Boltzmann machine and a wearable-sensor deep Boltzmann machine respectively, each taking its sensor data as input; a depth of two layers is used for each deep Boltzmann machine, and the neurons of the visible layer and the hidden layers are all Gaussian units; the energy function of the two-layer deep Boltzmann machine is:

$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$

where $\theta$ denotes the RBM parameters $\{W, a, b\}$, $v$ denotes the visible units, $h^{(i)}$ denotes the $i$-th layer of hidden units, and $W$ is the weight of the edges between the visible units and the hidden units;
3.2) building the multi-modal deep Boltzmann machine, fusing the vision deep Boltzmann machine and the wearable-sensor deep Boltzmann machine with a common hidden layer; the joint probability distribution of the network is:

$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)}, h_t^{(2)}, h^{(3)}} P\big(h_m^{(2)}, h_t^{(2)}, h^{(3)}\big) \bigg[\sum_{h_m^{(1)}} P\big(v_m, h_m^{(1)}, h_m^{(2)}\big)\bigg] \bigg[\sum_{h_t^{(1)}} P\big(v_t, h_t^{(1)}, h_t^{(2)}\big)\bigg]$$

where $\theta$ denotes the parameters of the joint distribution, $v_m$ the visible layer of the vision deep Boltzmann machine, $v_t$ the visible layer of the wearable-sensor deep Boltzmann machine, $h_m^{(i)}$ the $i$-th hidden layer of the vision deep Boltzmann machine, and $h_t^{(i)}$ the $i$-th hidden layer of the wearable-sensor deep Boltzmann machine.
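The energy function above can be checked numerically in a few lines. The sketch below uses toy layer sizes, omits the bias terms of θ = {W, a, b} exactly as the formula does, and is only a verification of the expression, not the patented training procedure.

```python
import numpy as np

# Numeric check of the two-layer DBM energy of step 3.1:
# E(v, h1, h2; theta) = -v^T W1 h1 - h1^T W2 h2
def dbm_energy(v, h1, h2, W1, W2):
    return -(v @ W1 @ h1) - (h1 @ W2 @ h2)

rng = np.random.default_rng(0)
nv, n1, n2 = 8, 4, 3  # toy layer sizes (assumed)
v, h1, h2 = rng.normal(size=nv), rng.normal(size=n1), rng.normal(size=n2)
W1, W2 = rng.normal(size=(nv, n1)), rng.normal(size=(n1, n2))
print(dbm_energy(v, h1, h2, W1, W2))
```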
In step 4), classifying with the softmax regression classifier comprises the following steps:
4.1) building the training data set, using multi-modal public data sets including the Berkeley multimodal human action dataset, combined with the collected real data set, to form the training data set;
4.2) adding a softmax classifier at the last layer of the deep learning model, taking the output of the final layer as the classifier's input, and training the classifier to obtain the final classification model;
4.3) taking the common features obtained by the fused deep Boltzmann machine in step 3) as input and classifying them with the trained softmax classifier.
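A softmax regression classifier of the kind named in step 4 can be sketched as follows. The feature dimension, the five behavior classes and the plain gradient-descent training are assumptions for illustration, with the fused DBM features stood in by random data.

```python
import numpy as np

# Hedged sketch of step 4: softmax regression over the fused common
# features, trained by batch gradient descent on cross-entropy loss.
def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, n_classes, lr=0.1, epochs=200):
    W, b = np.zeros((X.shape[1], n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot labels
    for _ in range(epochs):
        P = softmax(X @ W + b)
        W -= lr * X.T @ (P - Y) / len(X)          # cross-entropy gradient
        b -= lr * (P - Y).mean(axis=0)
    return W, b

X = np.random.randn(300, 64)                      # fused DBM features (toy)
y = np.random.randint(0, 5, size=300)             # 5 behavior classes (toy)
W, b = train_softmax(X, y, n_classes=5)
pred = softmax(X @ W + b).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```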
In step 5), adaptively adjusting the deep network model produced from public sample data according to the user's individual characteristics comprises the following steps:
5.1) adding a hidden layer before the vision input feature layer and before the wearable-sensor input feature layer;
5.2) using the high-confidence data obtained when the public-data network model performs behavior recognition on the individual user as the labeled sample data;
5.3) training the model produced from the common data with the labeled sample data using mini-batch incremental learning, and selecting the required mini-batch size.
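Step 5.3's mini-batch incremental learning might look like the following sketch, which fine-tunes a softmax output layer on newly labeled user samples. The batch size and learning rate embody the precision-versus-time trade-off described later in the embodiment; the values here are assumptions.

```python
import numpy as np

# Hedged sketch of step 5.3: mini-batch incremental fine-tuning of a
# softmax layer (W, b) on newly labeled, high-confidence user samples.
def incremental_update(W, b, X_new, y_new, batch_size=16, lr=0.01):
    Y = np.eye(W.shape[1])[y_new]                  # one-hot labels
    order = np.random.permutation(len(X_new))      # shuffle new samples
    for start in range(0, len(X_new), batch_size):
        sel = order[start:start + batch_size]
        Z = X_new[sel] @ W + b
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)          # softmax probabilities
        W -= lr * X_new[sel].T @ (P - Y[sel]) / len(sel)
        b -= lr * (P - Y[sel]).mean(axis=0)
    return W, b

W, b = np.zeros((64, 5)), np.zeros(5)              # model from common data (toy)
X_new = np.random.randn(50, 64)                    # high-confidence user samples
y_new = np.random.randint(0, 5, size=50)
W, b = incremental_update(W, b, X_new, y_new)      # one incremental pass
```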
A human body behavior recognition system based on a multi-modal deep Boltzmann machine comprises:
a data acquisition module for collecting the raw data streams of the robot's human-behavior-recognition platform, including the visual data stream and the wearable-sensor data stream;
a data preprocessing module for performing noise-reduction filtering, smoothing and windowing on the collected raw data;
a deep learning module for feeding the preprocessed data into the deep neural network for learning and fusion, extracting the common features of the vision and attitude-sensor data;
a model training module for obtaining the trained multi-modal fusion deep Boltzmann machine behavior-recognition model through learning and modeling on the training data set;
a behavior recognition module for performing recognition and classification of human behavior with the multi-modal fusion deep Boltzmann machine behavior-recognition model.
Preferably, the data acquisition module collects the visual data stream with a Kinect sensor, collects the waist and wrist data with two 6-axis attitude sensors respectively, and uses the maximum acquisition frequency of the Kinect sensor as the common acquisition frequency.
Preferably, the data preprocessing module uses a dynamically variable windowing method to partition out the cycle of each action behavior.
Preferably, the deep learning module uses the multi-modal deep Boltzmann machine, fusing the vision deep Boltzmann machine and the wearable-sensor deep Boltzmann machine with a common hidden layer.
Preferably, the model training module uses multi-modal public data sets such as the Berkeley multimodal human action dataset, combined with the collected real data set, to form the training data set.
Preferably, the behavior recognition module uses a softmax regression model as the classifier, added at the last layer of the deep neural network.
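The five modules can be pictured as one pipeline. The skeleton below is purely an illustrative assumption about how they might be wired together, not the patented implementation; every attribute and method name is hypothetical.

```python
# Illustrative skeleton (an assumption, not the patented implementation)
# of the five system modules wired as one pipeline.
class BehaviorRecognitionSystem:
    def __init__(self, acquisition, preprocessing, deep_learning,
                 training, recognition):
        self.acquisition = acquisition      # Kinect + two 6-axis IMUs
        self.preprocessing = preprocessing  # noise reduction, smoothing, windowing
        self.deep_learning = deep_learning  # multi-modal DBM feature fusion
        self.training = training            # learns on public + real data sets
        self.recognition = recognition      # softmax layer over fused features

    def recognize(self):
        raw = self.acquisition.collect()             # raw vision + IMU streams
        windows = self.preprocessing.window(raw)     # one window per action cycle
        features = self.deep_learning.fuse(windows)  # common feature extraction
        return self.recognition.classify(features)   # behavior label
```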
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The multi-modal neural network model based on vision and wearable sensors uses a behavior characterization method based on global features, fusing the data of the vision sensor and the wearable sensors to jointly identify a person's behavioral features. It avoids the need to wear many sensors, effectively reducing the impact of wearable sensors on comfort, breaks through the limitation of behavior characterization methods based on local features, which cannot identify compound actions, and can effectively improve the accuracy of the robot's recognition of human behavior in complex scenes.
2. With the deep Boltzmann machine, missing data can be reconstructed, effectively reducing the influence of missing data on recognition accuracy. Where data are missing because the camera's viewing angle is disturbed, objects are occluded, or the wearable device suffers external electromagnetic interference, the accuracy of the robot's recognition of human behavior can be effectively improved.
3. A method of automatically adjusting the common model with personalized features is proposed, which effectively solves the commonality-versus-individuality problem in the robot's recognition of human behavior; the robot automatically adjusts the common data model to match individual personality characteristics, thereby understanding human behavior better and improving its recognition accuracy for its specific owner's behavior.
Brief description of the drawings
Fig. 1 is a flowchart of the human body behavior recognition method based on the multi-modal deep Boltzmann machine of the present invention.
Fig. 2 is a schematic diagram of the robot human-behavior-recognition system platform of the present invention.
Fig. 3 is a schematic diagram of the multi-modal deep neural network model based on vision and wearable sensors of the present invention.
Fig. 4 is a schematic diagram of the multi-modal deep Boltzmann machine.
Embodiment
The invention is further described below with reference to a specific embodiment.
Referring to Fig. 1, the human body behavior recognition method based on the multi-modal deep Boltzmann machine provided by this embodiment comprises the following steps:
1) establishing the robot human-behavior-recognition system platform and acquiring the vision and wearable-sensor data;
2) establishing the multi-modal fusion model of the vision data and wearable-sensor data, fusing the vision and wearable-sensor information;
3) using a deep neural network to perform heterogeneous transfer learning and reconstruct the missing data;
4) performing human behavior classification with the softmax regression classifier;
5) adaptively adjusting the deep network model produced from public sample data according to the user's individual characteristics.
Referring to Fig. 2, in step 1), establishing the robot human-behavior-recognition system platform and acquiring the vision and wearable-sensor data comprises the following steps:
1.1) collecting video data with the Kinect vision sensor mounted on the robot;
1.2) using 6-axis attitude sensors (3-axis acceleration and 3-axis angular rate) as the wearable sensors, mounted in a smart wristband and a smart belt respectively, so that the wrist attitude and waist attitude data of the human body can be selected as input features;
1.3) in this embodiment, using the maximum acquisition frequency of the Kinect vision sensor as the common acquisition frequency of the vision and wearable sensors;
1.4) after the vision data are collected, the Kinect vision sensor transfers the data to the notebook computer through the USB interface;
1.5) after the attitude-sensor data are collected, the wearable sensors send the data stored over a period of time to the notebook computer through wireless Bluetooth communication.
Referring to Fig. 3, in step 2), the multi-modal deep neural network based on vision and wearable sensors is built in the following steps:
2.1) in this embodiment, to facilitate fusing the two kinds of sensor data, using as the synchronization method the addition of a start frame, an end frame and frame numbers to the data in each acquisition window of the vision and wearable sensors;
2.2) extracting data according to frame number as the deep neural network input, which ensures temporal consistency between the wearable sensors and the Kinect vision sensor;
2.3) in this embodiment, using a dynamically variable windowing method to separate out each action cycle, the length of the sliding window being the time span of each action cycle and the sliding step being half the window length;
2.4) while the feature data are collected, analyzing the vision data and wearable-sensor data to find the key points where the action changes, which serve as the start and end of the acquisition window (see the boundary-detection sketch after this list);
2.5) so as not to affect feature acquisition, processing the acquisition and the analysis concurrently;
2.6) within one acquisition time window, constructing the color (RGB) and depth (D) information of all pixels of the Kinect camera into a single visual feature vector as input;
2.7) within one acquisition time window, forming the wearable-sensor feature vector as input from the data of the wrist 6-axis attitude sensor (3-axis acceleration and 3-axis angular rate) and the waist 6-axis attitude sensor (3-axis acceleration and 3-axis angular rate);
2.8) applying deep learning directly to the raw data and obtaining the features through training.
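Steps 2.3 and 2.4 can be illustrated with a small sketch that finds the "key points" where the action changes from the short-term motion energy of the wearable signal and uses them as dynamically variable window boundaries. The smoothing length and threshold are assumptions, and the real system would analyze the vision stream as well.

```python
import numpy as np

# Hedged sketch of steps 2.3-2.4: detect action-change key points from the
# short-term motion energy of the IMU signal and use them as boundaries of
# the dynamically variable acquisition window.
def action_boundaries(signal, smooth=15, rel_threshold=0.5):
    energy = np.linalg.norm(np.diff(signal, axis=0), axis=1)  # motion energy
    energy = np.convolve(energy, np.ones(smooth) / smooth, mode="same")
    active = energy > rel_threshold * energy.max()            # moving or not
    return np.flatnonzero(np.diff(active.astype(int)))        # change points

imu = np.concatenate([np.zeros((100, 6)),      # rest
                      np.random.randn(80, 6),  # one action cycle
                      np.zeros((100, 6))])     # rest
print(action_boundaries(imu))                  # roughly [100, 180]
```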
Referring to Fig. 4, in step 3), the multi-modal deep Boltzmann machine, the neural network structure that reconstructs the missing data, is built in the following steps:
3.1) in this embodiment, using two-layer deep Boltzmann machines, whose energy function is:

$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$

where $\theta$ denotes the RBM parameters $\{W, a, b\}$, $v$ denotes the visible units, $h^{(i)}$ denotes the $i$-th layer of hidden units, and $W$ is the weight of the edges between the visible units and the hidden units;
3.2) building the vision deep Boltzmann machine, with the Kinect vision sensor data as input; a depth of two layers is used, and the neurons of the visible layer and the hidden layers are all Gaussian units;
3.3) building the wearable-sensor deep Boltzmann machine, with the wearable-sensor data as input; a two-layer deep Boltzmann machine is likewise used, and the neurons of the visible layer and the hidden layers are all Gaussian units;
3.4) in this embodiment, building the multi-modal deep Boltzmann machine composed of the two deep Boltzmann machines; this structure has a common hidden layer joining the two deep networks. Supposing that the visible layer of one deep network is $v_m$ and that of the other is $v_t$, the joint probability distribution of the network is:

$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)}, h_t^{(2)}, h^{(3)}} P\big(h_m^{(2)}, h_t^{(2)}, h^{(3)}\big) \bigg[\sum_{h_m^{(1)}} P\big(v_m, h_m^{(1)}, h_m^{(2)}\big)\bigg] \bigg[\sum_{h_t^{(1)}} P\big(v_t, h_t^{(1)}, h_t^{(2)}\big)\bigg]$$

where $\theta$ denotes the parameters of the joint distribution, $v_m$ the visible layer of the vision deep Boltzmann machine, $v_t$ the visible layer of the wearable-sensor deep Boltzmann machine, $h_m^{(i)}$ the $i$-th hidden layer of the vision deep Boltzmann machine, and $h_t^{(i)}$ the $i$-th hidden layer of the wearable-sensor deep Boltzmann machine;
3.5) building the multi-modal deep neural network model based on vision and wearable sensors: the multi-modal deep Boltzmann machine composed of the two deep Boltzmann machines has the common hidden layer (a joint feature layer) fusing the two deep networks of vision and wearable sensors.
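How the fused model can fill in a missing modality (the motivation of step 3) can be sketched with a heavily simplified stand-in: a single shared hidden layer joining the two visible layers, rather than the full two-layer-per-modality DBM. With the visual input missing, the wearable input is clamped and alternating Gibbs steps reconstruct the visual layer; all sizes and weights below are random assumptions.

```python
import numpy as np

# Hedged sketch of missing-data reconstruction in a simplified bimodal
# model: one shared hidden layer joins the visual (v_m) and wearable (v_t)
# visible layers. v_t is clamped; Gibbs steps fill v_m back in.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct_missing_visual(v_t, Wm, Wt, steps=50, rng=None):
    rng = rng or np.random.default_rng(0)
    v_m = rng.normal(size=Wm.shape[0])            # initialize missing modality
    for _ in range(steps):
        p_h = sigmoid(v_m @ Wm + v_t @ Wt)        # shared hidden layer
        h = (rng.random(p_h.shape) < p_h).astype(float)
        v_m = h @ Wm.T                            # Gaussian visible mean
    return v_m

rng = np.random.default_rng(1)
Wm, Wt = rng.normal(size=(20, 8)) * 0.1, rng.normal(size=(12, 8)) * 0.1
v_t = rng.normal(size=12)                          # observed wearable features
v_m_hat = reconstruct_missing_visual(v_t, Wm, Wt)  # reconstructed visual input
print(v_m_hat.shape)                               # (20,)
```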
In step 4), performing human behavior classification with the softmax regression classifier comprises the following steps:
4.1) building the training data set, using multi-modal public data sets such as the Berkeley multimodal human action dataset, combined with the real data set obtained by the research team through various channels, to form the training data set;
4.2) adding a softmax classifier at the last layer of the deep learning model, taking the output of the final layer as the classifier's input, and training the classifier to obtain the final classification model;
4.3) taking the common features obtained in step 3) by fusing the vision deep Boltzmann machine and the wearable-sensor deep Boltzmann machine as input and classifying them with the trained softmax classifier.
In step 5), automatically adjusting the deep network model produced from the common data to personal features is divided into two approaches, improving the network structure and incremental learning with labeled new samples; the implementation steps are as follows:
5.1) improving the network structure by extending the original neural network structure, the specific steps being:
5.1.1) adding a hidden layer before the vision input feature layer and before the wearable-sensor input feature layer;
5.1.2) re-running the unsupervised learning training while the user is together with the robot;
5.1.3) training the content of the individual user's behavior into the new network structure;
5.2) incremental learning with labeled new samples, using as the labeled sample data the high-confidence data obtained when the public-data network model performs behavior recognition on the individual user, the specific steps being:
5.2.1) determining whether the collected data are normal according to the sensors' own characteristics;
5.2.2) computing the confidence comprehensively in combination with the output of the softmax classification model (see the selection sketch after this list);
5.2.3) training the model produced from the common data with the labeled sample data using mini-batch incremental learning. The detailed process is: the whole sample set is divided into several parts, and the parameters are updated once per part; the more samples per part, the higher the precision of the model training, but the more time is spent, so the mini-batch size is chosen reasonably in this precision-versus-time trade-off.
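Steps 5.2.1 and 5.2.2 might be realized as in the sketch below, where a sample is kept as labeled data only if its raw features pass a plausibility check against the sensor's own characteristics and its softmax output is confident enough. The ±16 g range and the 0.9 threshold are assumptions, not values from the patent.

```python
import numpy as np

# Hedged sketch of steps 5.2.1-5.2.2: keep only samples with plausible
# sensor readings and a confident softmax output as pseudo-labeled data.
def select_confident_samples(X, probs, accel_range=(-16.0, 16.0),
                             conf_threshold=0.9):
    # 5.2.1) plausibility check against the sensor's own characteristics,
    # here a simple physical range test on the raw features.
    normal = np.all((X >= accel_range[0]) & (X <= accel_range[1]), axis=1)
    # 5.2.2) confidence taken from the softmax classification model output.
    conf = probs.max(axis=1)
    keep = normal & (conf >= conf_threshold)
    return X[keep], probs[keep].argmax(axis=1)

X = np.random.randn(200, 12) * 2.0                  # toy wearable features
probs = np.random.dirichlet(np.ones(5) * 0.2, 200)  # toy softmax outputs
X_lab, y_lab = select_confident_samples(X, probs)
print(len(X_lab), "high-confidence samples retained")
```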
The human body behavior recognition system based on the multi-modal deep Boltzmann machine provided by this embodiment comprises:
a data acquisition module for collecting the raw data streams of the robot's human-behavior-recognition platform, including the visual data stream and the wearable-sensor data stream; in this embodiment, the video data are collected with a Kinect sensor, the waist and wrist data are collected with two 6-axis attitude sensors respectively, and the maximum acquisition frequency of the Kinect sensor is used as the common acquisition frequency;
a data preprocessing module for performing noise-reduction filtering, smoothing and windowing on the collected raw data; in this embodiment, a dynamic windowing method is used, taking the cycle length of each human behavior as the window length, and the feature matrix of the data in each window is extracted as input;
a deep learning module for feeding the preprocessed data into the deep neural network for learning and fusion, extracting the common features of the vision and attitude-sensor data; in this embodiment, the multi-modal deep Boltzmann machine is used, fusing the vision deep Boltzmann machine and the wearable-sensor deep Boltzmann machine with a common hidden layer, so that the multi-sensor data are fused and trained to extract the common features;
a model training module for obtaining the trained multi-modal fusion deep Boltzmann machine behavior-recognition model through learning and modeling on the training data set; in this embodiment, multi-modal public data sets such as the Berkeley multimodal human action dataset, combined with the real data set obtained by the research team through various channels, form the training data set;
a behavior recognition module for performing recognition and classification of human behavior with the multi-modal fusion deep Boltzmann machine behavior-recognition model; in this embodiment, a softmax regression model is used as the classifier, added at the last layer of the deep neural network.
In the above embodiment, the included modules are divided according to the functional logic of the present invention, but the division is not limited to the above, as long as the corresponding functions can be realized; the division is not intended to limit the protection scope of the present invention.
In summary, the human body behavior recognition method and system based on the multi-modal deep Boltzmann machine provided by the present invention build a multi-modal neural network model based on vision and wearable sensors, which can improve the accuracy of a robot's recognition of human behavior in complex scenes; use a suitable deep neural network structure in the multi-modal deep learning model, which can reduce the influence of missing data on recognition accuracy; and propose a method of automatically adjusting the common model with personalized features, which can improve the robot's recognition accuracy for its specific owner's behavior. The present invention can be used for cooperation between people and robots, improving the success rate of human-robot collaboration. In addition, the technical method provided by the invention can be extended to various fields such as human anomaly monitoring, video surveillance, smart homes, identity authentication and motion analysis; it has broad research significance and is worth promoting.
The embodiment described above is only a preferred embodiment of the invention, and the scope of practice of the present invention is not limited by it; any change made according to the shapes and principles of the present invention shall be covered within the protection scope of the present invention.

Claims (7)

1. A human body behavior recognition method based on a multi-modal deep Boltzmann machine, characterized by comprising the following steps:
1) acquiring the vision and wearable-sensor data;
2) establishing a multi-modal fusion model of the vision data and wearable-sensor data;
3) using a deep neural network to perform heterogeneous transfer learning and reconstruct the missing data;
4) classifying with a softmax regression classifier;
5) adaptively adjusting the deep network model produced from public sample data according to the user's individual characteristics.
2. The human body behavior recognition method based on a multi-modal deep Boltzmann machine according to claim 1, characterized in that in step 1), acquiring the vision and wearable-sensor data comprises the following steps:
1.1) using the maximum acquisition frequency of the Kinect vision sensor as the common acquisition frequency of the vision and wearable sensors;
1.2) using the Kinect vision sensor for the video input features, mounting it on the robot, and transferring the data to a notebook computer through a USB interface;
1.3) selecting the wrist and waist attitude data of the wearable sensors as the input features, and sending the data stored over a period of time to the notebook computer through wireless Bluetooth communication;
1.4) the notebook computer preprocessing the collected data and sending the processed data to a back-end graphics workstation for deep learning.
3. The human body behavior recognition method based on a multi-modal deep Boltzmann machine according to claim 1, characterized in that in step 2), establishing the multi-modal fusion model of the vision data and wearable-sensor data comprises the following steps:
2.1) adding a start frame, an end frame and frame numbers to the data in each acquisition window of the vision and wearable sensors, then extracting data according to frame number as the deep neural network input;
2.2) using a method of dynamically variable acquisition window length, dynamically partitioning out each action cycle as the time span of the sliding window;
2.3) constructing the color (RGB) and depth (D) information of all pixels of the Kinect camera within one acquisition time window into a single visual feature vector as input;
2.4) forming the wearable-sensor feature vector as input from the data of the wrist and waist 6-axis attitude sensors within one acquisition time window;
2.5) applying deep learning directly to the raw data and obtaining the features through training.
4. The human body behavior recognition method based on a multi-modal deep Boltzmann machine according to claim 1, characterized in that in step 3), using a deep neural network to perform heterogeneous transfer learning and reconstruct the missing data comprises the following steps:
3.1) building a vision deep Boltzmann machine and a wearable-sensor deep Boltzmann machine respectively, each taking its sensor data as input; a depth of two layers is used for each deep Boltzmann machine, and the neurons of the visible layer and the hidden layers are all Gaussian units; the energy function of the two-layer deep Boltzmann machine is:

$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$

where $\theta$ denotes the RBM parameters $\{W, a, b\}$, $v$ denotes the visible units, $h^{(i)}$ denotes the $i$-th layer of hidden units, and $W$ is the weight of the edges between the visible units and the hidden units;
3.2) building the multi-modal deep Boltzmann machine, fusing the vision deep Boltzmann machine and the wearable-sensor deep Boltzmann machine with a common hidden layer; the joint probability distribution of the network is:
$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)}, h_t^{(2)}, h^{(3)}} P\big(h_m^{(2)}, h_t^{(2)}, h^{(3)}\big) \bigg[\sum_{h_m^{(1)}} P\big(v_m, h_m^{(1)}, h_m^{(2)}\big)\bigg] \bigg[\sum_{h_t^{(1)}} P\big(v_t, h_t^{(1)}, h_t^{(2)}\big)\bigg]$$
where $\theta$ denotes the parameters of the joint distribution, $v_m$ the visible layer of the vision deep Boltzmann machine, $v_t$ the visible layer of the wearable-sensor deep Boltzmann machine, $h_m^{(i)}$ the $i$-th hidden layer of the vision deep Boltzmann machine, and $h_t^{(i)}$ the $i$-th hidden layer of the wearable-sensor deep Boltzmann machine;
and in step 4), classifying with the softmax regression classifier comprises the following steps:
4.1) building the training data set, using multi-modal public data sets including the Berkeley multimodal human action dataset, combined with the collected real data set, to form the training data set;
4.2) adding a softmax classifier at the last layer of the deep learning model, taking the output of the final layer as the classifier's input, and training the classifier to obtain the final classification model;
4.3) taking the common features obtained by the fused deep Boltzmann machine in step 3) as input and classifying them with the trained softmax classifier.
5. The human body behavior recognition method based on a multi-modal deep Boltzmann machine according to claim 1, characterized in that in step 5), adaptively adjusting the deep network model produced from public sample data according to the user's individual characteristics comprises the following steps:
5.1) adding a hidden layer before the vision input feature layer and before the wearable-sensor input feature layer;
5.2) using the high-confidence data obtained when the public-data network model performs behavior recognition on the individual user as the labeled sample data;
5.3) training the model produced from the common data with the labeled sample data using mini-batch incremental learning, and selecting the required mini-batch size.
6. A human body behavior recognition system based on a multi-modal deep Boltzmann machine, characterized by comprising:
a data acquisition module for collecting the raw data streams of the robot's human-behavior-recognition platform, including the visual data stream and the wearable-sensor data stream;
a data preprocessing module for performing noise reduction, smoothing and windowing on the collected raw data;
a deep learning module for feeding the preprocessed data into the deep neural network for learning and fusion, extracting the common features of the vision and attitude-sensor data;
a model training module for obtaining the trained multi-modal fusion deep Boltzmann machine behavior-recognition model through learning and modeling on the training data set;
a behavior recognition module for performing recognition and classification of human behavior with the multi-modal fusion deep Boltzmann machine behavior-recognition model.
7. The human body behavior recognition system based on a multi-modal deep Boltzmann machine according to claim 6, characterized in that: the data acquisition module collects the visual data stream with a Kinect sensor, collects the waist and wrist data with two 6-axis attitude sensors respectively, and uses the maximum acquisition frequency of the Kinect sensor as the common acquisition frequency; the data preprocessing module uses a dynamically variable windowing method to partition out the cycle of each action behavior; the deep learning module uses the multi-modal deep Boltzmann machine, fusing the vision deep Boltzmann machine and the wearable-sensor deep Boltzmann machine with a common hidden layer; the model training module uses multi-modal public data sets including the Berkeley multimodal human action dataset, combined with the collected real data set, to form the training data set; and the behavior recognition module uses a softmax regression model as the classifier, added at the last layer of the deep neural network.
CN201711061490.6A 2017-11-02 2017-11-02 Human body behavior recognition method and system based on multi-mode deep Boltzmann machine Active CN107886061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711061490.6A CN107886061B (en) 2017-11-02 2017-11-02 Human body behavior recognition method and system based on multi-mode deep Boltzmann machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711061490.6A CN107886061B (en) 2017-11-02 2017-11-02 Human body behavior recognition method and system based on multi-mode deep Boltzmann machine

Publications (2)

Publication Number Publication Date
CN107886061A true CN107886061A (en) 2018-04-06
CN107886061B CN107886061B (en) 2021-08-06

Family

ID=61783558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711061490.6A Active CN107886061B (en) 2017-11-02 2017-11-02 Human body behavior recognition method and system based on multi-mode deep Boltzmann machine

Country Status (1)

Country Link
CN (1) CN107886061B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629380A (en) * 2018-05-11 2018-10-09 西北大学 A kind of across scene wireless signal cognitive method based on transfer learning
CN108958482A (en) * 2018-06-28 2018-12-07 福州大学 A kind of similitude action recognition device and method based on convolutional neural networks
CN109063722A (en) * 2018-06-08 2018-12-21 中国科学院计算技术研究所 A kind of Activity recognition method and system based on chance perception
CN109190550A (en) * 2018-08-29 2019-01-11 沈阳康泰电子科技股份有限公司 Combine the deep neural network multi-source data fusion method of micro- expression multi-input information
CN109241223A (en) * 2018-08-23 2019-01-18 中国电子科技集团公司电子科学研究院 The recognition methods of behavior whereabouts and platform
CN110222598A (en) * 2019-05-21 2019-09-10 平安科技(深圳)有限公司 A kind of video behavior recognition methods, device, storage medium and server
CN110222730A (en) * 2019-05-16 2019-09-10 华南理工大学 Method for identifying ID and identification model construction method based on inertial sensor
CN110458033A (en) * 2019-07-17 2019-11-15 哈尔滨工程大学 A kind of human body behavior sequence recognition methods based on wearable position sensor
CN111216126A (en) * 2019-12-27 2020-06-02 广东省智能制造研究所 Multi-modal perception-based foot type robot motion behavior recognition method and system
CN111401440A (en) * 2020-03-13 2020-07-10 重庆第二师范学院 Target classification recognition method and device, computer equipment and storage medium
CN111507281A (en) * 2020-04-21 2020-08-07 中山大学中山眼科中心 Behavior recognition system, device and method based on head movement and gaze behavior data
CN111556453A (en) * 2020-04-27 2020-08-18 南京邮电大学 Multi-scene indoor action recognition method based on channel state information and BilSTM
CN111680660A (en) * 2020-06-17 2020-09-18 郑州大学 Human behavior detection method based on multi-source heterogeneous data stream
CN111861275A (en) * 2020-08-03 2020-10-30 河北冀联人力资源服务集团有限公司 Method and device for identifying household working mode
CN112215136A (en) * 2020-10-10 2021-01-12 北京奇艺世纪科技有限公司 Target person identification method and device, electronic equipment and storage medium
CN112380976A (en) * 2020-11-12 2021-02-19 华东师范大学 Gesture recognition system and method based on neural network visual touch sensor fusion
CN113657487A (en) * 2021-08-16 2021-11-16 深圳多模智能科技有限公司 Human body attribute classification method and device based on incremental learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063720A (en) * 2014-07-03 2014-09-24 浙江大学 Method for detecting images of prohibited commodities of e-commerce websites based on deep Boltzmann machine
CN106778880A (en) * 2016-12-23 2017-05-31 南开大学 Microblog topic based on multi-modal depth Boltzmann machine is represented and motif discovery method
US20170220854A1 (en) * 2016-01-29 2017-08-03 Conduent Business Services, Llc Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063720A (en) * 2014-07-03 2014-09-24 浙江大学 Method for detecting images of prohibited commodities of e-commerce websites based on deep Boltzmann machine
US20170220854A1 (en) * 2016-01-29 2017-08-03 Conduent Business Services, Llc Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action
CN106778880A (en) * 2016-12-23 2017-05-31 南开大学 Microblog topic based on multi-modal depth Boltzmann machine is represented and motif discovery method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENG WANG ET AL: "Exploring Multimodal Video Representation for Action Recognition", 2016 International Joint Conference on Neural Networks (IJCNN) *
张清辰: "Research on deep computation models for big-data feature learning" (面向大数据特征学习的深度计算模型研究), China Doctoral Dissertations Full-text Database, Information Science and Technology *
毕盛 et al.: "Fall detection and control of humanoid robots based on multi-sensor information fusion" (基于多传感器信息融合的仿人机器人跌倒检测及控制), Journal of South China University of Technology (Natural Science Edition) *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629380B (en) * 2018-05-11 2021-06-11 西北大学 Cross-scene wireless signal sensing method based on transfer learning
CN108629380A (en) * 2018-05-11 2018-10-09 西北大学 A kind of across scene wireless signal cognitive method based on transfer learning
CN109063722A (en) * 2018-06-08 2018-12-21 中国科学院计算技术研究所 A kind of Activity recognition method and system based on chance perception
CN109063722B (en) * 2018-06-08 2021-06-29 中国科学院计算技术研究所 Behavior recognition method and system based on opportunity perception
CN108958482A (en) * 2018-06-28 2018-12-07 福州大学 A kind of similitude action recognition device and method based on convolutional neural networks
CN109241223A (en) * 2018-08-23 2019-01-18 中国电子科技集团公司电子科学研究院 The recognition methods of behavior whereabouts and platform
CN109241223B (en) * 2018-08-23 2022-06-28 中国电子科技集团公司电子科学研究院 Behavior track identification method and system
CN109190550A (en) * 2018-08-29 2019-01-11 沈阳康泰电子科技股份有限公司 Combine the deep neural network multi-source data fusion method of micro- expression multi-input information
CN110222730A (en) * 2019-05-16 2019-09-10 华南理工大学 Method for identifying ID and identification model construction method based on inertial sensor
WO2020232886A1 (en) * 2019-05-21 2020-11-26 平安科技(深圳)有限公司 Video behavior identification method and apparatus, storage medium and server
CN110222598A (en) * 2019-05-21 2019-09-10 平安科技(深圳)有限公司 A kind of video behavior recognition methods, device, storage medium and server
CN110458033B (en) * 2019-07-17 2023-01-03 哈尔滨工程大学 Human body behavior sequence identification method based on wearable position sensor
CN110458033A (en) * 2019-07-17 2019-11-15 哈尔滨工程大学 A kind of human body behavior sequence recognition methods based on wearable position sensor
CN111216126A (en) * 2019-12-27 2020-06-02 广东省智能制造研究所 Multi-modal perception-based foot type robot motion behavior recognition method and system
CN111401440A (en) * 2020-03-13 2020-07-10 重庆第二师范学院 Target classification recognition method and device, computer equipment and storage medium
CN111507281A (en) * 2020-04-21 2020-08-07 中山大学中山眼科中心 Behavior recognition system, device and method based on head movement and gaze behavior data
CN111556453A (en) * 2020-04-27 2020-08-18 南京邮电大学 Multi-scene indoor action recognition method based on channel state information and BilSTM
CN111680660A (en) * 2020-06-17 2020-09-18 郑州大学 Human behavior detection method based on multi-source heterogeneous data stream
CN111680660B (en) * 2020-06-17 2023-03-24 郑州大学 Human behavior detection method based on multi-source heterogeneous data stream
CN111861275A (en) * 2020-08-03 2020-10-30 河北冀联人力资源服务集团有限公司 Method and device for identifying household working mode
CN111861275B (en) * 2020-08-03 2024-04-02 河北冀联人力资源服务集团有限公司 Household work mode identification method and device
CN112215136A (en) * 2020-10-10 2021-01-12 北京奇艺世纪科技有限公司 Target person identification method and device, electronic equipment and storage medium
CN112215136B (en) * 2020-10-10 2023-09-05 北京奇艺世纪科技有限公司 Target person identification method and device, electronic equipment and storage medium
CN112380976A (en) * 2020-11-12 2021-02-19 华东师范大学 Gesture recognition system and method based on neural network visual touch sensor fusion
CN113657487A (en) * 2021-08-16 2021-11-16 深圳多模智能科技有限公司 Human body attribute classification method and device based on incremental learning

Also Published As

Publication number Publication date
CN107886061B (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN107886061A (en) Human bodys&#39; response method and system based on multi-modal depth Boltzmann machine
Jalal et al. A Triaxial acceleration-based human motion detection for ambient smart home system
CN107153871B (en) Falling detection method based on convolutional neural network and mobile phone sensor data
CN108062170A (en) Multi-class human posture recognition method based on convolutional neural networks and intelligent terminal
CN106570477A Vehicle model recognition model construction method based on deep learning and vehicle model recognition method based on deep learning
JP6788264B2 (en) Facial expression recognition method, facial expression recognition device, computer program and advertisement management system
CN107784282A Object attribute recognition method, apparatus and system
CN106127749A Target part recognition method based on visual attention mechanism
CN105574510A (en) Gait identification method and device
CN108388876A (en) A kind of image-recognizing method, device and relevant device
CN107341452A Human body behavior recognition method based on quaternion spatio-temporal convolutional neural networks
CN107609572A Multi-modal emotion recognition method and system based on neural network and transfer learning
CN108764059A Human body behavior recognition method and system based on neural network
CN106485214A Eye and mouth state recognition method based on convolutional neural networks
CN107423730A Human gait behavior active detection and recognition system and method based on semantic folding
CN107092894A Motion behavior recognition method based on LSTM model
CN105069413A (en) Human body gesture identification method based on depth convolution neural network
CN107679462A Deep multi-feature fusion classification method based on wavelets
CN108764066A (en) A kind of express delivery sorting working specification detection method based on deep learning
CN110378208B Behavior recognition method based on deep residual network
CN107423727B (en) Face complex expression recognition methods based on neural network
Liu et al. Contrastive self-supervised representation learning for sensing signals from the time-frequency perspective
CN102024145A (en) Layered recognition method and system for disguised face
CN107423721A (en) Interactive action detection method, device, storage medium and processor
WO2021004510A1 (en) Sensor-based separately deployed human body behavior recognition health management system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant