CN107886061A

CN107886061A - Human behavior recognition method and system based on a multimodal deep Boltzmann machine

Info

Publication number: CN107886061A
Application number: CN201711061490.6A
Authority: CN
Inventors: 毕盛; 谢澈澈; 董敏
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2017-11-02
Filing date: 2017-11-02
Publication date: 2018-04-06
Anticipated expiration: 2037-11-02
Also published as: CN107886061B

Abstract

The invention discloses a human behavior recognition method and system based on a multimodal deep Boltzmann machine. The method comprises the steps of: 1) acquiring data from vision and wearable sensors; 2) establishing a multimodal fusion model of the vision data and the wearable sensor data; 3) using a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data; 4) classifying with a softmax regression classifier; 5) adaptively adjusting, according to the individual characteristics of the user, the deep network model produced from public sample data. The invention can improve the accuracy of human behavior recognition in complex scenes and when data are missing.

Description

Human behavior recognition method and system based on a multimodal deep Boltzmann machine

Technical field

The present invention relates to the technical fields of artificial intelligence and behavior recognition, and in particular to a human behavior recognition method and system based on a multimodal deep Boltzmann machine.

Background art

In recent years the robotics industry has grown explosively, and the era of robots in "all applications" is arriving. On the one hand, robots are appearing in homes and daily life; on the other hand, with the development of industrial robots, robots are widely used in industries such as automobile manufacturing and metal fabrication to realize human-robot collaboration. Human behavior recognition is widely used in fields such as human-robot interaction and human-robot collaboration. A robot needs to understand and recognize human behavior at every level of abstraction, and the accuracy of that recognition will play a major role in the application and development of robot technology. A robot's recognition of human behavior is a very important part of its perception of people and the external environment. How to reduce the influence of noise factors such as scene diversity and background complexity on recognition performance has long been a focus of human behavior recognition research.

At present, research on human behavior recognition technology mainly follows two approaches, vision-based and wearable-sensor-based, but it still faces the following problems:

1. The accuracy with which robots recognize human behavior in complex scenes needs to be improved. At present, human behavior recognition is mainly realized using vision alone, wearable sensors alone, or traditional data fusion of vision and wearable sensors. None of these approaches effectively solves the problem of low recognition accuracy in complex scenes.

2. Missing multimodal data challenges recognition accuracy. Current research rarely addresses this problem, yet in real life visual signals are often missing because of personal privacy, occlusion, and similar reasons, which considerably degrades the accuracy with which a robot recognizes human behavior.

3. Robots face the problem of generality versus individuality when recognizing human behavior. Current research rarely addresses how to add a person's individual information to a general model so that the model becomes personalized, and this also affects a robot's recognition of human behavior.

Summary of the invention

The object of the invention is to overcome the shortcomings and deficiencies of the prior art by proposing a human behavior recognition method and system based on a multimodal deep Boltzmann machine with higher recognition accuracy and better usability. It aims to: build a multimodal deep neural network model based on vision and wearable sensors, so as to improve the accuracy of behavior recognition in complex scenes; use a deep Boltzmann machine network in the multimodal deep learning model, so as to reduce the influence of missing data on recognition accuracy; and propose a method that adjusts the network structure according to personalized features to establish an adaptive general model, so as to improve the accuracy with which a robot recognizes the behavior of a specific owner.

To achieve the above object, the technical solution proposed by the invention is as follows:

A human behavior recognition method based on a multimodal deep Boltzmann machine comprises the following steps:

1) acquiring data from vision and wearable sensors;

2) establishing a multimodal fusion model of the vision data and the wearable sensor data;

3) using a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data;

4) classifying with a softmax regression classifier;

5) adaptively adjusting, according to the individual characteristics of the user, the deep network model produced from public sample data.

In step 1), acquiring the vision and wearable sensor data comprises the following steps:

1.1) using the maximum acquisition frequency of the Kinect vision sensor as the common acquisition frequency of the vision and wearable sensors;

1.2) using the Kinect vision sensor, which is mounted on the robot, as the video input feature, and transmitting its data to a notebook computer through a USB interface;

1.3) selecting the wrist attitude data and waist attitude data from the wearable sensors as input features, the wearable sensors sending the data stored over a period of time to the notebook computer via Bluetooth wireless communication;

1.4) the notebook computer preprocessing the collected data and sending the processed data to a back-end graphics workstation for deep learning.
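As a rough illustration of step 1.1), the sketch below aligns wearable sensor samples to the Kinect frame timestamps so that both streams share one acquisition frequency. Nearest-timestamp matching, the function name `align_to_frames`, and timestamps expressed in seconds are assumptions made for this example; the method itself only specifies that the Kinect's maximum acquisition frequency serves as the common frequency.

```python
from bisect import bisect_left


def align_to_frames(frame_times, imu_times, imu_samples):
    """For each video frame time, pick the wearable sample nearest in time.

    frame_times and imu_times are ascending timestamps in seconds
    (hypothetical inputs); imu_samples[i] was recorded at imu_times[i].
    """
    aligned = []
    for t in frame_times:
        pos = bisect_left(imu_times, t)
        # Candidates are the samples just before and just after t.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(imu_times)]
        nearest = min(candidates, key=lambda i: abs(imu_times[i] - t))
        aligned.append(imu_samples[nearest])
    return aligned


# Three frames at roughly 30 Hz against a faster wearable stream.
frames = [0.000, 0.033, 0.066]
imu_t = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]
imu_x = ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7"]
print(align_to_frames(frames, imu_t, imu_x))
```

After alignment there is exactly one wearable sample per video frame, which is what allows the two streams to be indexed by a shared frame number in step 2.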

In step 2), establishing the multimodal fusion model of the vision data and the wearable sensor data comprises the following steps:

2.1) adding a start frame, an end frame, and frame numbers to the data in each acquisition window of the vision and wearable sensors, and then extracting the data by frame number as input to the deep neural network;

2.2) using a dynamically variable acquisition window length, dynamically segmenting each action cycle and using its duration as the time length of the sliding window;

2.3) constructing the color (RGB) and depth (D) information of all pixels of the Kinect camera within one acquisition time window into a single visual feature vector as input;

2.4) combining the data of the wrist and waist 6-axis attitude sensors (3-axis acceleration and 3-axis angular velocity) within one acquisition time window into a wearable sensor feature vector as input;

2.5) having the deep learning operate directly on the raw data and obtain features through training.
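Steps 2.3) and 2.4) can be pictured with the following minimal sketch, which flattens one acquisition window into a visual feature vector and a wearable sensor feature vector. The nested-list data layout and the function names are assumptions for illustration only; the source does not specify how the vectors are ordered.

```python
def visual_feature_vector(rgbd_frames):
    """Flatten every pixel's (R, G, B, D) values over all frames of a window."""
    return [value
            for frame in rgbd_frames      # frames in the window
            for row in frame              # pixel rows
            for pixel in row              # (R, G, B, D) tuples
            for value in pixel]


def wearable_feature_vector(wrist_samples, waist_samples):
    """Concatenate wrist and waist 6-axis readings, sample by sample."""
    vector = []
    for wrist, waist in zip(wrist_samples, waist_samples):
        vector.extend(wrist)   # ax, ay, az, gx, gy, gz at the wrist
        vector.extend(waist)   # ax, ay, az, gx, gy, gz at the waist
    return vector


# A toy window: 2 frames of 1x2 pixels, and 2 wearable samples per sensor.
frames = [[[(10, 20, 30, 1.5), (11, 21, 31, 1.6)]],
          [[(12, 22, 32, 1.7), (13, 23, 33, 1.8)]]]
wrist = [[0.1] * 6, [0.2] * 6]
waist = [[0.3] * 6, [0.4] * 6]
v_m = visual_feature_vector(frames)
v_t = wearable_feature_vector(wrist, waist)
print(len(v_m), len(v_t))
```

Each wearable sample contributes 12 values (two sensors with 6 axes each), while the visual vector grows with the number of pixels and frames in the window.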

In step 3), using a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data comprises the following steps:

3.1) building a visual deep Boltzmann machine and a wearable sensor deep Boltzmann machine respectively, with the sensor data as input; each is a deep Boltzmann machine with a depth of two layers, the neurons of the visible layer and the hidden layers are all Gaussian units, and the energy function of the two-layer deep Boltzmann machine is:

$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$

where $\theta$ denotes the RBM parameters $\{W, a, b\}$, $v$ denotes the visible units, $h^{(i)}$ denotes the hidden units of the i-th layer, and $W$ denotes the weights of the edges between visible units and hidden units;

3.2) building a multimodal deep Boltzmann machine, in which a common hidden layer fuses the visual deep Boltzmann machine and the wearable sensor deep Boltzmann machine; the joint probability distribution of the network is:

$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)},\, h_t^{(2)},\, h^{(3)}} P\left(h_m^{(2)}, h_t^{(2)}, h^{(3)}\right) \left[\sum_{h_m^{(1)}} P\left(v_m, h_m^{(1)}, h_m^{(2)}\right)\right] \left[\sum_{h_t^{(1)}} P\left(v_t, h_t^{(1)}, h_t^{(2)}\right)\right]$$

where $\theta$ denotes the parameters of the joint probability distribution, $v_m$ denotes the visible layer of the visual deep Boltzmann machine, $v_t$ denotes the visible layer of the wearable sensor deep Boltzmann machine, $h_m^{(i)}$ denotes the i-th hidden layer of the visual deep Boltzmann machine, and $h_t^{(i)}$ denotes the i-th hidden layer of the wearable sensor deep Boltzmann machine.
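To make the energy function and the idea of reconstructing a missing modality more concrete, here is a minimal sketch in plain Python. `dbm_energy` evaluates the two-layer energy exactly as written above. `reconstruct_visual` is a heavily simplified stand-in for inference: it makes a single deterministic pass up the wearable pathway to the common hidden layer and back down the visual pathway. The sigmoid activations, the single pass, the omission of bias terms, and all function names are assumptions of this example; a trained deep Boltzmann machine would normally rely on iterative mean-field inference or Gibbs sampling.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def dbm_energy(v, h1, h2, W1, W2):
    """E = -v^T W1 h1 - h1^T W2 h2; W1 is (len(v), len(h1)), W2 is (len(h1), len(h2))."""
    term1 = sum(vi * a for vi, a in zip(v, matvec(W1, h1)))
    term2 = sum(hj * a for hj, a in zip(h1, matvec(W2, h2)))
    return -term1 - term2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def up(W, lower):
    """Propagate activations from a lower layer to the layer above it."""
    return [sigmoid(a) for a in matvec(transpose(W), lower)]


def down(W, upper):
    """Propagate activations from an upper layer to the layer below it."""
    return [sigmoid(a) for a in matvec(W, upper)]


def reconstruct_visual(v_t, wearable_weights, visual_weights):
    """Estimate the missing visual input from wearable data (one-pass sketch).

    wearable_weights = [Wt1, Wt2, Wt3] and visual_weights = [Wm1, Wm2, Wm3],
    each shaped (lower layer size, upper layer size); the third matrix of
    each pathway connects to the common hidden layer.
    """
    h = v_t
    for W in wearable_weights:      # wearable pathway, bottom-up
        h = up(W, h)
    Wm1, Wm2, Wm3 = visual_weights
    h = down(Wm3, h)                # common layer -> visual second hidden layer
    h = down(Wm2, h)                # visual second -> visual first hidden layer
    return matvec(Wm1, h)           # linear mean of the visible units


v, h1, h2 = [1.0, 0.0], [1.0, 1.0], [1.0]
W1 = [[0.5, -0.2], [0.3, 0.1]]
W2 = [[0.4], [0.6]]
print(dbm_energy(v, h1, h2, W1, W2))
```

The same pattern run in the opposite direction would estimate wearable data from vision; which modality is missing decides which pathway is traversed bottom-up.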

In step 4), classifying with a softmax regression classifier comprises the following steps:

4.1) building the training data set by combining multimodal public data sets, including the Berkeley multimodal human action data set, with the actual data collected;

4.2) adding a softmax classifier after the last layer of the deep learning model, using the output of the final layer as the input of the classifier, and training the classifier to obtain the final classification model;

4.3) using the common features obtained by the fused deep Boltzmann machine in step 3) as input, and classifying them with the trained softmax classifier.
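The classification in steps 4.2) and 4.3) amounts to softmax regression on the fused features. The sketch below shows the forward computation only; the weights, the feature values, and the behavior labels are made-up placeholders, and training is not shown.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases, labels):
    """Softmax regression: one weight row and one bias per class."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical fused features and parameters for three behavior classes.
features = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
label, probs = classify(features, weights, biases, ["walk", "sit", "wave"])
print(label, probs)
```

The largest probability doubles as a simple confidence score, which becomes relevant when selecting samples for the personalization in step 5).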

In step 5), adaptively adjusting, according to the individual characteristics of the user, the deep network model produced from public sample data comprises the following steps:

5.1) adding one hidden layer each in front of the vision input feature layer and the wearable sensor input feature layer;

5.2) using the high-confidence data obtained when the public-data network model performs behavior recognition on the individual user as labeled sample data;

5.3) training the model produced from common data with the labeled sample data using mini-batch incremental learning, and selecting the required mini-batch size.
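The mini-batch incremental learning of step 5.3) can be sketched as follows. For brevity the example updates only a softmax regression layer rather than the full deep network, which is a simplification made here; the learning rate, batch size, and toy data are likewise illustrative assumptions.

```python
import math
import random


def minibatches(samples, batch_size, seed=0):
    """Shuffle the labeled samples and yield them in batches."""
    order = list(samples)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def sgd_step(weights, batch, lr=0.1):
    """One softmax-regression gradient step on a batch of (x, label) pairs."""
    grads = [[0.0] * len(row) for row in weights]
    for x, label in batch:
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        peak = max(logits)
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        for k, row in enumerate(grads):
            err = exps[k] / total - (1.0 if k == label else 0.0)
            for j, xi in enumerate(x):
                row[j] += err * xi / len(batch)
    return [[w - lr * g for w, g in zip(wrow, grow)]
            for wrow, grow in zip(weights, grads)]


# Toy labeled samples standing in for a user's high-confidence data.
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)] * 4
weights = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(20):
    for batch in minibatches(data, batch_size=4):
        weights = sgd_step(weights, batch)   # parameters update once per batch
print(weights)
```

A larger `batch_size` gives fewer, smoother updates per pass over the data, while a smaller one gives more frequent but noisier updates.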

A human behavior recognition system based on a multimodal deep Boltzmann machine comprises:

a data acquisition module, for collecting the raw data streams of the robot's human behavior recognition platform, including the vision data stream and the wearable sensor data stream;

a data preprocessing module, for applying filtering and noise reduction, smoothing, and windowing to the collected raw data;

a deep learning module, for feeding the preprocessed data into the deep neural network for learning and fusion, and extracting the common features of the vision and attitude sensor data;

a model training module, which obtains the trained multimodal fusion deep Boltzmann machine human behavior recognition model by learning from and modeling the training data set;

a behavior recognition module, which uses the multimodal fusion deep Boltzmann machine human behavior recognition model to recognize and classify human behavior.

Preferably, the data acquisition module collects the vision data stream with a Kinect sensor and collects waist and wrist data with two 6-axis attitude sensors, one at each location, using the maximum acquisition frequency of the Kinect sensor as the common acquisition frequency.

Preferably, the data preprocessing module uses a dynamically variable windowing method to segment the cycle of each action.

Preferably, the deep learning module uses a multimodal deep Boltzmann machine, in which a common hidden layer fuses the visual deep Boltzmann machine and the wearable sensor deep Boltzmann machine.

Preferably, the model training module builds the training data set by combining multimodal public data sets, such as the Berkeley multimodal human action data set, with the actual data collected.

Preferably, the behavior recognition module uses a softmax regression model as the classifier, added after the last layer of the deep neural network.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. The multimodal neural network model based on vision and wearable sensors uses a behavior representation method based on global features and fuses the data of the vision sensor and the wearable sensors to jointly recognize human behavioral features. This not only avoids the need to wear many sensors, which effectively reduces the impact of wearable sensors on comfort, but also overcomes the inability of local-feature-based behavior representation methods to recognize compound actions, and can effectively improve the accuracy with which a robot recognizes human behavior in complex scenes.

2. Using the deep Boltzmann machine, missing data can be reconstructed, which effectively reduces the influence of missing data on behavior recognition accuracy. When data are missing because the camera's viewing angle is affected, objects are occluded, or the wearable device suffers external electromagnetic interference, the accuracy of the robot's recognition of human behavior can be effectively improved.

3. A method is proposed that automatically adjusts the general model according to personalized features. It can effectively solve the generality versus individuality problem a robot faces when recognizing human behavior, allowing the robot to automatically adjust the common data model to fit individual characteristics, and thereby to better understand human behavior and recognize the behavior of a specific owner more accurately.

Brief description of the drawings

Fig. 1 is a flowchart of the human behavior recognition method based on a multimodal deep Boltzmann machine according to the invention.

Fig. 2 is a schematic diagram of the robot platform for recognizing human behavior according to the invention.

Fig. 3 is a schematic diagram of the multimodal deep neural network model based on vision and wearable sensors according to the invention.

Fig. 4 is a schematic diagram of the multimodal deep Boltzmann machine.

Embodiment

The invention is further described below with reference to a specific embodiment.

Referring to Fig. 1, the human behavior recognition method based on a multimodal deep Boltzmann machine provided by this embodiment comprises the following steps:

1) establishing the robot platform for recognizing human behavior, and acquiring data from vision and wearable sensors;

2) establishing a multimodal fusion model of the vision data and the wearable sensor data, and fusing the vision and wearable sensor information;

3) building the multimodal deep Boltzmann machine, a neural network structure for reconstructing missing data;

4) classifying human behavior with a softmax regression classifier;

5) automatically adjusting, according to individual features, the deep network model produced from common data.

Referring to Fig. 2, in step 1), acquiring the vision and wearable sensor data on the robot platform for recognizing human behavior comprises the following steps:

1.1) the Kinect vision sensor mounted on the robot collects video data;

1.2) the wearable sensors are 6-axis attitude sensors (3-axis acceleration and 3-axis angular velocity), installed in a smart bracelet and a smart waistband respectively, so that the wrist attitude data and waist attitude data of the human body can be selected as input features;

1.3) in this embodiment, the maximum acquisition frequency of the Kinect vision sensor is used as the common acquisition frequency of the vision and wearable sensors;

1.4) after the vision data are collected, the Kinect vision sensor transmits them to a notebook computer through a USB interface;

1.5) after the attitude sensor data are collected, the wearable sensors send the data stored over a period of time to the notebook computer via Bluetooth wireless communication.

Referring to Fig. 3, in step 2), building the multimodal deep neural network based on vision and wearable sensors comprises the following steps:

2.1) in this embodiment, to make it easier to fuse the two kinds of sensor data, the synchronization method is to add a start frame, an end frame, and frame numbers to the data in each acquisition window of the vision and wearable sensors;

2.2) the data are extracted by frame number as input to the deep neural network, which ensures that the wearable sensors and the Kinect vision sensor are consistent in time;

2.3) in this embodiment, a dynamically variable windowing method is used to separate each action cycle; the length of the sliding window is the duration of each action cycle, and the sliding step is half the window length;

2.4) while the feature data are being collected, the vision data and the wearable sensor data are analyzed to find the key points at which the action changes, and these key points are used as the start and end points of the acquisition window;

2.5) so as not to affect the collection of features, collection and analysis are carried out in parallel;

2.6) within one acquisition time window, the color (RGB) and depth (D) information of all pixels of the Kinect camera is constructed into a single visual feature vector as input;

2.7) within one acquisition time window, the data of the wrist 6-axis attitude sensor (3-axis acceleration and 3-axis angular velocity) and the data of the waist 6-axis attitude sensor (3-axis acceleration and 3-axis angular velocity) together form the wearable sensor feature vector as input;

2.8) the deep learning operates directly on the raw data and obtains features through training.
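The sliding scheme of step 2.3), with a step of half the window length, can be sketched as below. The function name and the use of sample indices are assumptions for illustration; in the method described above `window_len` would be set from the detected duration of each action cycle, and that detection is not shown here.

```python
def sliding_windows(n_samples, window_len):
    """Yield (begin, stop) index pairs with a step of half the window length."""
    step = max(1, window_len // 2)
    begin = 0
    while begin + window_len <= n_samples:
        yield begin, begin + window_len
        begin += step


# Ten samples, an action cycle assumed to last four samples.
print(list(sliding_windows(10, 4)))
```

Because consecutive windows overlap by half, each sample away from the edges of the stream falls into two windows.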

Referring to Fig. 4, in step 3), building the multimodal deep Boltzmann machine, a neural network structure for reconstructing missing data, comprises the following steps:

3.1) in this embodiment, a two-layer deep Boltzmann machine is used, whose energy function is:

$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$

3.2) building the visual deep Boltzmann machine: the Kinect vision sensor data serve as input, and a deep Boltzmann machine with a depth of two layers is used, in which the neurons of the visible layer and the hidden layers are all Gaussian units;

3.3) building the wearable sensor deep Boltzmann machine: the wearable sensor data serve as input, and a deep Boltzmann machine with a depth of two layers is likewise used, in which the neurons of the visible layer and the hidden layers are all Gaussian units;

3.4) in this embodiment, a multimodal deep Boltzmann machine composed of the two deep Boltzmann machines is built, in which a common hidden layer joins the two deep networks. Assuming that the visible layer of one deep network is $v_m$ and that of the other is $v_t$, the joint probability distribution of the network is:

$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)},\, h_t^{(2)},\, h^{(3)}} P\left(h_m^{(2)}, h_t^{(2)}, h^{(3)}\right) \left[\sum_{h_m^{(1)}} P\left(v_m, h_m^{(1)}, h_m^{(2)}\right)\right] \left[\sum_{h_t^{(1)}} P\left(v_t, h_t^{(1)}, h_t^{(2)}\right)\right]$$

3.5) building the multimodal deep neural network model based on vision and wearable sensors: the multimodal deep Boltzmann machine composed of the two deep Boltzmann machines has a common hidden layer (representing a joint feature layer) that fuses the vision and wearable sensor deep networks.

In step 4), classifying human behavior with the softmax regression classifier comprises the following steps:

4.1) building the training data set by combining multimodal public data sets, such as the Berkeley multimodal human action data set, with the actual data that the research team obtained through various channels;

4.2) adding a softmax classifier after the last layer of the deep learning model, using the output of the final layer as the input of the classifier, and training the classifier to obtain the final classification model;

4.3) using the common features obtained in step 3) by fusing the visual deep Boltzmann machine and the wearable sensor deep Boltzmann machine as input, and classifying them with the trained softmax classifier.

In step 5), the automatic adjustment, according to individual features, of the deep network model produced from common data is divided into two approaches: improving the network structure, and incremental learning by training on newly labeled samples. The implementation steps are as follows:

5.1) improving the network structure by extending the original neural network structure, with the following specific steps:

5.1.1) adding one hidden layer each in front of the vision input feature layer and the wearable sensor input feature layer;

5.1.2) when the user and the robot are together, carrying out the unsupervised training again;

5.1.3) training content reflecting the individual user's behavior in the new network structure;

5.2) incremental learning by training on newly labeled samples: the high-confidence data obtained when the public-data network model performs behavior recognition on the individual user are used as labeled sample data, with the following specific steps:

5.2.1) determining, from the characteristics of the sensor itself, whether the collected data are normal;

5.2.2) computing the confidence by also taking into account the output of the softmax classification model;

5.2.3) training the model produced from common data with the labeled sample data using mini-batch incremental learning. Specifically, the whole sample set is divided into several batches and the parameters are updated once per batch. The more samples per batch, the higher the precision of model training but the more time it takes, so the mini-batch size is chosen as a reasonable trade-off between precision and time.
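Steps 5.2.1) and 5.2.2) could be combined along the following lines: a sample is rejected outright if any sensor reading is abnormal, and otherwise its confidence is taken from the softmax output. The valid range, the threshold, and the rule of using the largest class probability as the confidence are hypothetical choices for this sketch; the source does not state how the two criteria are weighed.

```python
def sample_confidence(readings, probs, valid_range=(-16.0, 16.0)):
    """Return 0.0 for out-of-range sensor data, else the top class probability."""
    low, high = valid_range            # hypothetical sensor limits
    if any(not (low <= r <= high) for r in readings):
        return 0.0                     # sensor data judged abnormal
    return max(probs)


def select_labeled(samples, threshold=0.9):
    """Keep (readings, predicted label) pairs whose confidence passes the threshold."""
    kept = []
    for readings, probs in samples:
        if sample_confidence(readings, probs) >= threshold:
            kept.append((readings, probs.index(max(probs))))
    return kept


# Each entry: (sensor readings, softmax output of the public-data model).
samples = [([0.1, 0.2], [0.95, 0.05]),    # normal data, confident prediction
           ([0.1, 99.0], [0.99, 0.01]),   # abnormal reading, rejected
           ([0.3, 0.4], [0.60, 0.40])]    # normal data, low confidence
print(select_labeled(samples))
```

The pairs that survive serve as the labeled samples fed to the mini-batch incremental learning of step 5.2.3).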

The human behavior recognition system based on a multimodal deep Boltzmann machine provided by this embodiment is described below. It comprises:

Data acquisition module: collects the raw data streams of the robot's human behavior recognition platform, including the vision data stream and the wearable sensor data stream. In this embodiment, video data are collected with a Kinect sensor, waist and wrist data are collected with two 6-axis attitude sensors, one at each location, and the maximum acquisition frequency of the Kinect sensor is used as the common acquisition frequency.

Data preprocessing module: applies filtering and noise reduction, smoothing, and windowing to the collected raw data. In this embodiment, a dynamic windowing scheme is used in which the cycle length of each human behavior serves as the window length, and the feature matrix of the data in each window is extracted as input.

Deep learning module: feeds the preprocessed data into the deep neural network for learning and fusion, and extracts the common features of the vision and attitude sensor data. In this embodiment, a multimodal deep Boltzmann machine is used, in which a common hidden layer fuses the visual deep Boltzmann machine and the wearable sensor deep Boltzmann machine, so that the multi-sensor data are fused and trained to extract common features.

Model training module: obtains the trained multimodal fusion deep Boltzmann machine human behavior recognition model by learning from and modeling the training data set. In this embodiment, the training data set combines multimodal public data sets, such as the Berkeley multimodal human action data set, with the actual data that the research team obtained through various channels.

Behavior recognition module: uses the multimodal fusion deep Boltzmann machine human behavior recognition model to recognize and classify human behavior. In this embodiment, a softmax regression model is used as the classifier and added after the last layer of the deep neural network.
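As one possible illustration of the smoothing performed by the data preprocessing module, the sketch below applies a centered moving average to a single sensor channel. The choice of a moving average and the window width are assumptions of this example; the source does not name a specific filter.

```python
def moving_average(signal, width=3):
    """Smooth a 1-D signal with a centered moving average (edges use fewer points)."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A single spike is spread over its neighbours and reduced in height.
print(moving_average([0, 0, 9, 0, 0]))
```

A wider `width` suppresses more noise but also blurs short, fast movements, so it would need to be tuned against the action cycles being segmented.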

In the above embodiment, the modules are divided according to the functional logic of the invention, but the division is not limited to the above as long as the corresponding functions can be realized, and it is not intended to limit the protection scope of the invention.

In summary, the human behavior recognition method and system based on a multimodal deep Boltzmann machine provided by the invention build a multimodal neural network model based on vision and wearable sensors, which can improve the accuracy with which a robot recognizes human behavior in complex scenes; use a suitable deep neural network structure in the multimodal deep learning model, which can reduce the influence of missing data on behavior recognition accuracy; and propose a method that automatically adjusts the general model according to personalized features, which can improve the accuracy with which a robot recognizes the behavior of a specific owner. The invention can be used for cooperation between people and robots, thereby improving the success rate of human-robot collaboration. In addition, the technical method provided by the invention can be extended to fields such as human abnormality monitoring, video surveillance, smart homes, identity authentication, and motion analysis. It has broad research significance and is worth promoting.

The embodiment described above is only a preferred embodiment of the invention and does not limit the scope of practice of the invention. Any change made according to the shape and principle of the invention shall therefore fall within the protection scope of the invention.

Claims

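1. A human behavior recognition method based on a multimodal deep Boltzmann machine, characterized by comprising the following steps:

1) acquiring data from vision and wearable sensors;

2) establishing a multimodal fusion model of the vision data and the wearable sensor data;

3) using a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data;

4) classifying with a softmax regression classifier;

5) adaptively adjusting, according to the individual characteristics of the user, the deep network model produced from public sample data.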

2. The human behavior recognition method based on a multimodal deep Boltzmann machine according to claim 1, characterized in that, in step 1), acquiring the vision and wearable sensor data comprises the following steps:

1.1) using the maximum acquisition frequency of the Kinect vision sensor as the common acquisition frequency of the vision and wearable sensors;

1.2) using the Kinect vision sensor, which is mounted on the robot, as the video input feature, and transmitting its data to a notebook computer through a USB interface;

1.3) selecting the wrist attitude data and waist attitude data from the wearable sensors as input features, the wearable sensors sending the data stored over a period of time to the notebook computer via Bluetooth wireless communication;

1.4) the notebook computer preprocessing the collected data and sending the processed data to a back-end graphics workstation for deep learning.

3. The human behavior recognition method based on a multimodal deep Boltzmann machine according to claim 1, characterized in that, in step 2), establishing the multimodal fusion model of the vision data and the wearable sensor data comprises the following steps:

2.1) adding a start frame, an end frame, and frame numbers to the data in each acquisition window of the vision and wearable sensors, and then extracting the data by frame number as input to the deep neural network;

2.2) using a dynamically variable acquisition window length, dynamically segmenting each action cycle and using its duration as the time length of the sliding window;

2.3) constructing the color (RGB) and depth (D) information of all pixels of the Kinect camera within one acquisition time window into a single visual feature vector as input;

2.4) combining the data of the wrist and waist 6-axis attitude sensors within one acquisition time window into a wearable sensor feature vector as input;

4. The human behavior recognition method based on a multimodal deep Boltzmann machine according to claim 1, characterized in that, in step 3), using a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data comprises the following steps:

3.1) building a visual deep Boltzmann machine and a wearable sensor deep Boltzmann machine respectively, with the sensor data as input; each is a deep Boltzmann machine with a depth of two layers, the neurons of the visible layer and the hidden layers are all Gaussian units, and the energy function of the two-layer deep Boltzmann machine is:

$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$

where $\theta$ denotes the RBM parameters $\{W, a, b\}$, $v$ denotes the visible units, $h^{(i)}$ denotes the hidden units of the i-th layer, and $W$ denotes the weights of the edges between visible units and hidden units;

3.2) building a multimodal deep Boltzmann machine, in which a common hidden layer fuses the visual deep Boltzmann machine and the wearable sensor deep Boltzmann machine; the joint probability distribution of the network is:

$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)},\, h_t^{(2)},\, h^{(3)}} P\left(h_m^{(2)}, h_t^{(2)}, h^{(3)}\right) \left[\sum_{h_m^{(1)}} P\left(v_m, h_m^{(1)}, h_m^{(2)}\right)\right] \left[\sum_{h_t^{(1)}} P\left(v_t, h_t^{(1)}, h_t^{(2)}\right)\right]$$

where $\theta$ denotes the parameters of the joint probability distribution, $v_m$ denotes the visible layer of the visual deep Boltzmann machine, $v_t$ denotes the visible layer of the wearable sensor deep Boltzmann machine, $h_m^{(i)}$ denotes the i-th hidden layer of the visual deep Boltzmann machine, and $h_t^{(i)}$ denotes the i-th hidden layer of the wearable sensor deep Boltzmann machine;

4.1) building the training data set by combining multimodal public data sets, including the Berkeley multimodal human action data set, with the actual data collected;

4.2) adding a softmax classifier after the last layer of the deep learning model, using the output of the final layer as the input of the classifier, and training the classifier to obtain the final classification model;

5. The human behavior recognition method based on a multimodal deep Boltzmann machine according to claim 1, characterized in that, in step 5), adaptively adjusting, according to the individual characteristics of the user, the deep network model produced from public sample data comprises the following steps:

5.2) using the high-confidence data obtained when the public-data network model performs behavior recognition on the individual user as labeled sample data;

5.3) training the model produced from common data with the labeled sample data using mini-batch incremental learning, and selecting the required mini-batch size.

6. A human behavior recognition system based on a multimodal deep Boltzmann machine, characterized by comprising:

a data acquisition module, for collecting the raw data streams of the robot's human behavior recognition platform, including the vision data stream and the wearable sensor data stream;

a data preprocessing module, for applying noise reduction, smoothing, and windowing to the collected raw data;

a deep learning module, for feeding the preprocessed data into the deep neural network for learning and fusion, and extracting the common features of the vision and attitude sensor data;

a model training module, which obtains the trained multimodal fusion deep Boltzmann machine human behavior recognition model by learning from and modeling the training data set;

a behavior recognition module, which uses the multimodal fusion deep Boltzmann machine human behavior recognition model to recognize and classify human behavior.

7. The human behavior recognition system based on a multimodal deep Boltzmann machine according to claim 6, characterized in that: the data acquisition module collects the vision data stream with a Kinect sensor and collects waist and wrist data with two 6-axis attitude sensors, one at each location, using the maximum acquisition frequency of the Kinect sensor as the common acquisition frequency; the data preprocessing module uses a dynamically variable windowing method to segment the cycle of each action; the deep learning module uses a multimodal deep Boltzmann machine, in which a common hidden layer fuses the visual deep Boltzmann machine and the wearable sensor deep Boltzmann machine; the model training module builds the training data set by combining multimodal public data sets, including the Berkeley multimodal human action data set, with the actual data collected; and the behavior recognition module uses a softmax regression model as the classifier, added after the last layer of the deep neural network.