CN107886061A - Human behavior recognition method and system based on a multimodal deep Boltzmann machine - Google Patents
- Publication number
- CN107886061A (application number CN201711061490.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- depth
- boltzmann machine
- modal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
Abstract
The invention discloses a human behavior recognition method and system based on a multimodal deep Boltzmann machine. The method comprises the steps of: 1) acquiring data from vision and wearable sensors; 2) establishing a multimodal fusion model of the vision data and the wearable sensor data; 3) using a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data; 4) classifying with a softmax regression classifier; and 5) adaptively adjusting the deep network model generated from public sample data according to the individual characteristics of the user. The invention can improve the accuracy of human behavior recognition in complex scenes and when data are missing.
Description
Technical field
The present invention relates to the technical fields of artificial intelligence and behavior recognition, and in particular to a human behavior recognition method and system based on a multimodal deep Boltzmann machine.
Background
In recent years the robotics industry has grown explosively, and the era of robots in every field of application has arrived. On the one hand, robots are appearing in homes and in daily life; on the other hand, with the development of industrial robots, robots are widely used in industries such as automobile manufacturing and metal fabrication to achieve human-robot collaboration. Human behavior recognition is widely applied in fields such as human-computer interaction and human-robot collaboration. A robot needs to understand and recognize human behavior at every level of abstraction, and the accuracy of that recognition strongly affects the application and development of robot technology. Recognizing human behavior is a very important part of a robot's perception of people and of the external environment, and reducing the effect of noise factors such as scene diversity and background complexity on recognition performance has long been a focus of human behavior recognition research.
At present, research on human behavior recognition mainly follows two approaches, one based on vision and one based on wearable sensors, but it still faces the following problems:
1. The accuracy with which a robot recognizes human behavior in complex scenes needs to improve. Human behavior recognition is currently implemented mainly with vision alone, with wearable sensors alone, or with traditional data fusion of vision and wearable sensors, and none of these approaches effectively solves the problem of low recognition accuracy in complex scenes.
2. Missing multimodal data challenges recognition accuracy. Current research rarely addresses this problem, yet in real life the visual signal is often missing for reasons such as privacy and occlusion, which considerably affects the accuracy with which a robot recognizes human behavior.
3. A robot recognizing human behavior faces the problem of commonality versus individuality. Current research rarely addresses how to add a person's individual information to a common model so that the model becomes personalized, and this also affects the robot's recognition of human behavior.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing a human behavior recognition method and system, based on a multimodal deep Boltzmann machine, with higher recognition accuracy and better usability. The invention aims to build a multimodal deep neural network model based on vision and wearable sensors, so as to improve behavior recognition accuracy in complex scenes; to use a deep Boltzmann machine network within the multimodal deep learning model, so as to reduce the effect of missing data on recognition accuracy; and to propose a method that adjusts the network structure in combination with personalized features to build an adaptive common model, so as to improve the accuracy with which a robot recognizes the behavior of its specific owner.
To achieve the above object, the technical solution proposed by the invention is as follows:
A human behavior recognition method based on a multimodal deep Boltzmann machine comprises the following steps:
1) acquire data from vision and wearable sensors;
2) establish a multimodal fusion model of the vision data and the wearable sensor data;
3) use a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data;
4) classify with a softmax regression classifier;
5) adaptively adjust the deep network model generated from public sample data according to the individual characteristics of the user.
In step 1), acquiring the vision and wearable sensor data comprises the following steps:
1.1) the maximum acquisition frequency of the Kinect vision sensor is used as the common acquisition frequency of the vision and wearable sensors;
1.2) the Kinect vision sensor, which provides the video input features, is mounted on the robot and sends its data to a notebook computer over a USB interface;
1.3) the wearable sensors take wrist attitude data and waist attitude data as input features and send the data stored over a period of time to the notebook computer via wireless Bluetooth communication;
1.4) the notebook computer preprocesses the collected data and sends the processed data to a back-end graphics workstation for deep learning.
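Step 1.1) fixes one common acquisition frequency for both sensor types. The text does not say how a wearable stream recorded at another rate would be aligned, so the following is only a minimal sketch of one possible approach, linear interpolation onto a uniform time grid. The function name `resample_to_common_rate`, the 100 Hz source rate and the 30 Hz target rate are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def resample_to_common_rate(timestamps, samples, rate_hz):
    """Linearly interpolate each sensor channel onto a uniform time grid.

    timestamps: 1-D array of sample times in seconds (increasing).
    samples:    2-D array of shape (n_samples, n_channels).
    rate_hz:    assumed common acquisition frequency (hypothetical value).
    """
    timestamps = np.asarray(timestamps, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # Uniform grid covering the recording at the common rate.
    n = int(np.floor((timestamps[-1] - timestamps[0]) * rate_hz)) + 1
    grid = timestamps[0] + np.arange(n) / rate_hz
    # np.interp handles 1-D data, so interpolate channel by channel.
    resampled = np.column_stack(
        [np.interp(grid, timestamps, samples[:, c]) for c in range(samples.shape[1])]
    )
    return grid, resampled

# Synthetic 6-axis stream sampled at 100 Hz, aligned to an assumed 30 Hz.
t = np.arange(0, 1.0, 0.01)
imu = np.column_stack([t * k for k in range(1, 7)])
grid, imu_30 = resample_to_common_rate(t, imu, rate_hz=30.0)
```

If the wearable sensors can be configured to sample at the common frequency directly, no resampling of this kind is needed.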
In step 2), establishing the multimodal fusion model of the vision data and the wearable sensor data comprises the following steps:
2.1) a start frame, an end frame and frame numbers are added to the data in each acquisition window of the vision and wearable sensors, and the data are then extracted by frame number as input to the deep neural network;
2.2) a dynamically variable acquisition window length is used: each action cycle is segmented dynamically and its duration is taken as the length of the sliding window;
2.3) the color (RGB) and depth (D) information of all pixels from the Kinect camera within one acquisition window is assembled into a single visual feature vector as input;
2.4) the data from the wrist and waist 6-axis attitude sensors (3-axis acceleration and 3-axis angular velocity) within one acquisition window together form the wearable sensor feature vector as input;
2.5) deep learning takes the raw data directly and obtains features through training.
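Steps 2.3) and 2.4) describe how the two input feature vectors are formed from one acquisition window. A minimal sketch follows, assuming the RGB-D frames arrive as an array of shape (frames, height, width, 4) and each 6-axis stream as an array of shape (frames, 6). These array layouts and the function names are assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def visual_feature_vector(rgbd_frames):
    """Flatten every RGB-D pixel in one acquisition window into one vector.

    rgbd_frames: array of shape (frames, height, width, 4), channels R, G, B, D
    (assumed layout).
    """
    return np.asarray(rgbd_frames, dtype=float).reshape(-1)

def wearable_feature_vector(wrist, waist):
    """Join wrist and waist 6-axis samples (3-axis acceleration plus
    3-axis angular velocity) from one window into one vector.

    wrist, waist: arrays of shape (frames, 6) (assumed layout).
    """
    wrist = np.asarray(wrist, dtype=float)
    waist = np.asarray(waist, dtype=float)
    # Each frame contributes 12 values: 6 from the wrist, then 6 from the waist.
    return np.concatenate([wrist, waist], axis=1).reshape(-1)

frames, height, width = 5, 4, 3
v_m = visual_feature_vector(np.zeros((frames, height, width, 4)))
v_t = wearable_feature_vector(np.ones((frames, 6)), np.zeros((frames, 6)))
```

Because the window length varies with the action cycle, a real pipeline would also need to bring windows to a fixed input size; that step is not described in the text and is omitted here.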
In step 3), using a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data comprises the following steps:
3.1) a visual deep Boltzmann machine and a wearable-sensor deep Boltzmann machine are built separately, each taking its sensor data as input. Each is a deep Boltzmann machine with a depth of two layers in which the neurons of the visible layer and the hidden layers are all Gaussian units. The energy function of the two-layer deep Boltzmann machine is:
$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$
where $\theta$ denotes the RBM parameters $\{W, a, b\}$, $v$ denotes the visible units, $h^{(i)}$ denotes the hidden units of layer $i$, and $W$ denotes the weights on the connections between visible and hidden units;
3.2) a multimodal deep Boltzmann machine is built, using one common hidden layer to fuse the visual deep Boltzmann machine and the wearable-sensor deep Boltzmann machine. The joint probability distribution of the network is:
$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)}, h_t^{(2)}, h^{(3)}} P(h_m^{(2)}, h_t^{(2)}, h^{(3)}) \left[ \sum_{h_m^{(1)}} P(v_m, h_m^{(1)}, h_m^{(2)}) \right] \left[ \sum_{h_t^{(1)}} P(v_t, h_t^{(1)}, h_t^{(2)}) \right]$$
where $\theta$ denotes the parameters of the joint probability distribution, $v_m$ denotes the visible layer of the visual deep Boltzmann machine, $v_t$ denotes the visible layer of the wearable-sensor deep Boltzmann machine, $h_m^{(i)}$ denotes the $i$-th hidden layer of the visual deep Boltzmann machine, and $h_t^{(i)}$ denotes the $i$-th hidden layer of the wearable-sensor deep Boltzmann machine;
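The energy function in step 3.1) can be evaluated numerically for a given configuration of units. The sketch below computes only the two weight terms that appear in that formula; bias terms are left out because the formula as given does not include them. The function name `dbm_energy` and the toy values are illustrative assumptions.

```python
import numpy as np

def dbm_energy(v, h1, h2, W1, W2):
    """Energy of a two-layer deep Boltzmann machine:
    E = -v^T W1 h1 - h1^T W2 h2 (no bias terms, as in the formula above)."""
    return float(-(v @ W1 @ h1) - (h1 @ W2 @ h2))

# Toy configuration: 2 visible units, 3 first-layer and 2 second-layer hidden units.
v = np.array([1.0, 0.0])
h1 = np.array([1.0, 1.0, 0.0])
h2 = np.array([0.0, 1.0])
W1 = np.ones((2, 3))        # visible to first hidden layer
W2 = np.full((3, 2), 0.5)   # first to second hidden layer
energy = dbm_energy(v, h1, h2, W1, W2)
```

For these values the first term contributes -2 and the second -1, so `energy` equals -3.0; lower energy corresponds to a more probable configuration under the model.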
In step 4), classifying with the softmax regression classifier comprises the following steps:
4.1) the training dataset is built by combining multimodal public datasets, including the Berkeley multimodal human action dataset, with the actually collected dataset;
4.2) a softmax classifier is added after the last layer of the deep learning model, the output of the final layer is used as the classifier input, and the final classification model is obtained by training the classifier;
4.3) the common features obtained by the fused deep Boltzmann machine in step 3) are used as input and classified with the trained softmax classifier.
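Steps 4.2) and 4.3) place a softmax classifier on top of the fused features. The following is a minimal sketch of softmax regression trained by full-batch gradient descent on synthetic features; the learning rate, the number of epochs and the toy data are assumptions chosen for illustration and are not part of the patent.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(features, labels, n_classes, lr=0.5, epochs=200):
    """Fit softmax regression by gradient descent on the cross-entropy loss."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        probs = softmax(features @ W + b)
        grad = (probs - onehot) / n          # gradient with respect to the logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    """Return the most probable class for each feature vector."""
    return softmax(features @ W + b).argmax(axis=1)

# Synthetic stand-in for fused features: two well-separated behavior classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 3)), rng.normal(2, 0.5, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
W, b = train_softmax(X, y, n_classes=2)
pred = predict(X, W, b)
```

In the method described above, `X` would be the output of the common hidden layer of the multimodal deep Boltzmann machine instead of synthetic data.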
In step 5), adaptively adjusting the deep network model generated from public sample data according to the individual characteristics of the user comprises the following steps:
5.1) one hidden layer is added before each of the vision input feature layer and the wearable sensor input feature layer;
5.2) high-confidence data obtained by performing behavior recognition for the individual user with the public-data network model are used as labeled sample data;
5.3) the model generated from public data is trained on the labeled sample data using mini-batch incremental learning, with the mini-batch size chosen as required.
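Step 5.3) updates the existing model from labeled samples one mini-batch at a time. As a rough sketch of that idea, the code below shuffles the samples, splits them into parts, and applies one gradient update per part to a single softmax weight matrix that stands in for the full network. The helper names, the learning rate and the batch size of 4 are hypothetical choices made for this example.

```python
import numpy as np

def iterate_minibatches(features, labels, batch_size, rng):
    """Shuffle the labeled samples and yield them in parts of at most batch_size."""
    order = rng.permutation(len(features))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]

def sgd_step(W, xb, yb, lr=0.1):
    """One parameter update from one mini-batch (softmax cross-entropy gradient)."""
    logits = xb @ W
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(yb)), yb] -= 1.0     # probs minus one-hot labels
    return W - lr * (xb.T @ probs) / len(yb)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))     # stand-in for newly labeled user samples
y = np.arange(10) % 2
W = np.zeros((3, 2))             # stand-in for parameters learned from public data
batch_sizes = []
for xb, yb in iterate_minibatches(X, y, batch_size=4, rng=rng):
    W = sgd_step(W, xb, yb)      # parameters are updated once per part
    batch_sizes.append(len(yb))
```

With 10 samples and a batch size of 4, the parameters are updated three times, on parts of 4, 4 and 2 samples.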
A human behavior recognition system based on a multimodal deep Boltzmann machine comprises:
a data acquisition module for collecting the raw data streams of the robot's human behavior recognition platform, including the vision data stream and the wearable sensor data stream;
a data preprocessing module for applying filtering and noise reduction, smoothing, and windowing to the collected raw data;
a deep learning module for feeding the preprocessed data into the deep neural network for learning and fusion, extracting the common features of the vision and attitude sensor data;
a model training module which, by learning from and modeling the training dataset, obtains the trained multimodal fusion deep Boltzmann machine human behavior recognition model;
a behavior recognition module which uses the multimodal fusion deep Boltzmann machine human behavior recognition model to recognize and classify human behavior.
Preferably, the data acquisition module uses a Kinect sensor to collect the vision data stream and two 6-axis attitude sensors to collect waist and wrist data respectively, with the maximum acquisition frequency of the Kinect sensor as the common acquisition frequency.
Preferably, the data preprocessing module uses a dynamically variable windowing method to segment out the cycle of each action.
Preferably, the deep learning module uses a multimodal deep Boltzmann machine, in which one common hidden layer fuses the visual deep Boltzmann machine and the wearable-sensor deep Boltzmann machine.
Preferably, the model training module combines multimodal public datasets, such as the Berkeley multimodal human action dataset, with the actually collected dataset to form the training dataset.
Preferably, the behavior recognition module uses a softmax regression model as the classifier, added after the last layer of the deep neural network.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The multimodal neural network model based on vision and wearable sensors uses a behavior representation based on global features and fuses the data of the vision sensor and the wearable sensors to recognize human behavioral features jointly. This not only removes the need to wear many sensors, which effectively reduces the effect of wearable sensors on comfort, but also overcomes the inability of behavior representations based on local features to recognize complex actions, effectively improving the accuracy with which a robot recognizes human behavior in complex scenes.
2. The deep Boltzmann machine can reconstruct missing data, which effectively reduces the effect of missing data on behavior recognition accuracy. When data are missing because the camera viewing angle is affected, objects are occluded, or the wearable devices suffer external electromagnetic interference, the accuracy with which a robot recognizes human behavior can still be effectively improved.
3. A method is proposed for automatically adjusting the common model in combination with personalized features. It effectively resolves the problem of commonality versus individuality in robot recognition of human behavior by letting the robot automatically adjust the public-data model to fit individual characteristics, so that it understands human behavior better and recognizes the behavior of its specific owner more accurately.
Brief description of the drawings
Fig. 1 is a flow chart of the human behavior recognition method based on a multimodal deep Boltzmann machine according to the invention.
Fig. 2 is a schematic diagram of the robot human behavior recognition system platform of the invention.
Fig. 3 is a schematic diagram of the multimodal deep neural network model based on vision and wearable sensors according to the invention.
Fig. 4 is a schematic diagram of the multimodal deep Boltzmann machine.
Embodiment
The invention is further described below with reference to a specific embodiment.
As shown in Fig. 1, the human behavior recognition method based on a multimodal deep Boltzmann machine provided by this embodiment comprises the following steps:
1) establish the robot human behavior recognition system platform and acquire data from the vision and wearable sensors;
2) establish the multimodal fusion model of the vision data and the wearable sensor data, fusing the vision and wearable sensor information;
3) use a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data;
4) classify human behavior with a softmax regression classifier;
5) adaptively adjust the deep network model generated from public sample data according to the individual characteristics of the user.
As shown in Fig. 2, in step 1), acquiring the vision and wearable sensor data on the robot human behavior recognition system platform comprises the following steps:
1.1) the Kinect vision sensor mounted on the robot collects video data;
1.2) the wearable sensors are 6-axis attitude sensors (3-axis acceleration and 3-axis angular velocity) mounted in a smart wristband and a smart belt respectively, so that wrist attitude data and waist attitude data can be selected as input features;
1.3) in this embodiment, the maximum acquisition frequency of the Kinect vision sensor is used as the common acquisition frequency of the vision and wearable sensors;
1.4) after the vision data are collected, the Kinect vision sensor sends them to the notebook computer over a USB interface;
1.5) after the attitude sensor data are collected, the wearable sensors send the data stored over a period of time to the notebook computer via wireless Bluetooth communication.
As shown in Fig. 3, in step 2), building the multimodal deep neural network based on vision and wearable sensors comprises the following steps:
2.1) in this embodiment, to make it easier to fuse the two kinds of sensor data, synchronization is achieved by adding a start frame, an end frame and frame numbers to the data in each acquisition window of the vision and wearable sensors;
2.2) data are extracted by frame number as input to the deep neural network, which keeps the wearable sensors and the Kinect vision sensor consistent in time;
2.3) in this embodiment, a dynamically variable windowing method is used to separate out each action cycle; the length of the sliding window is the duration of each action cycle, and the sliding step is half the window length;
2.4) while feature data are being collected, the vision data and wearable sensor data are analyzed to find the key points at which the action changes, and these are used as the start and end points of the acquisition window;
2.5) so that feature collection is not affected, collection and analysis are run concurrently;
2.6) within one acquisition window, the color (RGB) and depth (D) information of all pixels from the Kinect camera is assembled into a single visual feature vector as input;
2.7) within one acquisition window, the data of the wrist 6-axis attitude sensor (3-axis acceleration and 3-axis angular velocity) and the waist 6-axis attitude sensor (3-axis acceleration and 3-axis angular velocity) together form the wearable sensor feature vector as input;
2.8) deep learning takes the raw data directly and obtains features through training.
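Steps 2.1) to 2.3) of this embodiment rely on frame numbers to keep both streams aligned and on a sliding step of half the window length. The sketch below shows that bookkeeping for the simplified case of a fixed window length; the dynamic, action-cycle-dependent window length of step 2.3) is not modeled. The function names and array shapes are assumptions made for illustration.

```python
import numpy as np

def sliding_windows(n_frames, window_len):
    """Return (start_frame, end_frame) pairs; the step is half the window length."""
    step = max(1, window_len // 2)
    return [(s, s + window_len - 1)
            for s in range(0, n_frames - window_len + 1, step)]

def extract_by_frame(stream, start_frame, end_frame):
    """Cut the same frame range (inclusive) out of any frame-indexed stream."""
    return stream[start_frame:end_frame + 1]

video = np.zeros((10, 4, 3, 4))   # 10 RGB-D frames (assumed layout)
imu = np.zeros((10, 12))          # 10 frames of wrist + waist 6-axis data
windows = sliding_windows(n_frames=10, window_len=4)
pairs = [(extract_by_frame(video, s, e), extract_by_frame(imu, s, e))
         for s, e in windows]
```

For 10 frames and a window of 4 frames, this yields the windows (0, 3), (2, 5), (4, 7) and (6, 9), each applied identically to both streams so that the two modalities stay synchronized.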
As shown in Fig. 4, in step 3), building the multimodal deep Boltzmann machine, the neural network structure that reconstructs missing data, comprises the following steps:
3.1) in this embodiment, a two-layer deep Boltzmann machine is used, whose energy function is:
$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$
where $\theta$ denotes the RBM parameters $\{W, a, b\}$, $v$ denotes the visible units, $h^{(i)}$ denotes the hidden units of layer $i$, and $W$ denotes the weights on the connections between visible and hidden units;
3.2) the visual deep Boltzmann machine is built, taking the Kinect vision sensor data as input and using a deep Boltzmann machine with a depth of two layers, in which the neurons of the visible layer and the hidden layers are all Gaussian units;
3.3) the wearable-sensor deep Boltzmann machine is built, taking the wearable sensor data as input and likewise using a deep Boltzmann machine with a depth of two layers, in which the neurons of the visible layer and the hidden layers are all Gaussian units;
3.4) in this embodiment, a multimodal deep Boltzmann machine composed of two deep Boltzmann machines is built, in which one common hidden layer joins the two deep networks. Assuming the visible layer of one deep network is $v_m$ and that of the other is $v_t$, the joint probability distribution of the network is:
$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)}, h_t^{(2)}, h^{(3)}} P(h_m^{(2)}, h_t^{(2)}, h^{(3)}) \left[ \sum_{h_m^{(1)}} P(v_m, h_m^{(1)}, h_m^{(2)}) \right] \left[ \sum_{h_t^{(1)}} P(v_t, h_t^{(1)}, h_t^{(2)}) \right]$$
where $\theta$ denotes the parameters of the joint probability distribution, $v_m$ denotes the visible layer of the visual deep Boltzmann machine, $v_t$ denotes the visible layer of the wearable-sensor deep Boltzmann machine, $h_m^{(i)}$ denotes the $i$-th hidden layer of the visual deep Boltzmann machine, and $h_t^{(i)}$ denotes the $i$-th hidden layer of the wearable-sensor deep Boltzmann machine;
3.5) the multimodal deep neural network model based on vision and wearable sensors is built: in the multimodal deep Boltzmann machine composed of the two deep Boltzmann machines, a common hidden layer (jointly representing one feature layer) fuses the vision and wearable-sensor deep networks.
In step 4), classifying human behavior with the softmax regression classifier comprises the following steps:
4.1) the training dataset is built by combining multimodal public datasets, such as the Berkeley multimodal human action dataset, with real datasets obtained by the research team through various channels;
4.2) a softmax classifier is added after the last layer of the deep learning model, the output of the final layer is used as the classifier input, and the final classification model is obtained by training the classifier;
4.3) the common features obtained in step 3) by fusing the visual deep Boltzmann machine and the wearable-sensor deep Boltzmann machine are used as input and classified with the trained softmax classifier.
In step 5), automatically adjusting the deep network model generated from public data according to personal features is done in two ways, improving the network structure and incremental learning by training on newly labeled samples, implemented as follows:
5.1) improve the network structure by extending the original neural network structure, specifically:
5.1.1) add one hidden layer before each of the vision input feature layer and the wearable sensor input feature layer;
5.1.2) when the user and the robot are together, rerun unsupervised training;
5.1.3) train the new network structure on content containing the individual user's behavior;
5.2) perform incremental learning by training on newly labeled samples: high-confidence data obtained by performing behavior recognition for the individual user with the public-data network model are used as labeled sample data, specifically:
5.2.1) determine from the sensor's own characteristics whether the collected data are normal;
5.2.2) compute the confidence by combining this with the output of the softmax classification model;
5.2.3) train the model generated from public data on the labeled sample data using mini-batch incremental learning. In detail, the full sample set is divided into several parts and the parameters are updated once per part. The more samples in each part, the higher the precision of model training but the more time it takes, so the mini-batch size is chosen as a reasonable trade-off between precision and time.
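Steps 5.2.1) and 5.2.2) keep only recognitions that are trustworthy enough to serve as labeled samples. The text does not give the confidence formula, so the sketch below makes a simplifying assumption: the confidence is taken to be the largest softmax probability, and a sample is kept when it reaches a threshold. The threshold of 0.9 and the function name `select_confident` are hypothetical, and the sensor sanity check of step 5.2.1) is not modeled.

```python
import numpy as np

def select_confident(probs, threshold=0.9):
    """Keep samples whose top softmax probability reaches the threshold.

    Returns the indices of the kept samples and their predicted labels,
    which then serve as labels for incremental learning.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)

# Softmax outputs for three recognitions over three behavior classes.
probs = np.array([[0.95, 0.03, 0.02],
                  [0.40, 0.35, 0.25],
                  [0.05, 0.92, 0.03]])
keep, pseudo_labels = select_confident(probs)
```

Here the first and third recognitions are kept with labels 0 and 1, while the ambiguous second one is discarded.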
The human behavior recognition system based on a multimodal deep Boltzmann machine provided by this embodiment is described below. It comprises:
Data acquisition module: collects the raw data streams of the robot's human behavior recognition platform, including the vision data stream and the wearable sensor data stream. In this embodiment, a Kinect sensor collects the video data and two 6-axis attitude sensors collect waist and wrist data respectively, with the maximum acquisition frequency of the Kinect sensor as the common acquisition frequency.
Data preprocessing module: applies filtering and noise reduction, smoothing, and windowing to the collected raw data. In this embodiment, a dynamic windowing method is used in which the cycle length of each human behavior is the window length, and the feature matrix of the data in each window is extracted as input.
Deep learning module: feeds the preprocessed data into the deep neural network for learning and fusion, extracting the common features of the vision and attitude sensor data. In this embodiment, a multimodal deep Boltzmann machine is used in which one common hidden layer fuses the visual deep Boltzmann machine and the wearable-sensor deep Boltzmann machine, so that the multi-sensor data are fused and trained to extract common features.
Model training module: by learning from and modeling the training dataset, obtains the trained multimodal fusion deep Boltzmann machine human behavior recognition model. In this embodiment, multimodal public datasets such as the Berkeley multimodal human action dataset are combined with real datasets obtained by the research team through various channels to form the training dataset.
Behavior recognition module: uses the multimodal fusion deep Boltzmann machine human behavior recognition model to recognize and classify human behavior. In this embodiment, a softmax regression model is used as the classifier and added after the last layer of the deep neural network.
In the above embodiment, the modules are divided only according to the functional logic of the invention. The division is not limited to the one described, as long as the corresponding functions can be realized, and it is not intended to limit the scope of protection of the invention.
In summary, the human behavior recognition method and system based on a multimodal deep Boltzmann machine provided by the invention build a multimodal neural network model based on vision and wearable sensors, which can improve the accuracy with which a robot recognizes human behavior in complex scenes; use a suitable deep neural network structure in the multimodal deep learning model, which can reduce the effect of missing data on behavior recognition accuracy; and propose a method for automatically adjusting the common model in combination with personalized features, which can improve the accuracy with which a robot recognizes the behavior of its specific owner. The invention can be used for cooperation between people and robots, improving the success rate of human-robot collaboration. In addition, the technical method provided by the invention can be extended to many fields such as human abnormality monitoring, video surveillance, smart home, identity authentication and motion analysis; it has broad research significance and is worth promoting.
The embodiment described above is only a preferred embodiment of the invention and does not limit its scope of implementation; any change made according to the shape and principle of the invention shall therefore fall within the scope of protection of the invention.
Claims (7)
1. A human behavior recognition method based on a multimodal deep Boltzmann machine, characterized by comprising the following steps:
1) acquire data from vision and wearable sensors;
2) establish a multimodal fusion model of the vision data and the wearable sensor data;
3) use a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data;
4) classify with a softmax regression classifier;
5) adaptively adjust the deep network model generated from public sample data according to the individual characteristics of the user.
2. The human behavior recognition method based on a multimodal deep Boltzmann machine according to claim 1, characterized in that, in step 1), acquiring the vision and wearable sensor data comprises the following steps:
1.1) the maximum acquisition frequency of the Kinect vision sensor is used as the common acquisition frequency of the vision and wearable sensors;
1.2) the Kinect vision sensor, which provides the video input features, is mounted on the robot and sends its data to a notebook computer over a USB interface;
1.3) the wearable sensors take wrist attitude data and waist attitude data as input features and send the data stored over a period of time to the notebook computer via wireless Bluetooth communication;
1.4) the notebook computer preprocesses the collected data and sends the processed data to a back-end graphics workstation for deep learning.
3. The human behavior recognition method based on a multimodal deep Boltzmann machine according to claim 1, characterized in that, in step 2), establishing the multimodal fusion model of the vision data and the wearable sensor data comprises the following steps:
2.1) a start frame, an end frame and frame numbers are added to the data in each acquisition window of the vision and wearable sensors, and the data are then extracted by frame number as input to the deep neural network;
2.2) a dynamically variable acquisition window length is used: each action cycle is segmented dynamically and its duration is taken as the length of the sliding window;
2.3) the color (RGB) and depth (D) information of all pixels from the Kinect camera within one acquisition window is assembled into a single visual feature vector as input;
2.4) the data from the wrist and waist 6-axis attitude sensors within one acquisition window together form the wearable sensor feature vector as input;
2.5) deep learning takes the raw data directly and obtains features through training.
4. The human behavior recognition method based on a multimodal deep Boltzmann machine according to claim 1, characterized in that, in step 3), using a deep neural network to perform heterogeneous transfer learning so as to reconstruct missing data comprises the following steps:
3.1) a visual deep Boltzmann machine and a wearable-sensor deep Boltzmann machine are built separately, each taking its sensor data as input. Each is a deep Boltzmann machine with a depth of two layers in which the neurons of the visible layer and the hidden layers are all Gaussian units. The energy function of the two-layer deep Boltzmann machine is:
$$E(v, h^{(1)}, h^{(2)}; \theta) = -v^{T} W^{(1)} h^{(1)} - h^{(1)T} W^{(2)} h^{(2)}$$
where $\theta$ denotes the RBM parameters $\{W, a, b\}$, $v$ denotes the visible units, $h^{(i)}$ denotes the hidden units of layer $i$, and $W$ denotes the weights on the connections between visible and hidden units;
3.2) a multimodal deep Boltzmann machine is built, using one common hidden layer to fuse the visual deep Boltzmann machine and the wearable-sensor deep Boltzmann machine. The joint probability distribution of the network is:
$$P(v_m, v_t; \theta) = \sum_{h_m^{(2)},\, h_t^{(2)},\, h^{(3)}} P\big(h_m^{(2)}, h_t^{(2)}, h^{(3)}\big) \Big[\sum_{h_m^{(1)}} P\big(v_m, h_m^{(1)}, h_m^{(2)}\big)\Big] \Big[\sum_{h_t^{(1)}} P\big(v_t, h_t^{(1)}, h_t^{(2)}\big)\Big]$$
where $\theta$ denotes the parameters of the joint probability distribution, $v_m$ denotes the visible layer of the visual deep Boltzmann machine, $v_t$ denotes the visible layer of the wearable sensor deep Boltzmann machine, $h_m^{(i)}$ denotes the i-th hidden layer of the visual deep Boltzmann machine, and $h_t^{(i)}$ denotes the i-th hidden layer of the wearable sensor deep Boltzmann machine;
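As a small numerical illustration of the two-layer energy function in step 3.1), the sketch below evaluates the two bilinear terms for a toy configuration. It omits bias terms, as the stated energy does. The helper names and all numeric values are assumptions chosen for the example, not values from the patent. Under the usual Boltzmann form, where probability is proportional to exp(-E), a lower energy corresponds to a more probable configuration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear(x: Vector, w: Matrix, y: Vector) -> float:
    """Return x^T W y, where W has len(x) rows and len(y) columns."""
    return sum(x[i] * w[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def dbm_energy(v: Vector, h1: Vector, h2: Vector, w1: Matrix, w2: Matrix) -> float:
    """Energy of a two-layer DBM without biases: -v^T W1 h1 - h1^T W2 h2."""
    return -bilinear(v, w1, h1) - bilinear(h1, w2, h2)


# Hypothetical toy configuration with two units per layer.
v = [1.0, 0.0]
h1 = [1.0, 1.0]
h2 = [0.0, 1.0]
w1 = [[0.5, -0.2], [0.1, 0.3]]  # visible to first hidden layer
w2 = [[0.4, 0.1], [-0.3, 0.2]]  # first to second hidden layer

energy = dbm_energy(v, h1, h2, w1, w2)  # about -0.6 for these values
```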
in step 4), classifying with a softmax regression classifier comprises the following steps:
4.1) building the training data set, which combines a public multimodal data set, including the Berkeley multimodal human action data set, with the actually collected data set;
4.2) adding a softmax classifier after the last layer of the deep learning model, taking the output of the final layer as the classifier input, and training the classifier to obtain the final classification model;
4.3) taking the common features obtained by the fused deep Boltzmann machine in step 3) as input and classifying them with the trained softmax classifier.
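The classification in step 4.3) can be sketched as a linear layer followed by softmax. This is a minimal sketch that assumes the weights and biases have already been trained; the function names and example parameters are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: List[float], weights: List[List[float]], biases: List[float]) -> int:
    """Return the index of the most probable class for one fused feature vector."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical trained parameters: 3 behavior classes, 2 fused features.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
predicted = classify([1.0, 2.0], weights, biases)  # logits are [1, 2, -3]
```

Subtracting the maximum logit leaves the probabilities unchanged but avoids overflow in `math.exp` for large activations.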
5. The human behavior recognition method based on a multimodal deep Boltzmann machine according to claim 1, characterized in that, in step 5), adaptively adjusting the deep network model produced from public sample data according to the individual characteristics of the user comprises the following steps:
5.1) adding one hidden layer before each of the visual input feature layer and the wearable sensor input feature layer;
5.2) recognizing the individual user's behavior with the network model trained on public data, and taking the results with high confidence as labeled sample data;
5.3) training the model produced from public data on the labeled sample data using mini-batch incremental learning, with the mini-batch size selected as required.
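Steps 5.2) and 5.3) amount to confidence-filtered pseudo-labeling followed by mini-batch updates. The sketch below covers only the data selection and batching, not the model update itself. The 0.9 confidence threshold and the function names are assumptions; the claim only requires that high-confidence results be used and that a batch size be chosen.

```python
from typing import Iterator, List, Sequence, Tuple

Sample = List[float]


def select_confident(
    samples: Sequence[Sample],
    probabilities: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed cut-off for "high confidence"
) -> List[Tuple[Sample, int]]:
    """Keep samples whose top class probability reaches the threshold (step 5.2)."""
    labeled = []
    for sample, probs in zip(samples, probabilities):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            labeled.append((sample, label))
    return labeled


def mini_batches(items: Sequence, batch_size: int) -> Iterator[list]:
    """Yield consecutive mini-batches of at most batch_size items (step 5.3)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical user data with class probabilities from the public-data model.
samples = [[0.1], [0.2], [0.3], [0.4], [0.5]]
probabilities = [[0.95, 0.05], [0.6, 0.4], [0.08, 0.92], [0.5, 0.5], [0.99, 0.01]]

pseudo_labeled = select_confident(samples, probabilities, threshold=0.9)
batches = list(mini_batches(pseudo_labeled, batch_size=2))
```

Each batch would then be passed to one incremental training step of the existing model.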
6. A human behavior recognition system based on a multimodal deep Boltzmann machine, characterized by comprising:
a data acquisition module for collecting the raw data streams of the robot's human behavior recognition platform, including the visual data stream and the wearable sensor data stream;
a data preprocessing module for denoising, smoothing, and windowing the collected raw data;
a deep learning module for feeding the preprocessed data into the deep neural network for learning and fusion, extracting the common features of the visual and attitude sensor data;
a model training module for obtaining the trained multimodal fusion deep Boltzmann machine human behavior recognition model by learning from and modeling the training data set;
a behavior recognition module for recognizing and classifying human behavior using the multimodal fusion deep Boltzmann machine human behavior recognition model.
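A minimal sketch of what the data preprocessing module might do is given below. The claim names noise reduction, smoothing and windowing but does not specify the filters, so the centred moving average and the externally supplied window boundaries are assumptions made purely for illustration.

```python
from typing import List


def moving_average(signal: List[float], width: int = 3) -> List[float]:
    """Smooth a 1-D signal with a centred moving average; edges use a shorter window."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def split_windows(signal: List[float], boundaries: List[int]) -> List[List[float]]:
    """Cut the signal at the given start indices, producing one window per segment."""
    edges = sorted(set(boundaries) | {0, len(signal)})
    return [signal[a:b] for a, b in zip(edges, edges[1:]) if b > a]


raw = [0.0, 3.0, 0.0, 3.0, 0.0]
smoothed = moving_average(raw, width=3)
# Hypothetical cycle boundaries; how they are detected is left open here.
windows = split_windows([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], boundaries=[2, 4])
```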
7. The human behavior recognition system based on a multimodal deep Boltzmann machine according to claim 6, characterized in that: the data acquisition module collects the visual data stream using a Kinect sensor and collects waist and wrist data using two 6-axis attitude sensors, respectively, with the maximum acquisition frequency of the Kinect sensor used as the common acquisition frequency; the data preprocessing module uses a dynamically variable windowing method to segment the cycle of each action; the deep learning module uses a multimodal deep Boltzmann machine in which a common hidden layer fuses the visual deep Boltzmann machine and the wearable sensor deep Boltzmann machine; the model training module builds the training data set by combining a public multimodal data set, including the Berkeley multimodal human action data set, with the actually collected data set; the behavior recognition module uses a softmax regression model, added after the last layer of the deep neural network, as the classifier.
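Using the Kinect frame rate as the common acquisition frequency implies that the attitude sensor stream is brought onto the Kinect frame timestamps. One possible way to do that, sketched below, is linear interpolation of each sensor channel at the frame times. The claim does not state the alignment method, so the interpolation, the function name and the toy timestamps are all assumptions.

```python
from bisect import bisect_left
from typing import List


def resample_linear(
    times: List[float], values: List[float], target_times: List[float]
) -> List[float]:
    """Linearly interpolate (times, values) at target_times.

    `times` must be strictly increasing. Targets outside the sampled range
    are clamped to the first or last value.
    """
    out = []
    for t in target_times:
        i = bisect_left(times, t)
        if i == 0:
            out.append(values[0])
        elif i == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[i - 1], times[i]
            frac = (t - t0) / (t1 - t0)
            out.append(values[i - 1] + frac * (values[i] - values[i - 1]))
    return out


# Hypothetical sensor channel and Kinect frame timestamps (seconds).
sensor_times = [0.0, 1.0, 2.0]
sensor_values = [0.0, 10.0, 20.0]
kinect_times = [0.5, 1.0, 2.5]

aligned = resample_linear(sensor_times, sensor_values, kinect_times)
```

After alignment, every Kinect frame has one matching sample per sensor channel, so both modalities can share the same frame numbers within a window.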
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711061490.6A CN107886061B (en) | 2017-11-02 | 2017-11-02 | Human body behavior recognition method and system based on multi-mode deep Boltzmann machine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711061490.6A CN107886061B (en) | 2017-11-02 | 2017-11-02 | Human body behavior recognition method and system based on multi-mode deep Boltzmann machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107886061A (en) | 2018-04-06 |
CN107886061B (en) | 2021-08-06 |
Family
ID=61783558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711061490.6A Active CN107886061B (en) | 2017-11-02 | 2017-11-02 | Human body behavior recognition method and system based on multi-mode deep Boltzmann machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107886061B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104063720A (en) * | 2014-07-03 | 2014-09-24 | 浙江大学 | Method for detecting images of prohibited commodities of e-commerce websites based on deep Boltzmann machine |
US20170220854A1 (en) * | 2016-01-29 | 2017-08-03 | Conduent Business Services, Llc | Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action |
CN106778880A (en) * | 2016-12-23 | 2017-05-31 | 南开大学 | Microblog topic based on multi-modal depth Boltzmann machine is represented and motif discovery method |
Non-Patent Citations (3)
Title |
---|
CHENG WANG ET AL: "Exploring Multimodal Video Representation for Action Recognition", 《2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS(IJCNN)》 * |
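Zhang Qingchen: "Research on deep computation models for big data feature learning", China Doctoral Dissertations Full-text Database, Information Science and Technology * |
Bi Sheng et al.: "Fall detection and control of humanoid robots based on multi-sensor information fusion", Journal of South China University of Technology (Natural Science Edition) * |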
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629380B (en) * | 2018-05-11 | 2021-06-11 | 西北大学 | Cross-scene wireless signal sensing method based on transfer learning |
CN108629380A (en) * | 2018-05-11 | 2018-10-09 | 西北大学 | A kind of across scene wireless signal cognitive method based on transfer learning |
CN109063722A (en) * | 2018-06-08 | 2018-12-21 | 中国科学院计算技术研究所 | A kind of Activity recognition method and system based on chance perception |
CN109063722B (en) * | 2018-06-08 | 2021-06-29 | 中国科学院计算技术研究所 | Behavior recognition method and system based on opportunity perception |
CN108958482A (en) * | 2018-06-28 | 2018-12-07 | 福州大学 | A kind of similitude action recognition device and method based on convolutional neural networks |
CN109241223A (en) * | 2018-08-23 | 2019-01-18 | 中国电子科技集团公司电子科学研究院 | The recognition methods of behavior whereabouts and platform |
CN109241223B (en) * | 2018-08-23 | 2022-06-28 | 中国电子科技集团公司电子科学研究院 | Behavior track identification method and system |
CN109190550A (en) * | 2018-08-29 | 2019-01-11 | 沈阳康泰电子科技股份有限公司 | Combine the deep neural network multi-source data fusion method of micro- expression multi-input information |
CN110222730A (en) * | 2019-05-16 | 2019-09-10 | 华南理工大学 | Method for identifying ID and identification model construction method based on inertial sensor |
WO2020232886A1 (en) * | 2019-05-21 | 2020-11-26 | 平安科技(深圳)有限公司 | Video behavior identification method and apparatus, storage medium and server |
CN110222598A (en) * | 2019-05-21 | 2019-09-10 | 平安科技(深圳)有限公司 | A kind of video behavior recognition methods, device, storage medium and server |
CN110458033B (en) * | 2019-07-17 | 2023-01-03 | 哈尔滨工程大学 | Human body behavior sequence identification method based on wearable position sensor |
CN110458033A (en) * | 2019-07-17 | 2019-11-15 | 哈尔滨工程大学 | A kind of human body behavior sequence recognition methods based on wearable position sensor |
CN111216126A (en) * | 2019-12-27 | 2020-06-02 | 广东省智能制造研究所 | Multi-modal perception-based foot type robot motion behavior recognition method and system |
CN111401440A (en) * | 2020-03-13 | 2020-07-10 | 重庆第二师范学院 | Target classification recognition method and device, computer equipment and storage medium |
CN111507281A (en) * | 2020-04-21 | 2020-08-07 | 中山大学中山眼科中心 | Behavior recognition system, device and method based on head movement and gaze behavior data |
CN111556453A (en) * | 2020-04-27 | 2020-08-18 | 南京邮电大学 | Multi-scene indoor action recognition method based on channel state information and BilSTM |
CN111680660A (en) * | 2020-06-17 | 2020-09-18 | 郑州大学 | Human behavior detection method based on multi-source heterogeneous data stream |
CN111680660B (en) * | 2020-06-17 | 2023-03-24 | 郑州大学 | Human behavior detection method based on multi-source heterogeneous data stream |
CN111861275A (en) * | 2020-08-03 | 2020-10-30 | 河北冀联人力资源服务集团有限公司 | Method and device for identifying household working mode |
CN111861275B (en) * | 2020-08-03 | 2024-04-02 | 河北冀联人力资源服务集团有限公司 | Household work mode identification method and device |
CN112215136A (en) * | 2020-10-10 | 2021-01-12 | 北京奇艺世纪科技有限公司 | Target person identification method and device, electronic equipment and storage medium |
CN112215136B (en) * | 2020-10-10 | 2023-09-05 | 北京奇艺世纪科技有限公司 | Target person identification method and device, electronic equipment and storage medium |
CN112380976A (en) * | 2020-11-12 | 2021-02-19 | 华东师范大学 | Gesture recognition system and method based on neural network visual touch sensor fusion |
CN113657487A (en) * | 2021-08-16 | 2021-11-16 | 深圳多模智能科技有限公司 | Human body attribute classification method and device based on incremental learning |
Also Published As
Publication number | Publication date |
---|---|
CN107886061B (en) | 2021-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107886061A (en) | | Human behavior recognition method and system based on multimodal deep Boltzmann machine |
Jalal et al. | | A triaxial acceleration-based human motion detection for ambient smart home system |
CN107153871B (en) | | Fall detection method based on convolutional neural network and mobile phone sensor data |
CN108062170A (en) | | Multi-class human posture recognition method based on convolutional neural networks and intelligent terminal |
CN106570477A (en) | | Vehicle model recognition model construction method based on deep learning and vehicle model recognition method based on deep learning |
JP6788264B2 (en) | | Facial expression recognition method, facial expression recognition device, computer program and advertisement management system |
CN107784282A (en) | | Object attribute recognition method, apparatus and system |
CN106127749A (en) | | Target part recognition method based on visual attention mechanism |
CN105574510A (en) | | Gait identification method and device |
CN108388876A (en) | | Image recognition method, apparatus and related device |
CN107341452A (en) | | Human behavior recognition method based on quaternion spatio-temporal convolutional neural network |
CN107609572A (en) | | Multimodal emotion recognition method and system based on neural network and transfer learning |
CN108764059A (en) | | Human behavior recognition method and system based on neural network |
CN106485214A (en) | | Eye and mouth state recognition method based on convolutional neural network |
CN107423730A (en) | | Human gait behavior active detection and recognition system and method based on semantic folding |
CN107092894A (en) | | Motion behavior recognition method based on LSTM model |
CN105069413A (en) | | Human body gesture identification method based on deep convolutional neural network |
CN107679462A (en) | | Deep multi-feature fusion classification method based on wavelets |
CN108764066A (en) | | Express sorting operation specification detection method based on deep learning |
CN110378208B (en) | | Behavior identification method based on deep residual network |
CN107423727B (en) | | Complex facial expression recognition method based on neural network |
Liu et al. | | Contrastive self-supervised representation learning for sensing signals from the time-frequency perspective |
CN102024145A (en) | | Layered recognition method and system for disguised face |
CN107423721A (en) | | Interactive action detection method, device, storage medium and processor |
WO2021004510A1 (en) | | Sensor-based separately deployed human body behavior recognition health management system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||