CN108764059B - Human behavior recognition method and system based on neural network - Google Patents

Human behavior recognition method and system based on neural network

Info

Publication number
CN108764059B
Authority
CN
China
Prior art keywords
image
neural network
scene
information
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810422265.9A
Other languages
Chinese (zh)
Other versions
CN108764059A (en)
Inventor
李争彦
岳文静
陈志
周传
张佩迎
邹子昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201810422265.9A
Publication of CN108764059A
Application granted
Publication of CN108764059B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The invention discloses a human behavior recognition method based on a neural network, addressing the limited accuracy of human behavior recognition from wearable sensor data. The method first applies grayscale processing to the image information collected by a wearable image sensor, then performs histogram equalization on the image data, and uses an LSTM-RNN neural network algorithm to perform scene recognition on the processed sensor images. For the motion data input from a wearable motion sensor, action recognition is performed on the motion sensor's acceleration information using an LSTM-RNN neural network algorithm. The scene-labeled action sequence is then matched against a behavior database to obtain specific behavior information, and a warning module notifies the user of emergency information. Through the application of these methods and the support of the matching modules in the system, the invention performs behavior recognition on data from body-worn sensors, can improve the accuracy and stability of human behavior recognition, and offers good practicability and effectiveness.

Description

Human behavior recognition method and system based on neural network
Technical Field
The invention relates to a human behavior recognition method based on a neural network, lying at the intersection of behavior recognition, sensor technology and machine learning, and further relates to a multi-sensor, multi-module human behavior recognition system with an interaction function.
Background
Behavior recognition and classification in videos is an important research topic in the field of computer vision, with both theoretical significance and practical application value.
With economic and social development and scientific and technological progress in China, the recognition, analysis and understanding of human activity in videos has become important content in both the social and natural sciences, and daily human behaviors are closely related to human physiological indicators. For example, rest time can be calculated by monitoring the movements of a lying person, and energy consumption can be estimated by monitoring behaviors such as walking and running; this illustrates the technology's application in sports, health and related fields. The technology also has wide application in fields such as security monitoring and smart city construction. Tracking dynamic objects and processing high-dimensional data are more complex, and thus more challenging, than analyzing data from an image sensor or a single-axis acceleration sensor alone.
Existing human behavior recognition mainly uses two kinds of sensor input. One uses single or multiple wearable devices to acquire the acceleration and displacement information of the human body; the other uses single or multiple image sensors to acquire video of the person and outputs human motion information by applying pattern matching or neural network inference to the video. The two approaches have complementary strengths and weaknesses: data obtained from gyroscopes and acceleration sensors are difficult to analyze and cannot be correctly matched to the person's scene, while data collected by image sensors fluctuate strongly with the environment, so precision during behavior matching is low.
The action recognition problem can be treated as a classification problem, and many classification methods have been applied to it, including logistic regression analysis, decision tree models, naive Bayes classifiers and support vector machines. Each of these methods has advantages and disadvantages in practical applications.
As for research on human behavior systems, the technology adopted at home and abroad is not yet mature. Most human behavior recognition systems rely on manually labeled data that are then fed into a model for recognition. This approach depends heavily on data, runs inefficiently, and is ill-suited to the requirements of industrialization and commercialization.
A great deal of research work therefore remains to be done on human behavior recognition methods and systems for sensor data.
Disclosure of Invention
The technical problem is as follows: the technical problem to be solved by the invention is to collect multi-sensor motion information and image information through a single system, extract scene information and action information, fuse the two, and improve the accuracy of human behavior recognition by using a neural network algorithm.
The technical scheme is as follows: the invention relates to a human behavior recognition method and system based on a neural network, which comprises the following:
the method adopts a plurality of sensors and a plurality of modules, and comprises the following steps:
step 1) acquiring continuous image information centered on the wearer by activating an image sensing module worn by the monitored person, the image being processed into information on an n × m region, where n is the number of pixels in each row across the image and m is the number of pixels in each column down the image;
step 2) acquiring and recording x-axis, y-axis and z-axis acceleration information of the wearer's movement by activating a motion sensing module worn by the monitored person, the x-axis being perpendicular to the body's vertical axis with its positive direction pointing forward, the y-axis parallel to the body's vertical axis with its positive direction pointing toward the head, and the z-axis perpendicular to the body's vertical axis with its positive direction pointing to the left side of the body;
step 3) the scene recognition module receives the information from the image sensor, performs graying and equalization on the received n × m pixel region image and the original image, reducing errors and disturbances in the transmitted image; the grayed and equalized n × m pixel region image information is taken as the input of a neural network, and scene classification is performed with a long short-term memory recurrent neural network (LSTM-RNN);
step 4) the action evaluation module receives the information from the motion sensor and stores the x-axis, y-axis and z-axis accelerations of the received wearer motion information as a three-dimensional vector group V = {(x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n)}; t_m is defined as the time granularity, the three-dimensional vector group within t_m is taken as the input of the neural network, and action evaluation is performed with a long short-term memory recurrent neural network (LSTM-RNN) trained on motion data sets, obtaining the atomic actions within the time granularity;
step 5) the micro intelligent server module receives the information from the scene recognition module and the action evaluation module, integrates the two and performs behavior recognition; k_m successive atomic actions are taken, i.e. k_m·t_m is used as the recognition time unit; with the scene element as a label, the sub-table of the action information database is retrieved and fuzzy matching is performed; if the matching succeeds, the behavior recognition result is returned to the micro intelligent server;
step 6) the micro intelligent server module classifies the behavior recognition results of the behavior recognition module according to user settings, and identifies whether the currently occurring action reaches a warning level according to the user's requirements and settings. If the event reaches the warning level, the micro intelligent server sends a warning command to the warning module, which warns the user in one or more ways;
step 7) the system interface is the entrance through which the user adjusts and configures the system; through the interface the user can set the specific server configuration, bind the image sensor and the motion sensor to the system, monitor and observe the running state of the system, and set its running mode;
step 8) the operation recording module monitors and records all running states of the micro intelligent server, records warning information of different levels during system operation, and stores the operation records in a relational database. The user queries the data of the operation recording module through the system interface to maintain the system.
Wherein:
the step 3) is as follows:
step 31) the scene recognition module receives the n × m pixel region image information from the image sensor; the low-level features of the scene are described mainly through video scene preprocessing and scene recognition. To facilitate subsequent scene recognition, the scene image needs to be grayed and equalized;
step 32) the original scene image is grayed. For each pixel, because the human eye has different sensitivities to red, green and blue light, a weighted average method assigns different weights to the pixel, giving the pixel's gray value Gray = c_r·R + c_g·G + c_b·B, where c_r, c_g and c_b are the conversion weights of red, green and blue light respectively, and c_r + c_g + c_b = 1;
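For illustration, a minimal Python/NumPy sketch of this weighted-average graying follows; the function name to_gray and the (m, n, 3) RGB layout are assumptions, not from the patent, and the default weights are the empirical values given later in the text:

```python
# A minimal sketch of the weighted-average graying in step 32, assuming a
# NumPy uint8 image of shape (m, n, 3) in RGB channel order. The function
# name to_gray is illustrative; the default weights are the empirical
# values c_r = 0.30, c_g = 0.59, c_b = 0.11 given later in the text.
import numpy as np

def to_gray(rgb: np.ndarray, cr: float = 0.30, cg: float = 0.59,
            cb: float = 0.11) -> np.ndarray:
    """Gray = cr*R + cg*G + cb*B for every pixel, with cr + cg + cb = 1."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (cr * r + cg * g + cb * b).astype(np.uint8)
```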
step 33) histogram equalization processing is performed on the original scene image. Under insufficient light, scene recognition can incur large errors, so histogram equalization is applied to the image to improve contrast and brightness. Histogram equalization is a method in the image-processing field for adjusting contrast using the image histogram. The transformed gray level is defined as

T(r_k) = Σ_(j=0..k) p_r(r_j) = Σ_(j=0..k) n_j / n

where T(r) is the gray-level transformation function, r_k is the kth gray level, p_r(r_k) is an approximation of the probability that gray level r_k occurs, n_k is the number of pixels with gray level r_k (k = 0, 1, ..., L-1, with L the number of gray levels), and n is the total number of pixels in the image;
step 34) through the preprocessing of step 32) and step 33), graying and equalization of the scene image are achieved. A recurrent neural network (RNN) is trained on multiple data sets, and the output of its fully connected layer is taken as the extracted scene feature vector;
step 35) scene feature classification is performed with an LSTM-type RNN neural network. First the candidate memory cell value at the current time is calculated: c̃_t = tanh(W_xc · x_t + W_hc · h_(t-1) + b_c), where tanh is the hyperbolic tangent function, x_t is the current input data, h_(t-1) is the LSTM cell output at the previous time, W_xc and W_hc are the weights of the current input data and of the previous-time LSTM cell output respectively, and b_c is an offset;
step 36) an input gate is used to control the effect of the current data input on the memory cell state value. The input gate is

i_t = σ(W_xi · x_t + W_hi · h_(t-1) + W_ci · c_(t-1) + b_i)

where σ is the excitation function, c_(t-1) is the memory cell value at the previous time, W_xi, W_hi and W_ci are the weights of the input-gate input data, the previous-time LSTM cell output and the previous-time memory cell value respectively, and b_i is an offset;
step 37) a forgetting gate is used to control the influence of historical information on the current memory cell state value. The forgetting gate is

f_t = σ(W_xf · x_t + W_hf · h_(t-1) + W_cf · c_(t-1) + b_f)

where W_xf, W_hf and W_cf are the weights of the forgetting-gate input data, the previous-time LSTM cell output and the previous-time memory cell value respectively, and b_f is an offset;
step 38) the memory cell state value at the current time is calculated: c_t = f_t ⊙ c_(t-1) + i_t ⊙ c̃_t, where ⊙ denotes the element-wise (Hadamard) product; the memory cell state update depends on the previous state c_(t-1) and the current candidate cell value c̃_t, and the two factors are regulated by the input gate and the forgetting gate respectively;
step 39) an output gate is used to control the output of the memory cell state value, defined as

o_t = σ(W_xo · x_t + W_ho · h_(t-1) + W_co · c_t + b_o)

where W_xo, W_ho and W_co are the weights of the output-gate input data, the previous-time LSTM cell output and the current memory cell value respectively, and b_o is an offset;
step 310) the output of the LSTM cell is calculated: h_t = o_t ⊙ tanh(c_t); the LSTM network is trained with the backpropagation-through-time algorithm;
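Putting steps 35) to 310) together, a hedged NumPy sketch of one LSTM cell step follows. The dictionary-of-weights layout is an illustration choice, and the W_ci, W_cf, W_co terms on the cell state are written here as dense matrices, although such "peephole" weights are often taken to be diagonal in practice:

```python
# A hedged NumPy sketch of one LSTM cell step implementing steps 35)-310).
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))          # the logistic excitation function

def lstm_step(x_t, h_prev, c_prev, W, b):
    # step 35): candidate memory cell value
    c_tilde = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])
    # step 36): input gate, gating the current input's effect on the state
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] @ c_prev + b["i"])
    # step 37): forgetting gate, gating the influence of history
    f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + W["cf"] @ c_prev + b["f"])
    # step 38): memory cell state update (element-wise products)
    c_t = f_t * c_prev + i_t * c_tilde
    # step 39): output gate, gating the state value's output
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] @ c_t + b["o"])
    # step 310): cell output
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```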
the step 4) is as follows:
step 41) the action evaluation module receives the three-dimensional information from the motion sensor and preprocesses the sensor's x, y and z acceleration axes, which are treated as the equivalent of the three RGB channels of an image; t_m is defined as the time granularity, and the three-dimensional vector group within t_m is taken as the input of the neural network;
step 42) the recurrent neural network RNN is trained on multiple data sets, and the output of its fully connected layer is taken as the extracted action feature vector;
step 43) action evaluation is performed with the LSTM-type RNN neural network; steps 35) to 310) are executed, and the action evaluation within the t_m period is output.
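As a concrete illustration of step 41)'s time-granularity windowing, here is a minimal Python sketch, assuming an (N, 3) float array of (x, y, z) acceleration samples and a known sampling rate; both the function name and the rate parameter are hypothetical, with t_m = 3 s as the empirical value given later:

```python
# A minimal sketch of step 41)'s windowing over the three acceleration axes.
import numpy as np

def windows(accel: np.ndarray, rate_hz: int, t_m: float = 3.0) -> np.ndarray:
    """Split the stream into consecutive t_m-second windows; each window of
    shape (rate_hz * t_m, 3) becomes one LSTM-RNN input sequence."""
    step = int(rate_hz * t_m)
    n_win = len(accel) // step
    return accel[: n_win * step].reshape(n_win, step, 3)
```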
The step 5) is as follows:
step 51) the micro intelligent server module receives the information from the scene recognition module and the action evaluation module; k_m·t_m is taken as the recognition time unit, within which the scene recognition result is unchanged, and the action sequence set Action = {Action_1, Action_2, ..., Action_km} is obtained, where Action_i is the ith atomic action within k_m·t_m;
step 52) with the scene element as a label, the sub-table of the action information database is retrieved and fuzzy matching is performed; if the matching succeeds, the behavior recognition result is returned to the micro intelligent server.
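A hedged sketch of this scene-labeled lookup with fuzzy matching follows; the database contents, scene labels, atomic-action names and the difflib-based similarity cutoff are all illustrative stand-ins for the patent's action information database:

```python
# A hedged sketch of step 52)'s scene-labeled fuzzy matching.
import difflib

# Hypothetical behavior database: scene label -> {atomic-action sequence: behavior}.
BEHAVIOR_DB = {
    "living_room": {("sit", "sit", "lean", "sit", "sit"): "watching TV"},
    "bedroom":     {("lie", "lie", "lie", "lie", "lie"): "sleeping"},
}

def match_behavior(scene: str, actions: tuple, cutoff: float = 0.6):
    """Retrieve the scene's sub-table, then fuzzy-match the k_m-length atomic
    action sequence against its keys; returns None when no match succeeds."""
    table = BEHAVIOR_DB.get(scene, {})
    keys = [" ".join(k) for k in table]
    hit = difflib.get_close_matches(" ".join(actions), keys, n=1, cutoff=cutoff)
    return table[tuple(hit[0].split())] if hit else None
```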
Wherein:
In step 1), n is empirically taken as 352 and m as 288.
In step 32), c_r is empirically taken as 0.30, c_g as 0.59 and c_b as 0.11.
In step 36), σ is empirically chosen as the logistic sigmoid function.
In step 4), t_m is empirically taken as 3.
In step 5), k_m is empirically taken as 5.
Advantageous effects: compared with the prior art, the technical scheme adopted by the invention has the following technical effects:
The invention applies graying and histogram equalization to the image sensor's image information, uses the LSTM-RNN neural network algorithm to recognize the scene of the processed sensor images, and uses the LSTM-RNN neural network algorithm to recognize actions from the motion sensor's acceleration information. The scene-labeled action sequence is matched in a behavior database to obtain specific behavior information; a warning module notifies the user of emergency information; the user interface allows adjusting system settings and observing system state; and an operation recording module monitors and records the running state of the whole system. Through the application of these methods and the support of the matching modules, behavior recognition is performed on data from body-worn sensors with good accuracy and stability. Specifically:
(1) The invention uses the LSTM-RNN neural network algorithm, which effectively accounts for the influence of motion continuity on action recognition; because sensor data are strongly context-dependent, this increases the accuracy of motion behavior recognition.
(2) The invention considers scene information and action information jointly, using the scene information as a label to match the action sequence in the behavior database, thereby recognizing human behaviors more accurately.
(3) The invention provides an effective and practical system structure; the configured user interface module and operation recording module improve the stability of human behavior recognition and facilitate concrete industrial application.
Drawings
Fig. 1 is a flowchart of the human behavior recognition method based on a neural network.
Fig. 2 is a system function block diagram of a human behavior recognition method based on a neural network.
FIG. 3 is a schematic diagram of an LSTM-RNN neural network.
Detailed Description
The technical scheme of the invention is explained in further detail below with reference to the accompanying drawings:
In a specific implementation, fig. 1 shows the flow of the human behavior recognition method based on a neural network. This example assumes the system is applied for health care purposes. The user configures the system for home health monitoring through the system interface, binds a set of sensor devices to the system, and sets the warning level and warning mode.
First, the monitored person wears a set of sensors comprising an image recognition sensor and a motion information sensor; the image sensing module and the motion sensing module are connected to a routing node via low-power ZigBee communication based on IEEE 802.15.4. The routing node is connected to a host through a USB serial port and transmits the captured image and motion data to the scene recognition module and the action evaluation module, which exist in software form.
The scene recognition module processes the image transmitted by the image sensing module, first graying the original scene image. For each pixel, because the human eye has different sensitivities to red, green and blue light, a weighted average method assigns different weights, giving the gray value Gray = c_r·R + c_g·G + c_b·B with c_r = 0.30, c_g = 0.59 and c_b = 0.11.
After graying the image, the scene recognition module performs histogram equalization on the original scene image, with the gray-level transformation function defined as

T(r_k) = Σ_(j=0..k) n_j / n

where r_k is the kth gray level, n_j is the number of pixels with gray level r_j (k = 0, 1, ..., L-1), and n is the total number of pixels in the image.
The scene recognition module then takes the 352 × 288 image data after graying and equalization as the input of the neural network and performs scene classification with the LSTM-type RNN neural network to obtain the scene label of the image.
The action evaluation module receives the information from the motion sensor and stores the x-axis, y-axis and z-axis accelerations of the received wearer motion information as a three-dimensional vector group V = {(x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n)}; with t_m taken as 3 s, the three-dimensional vector group within 3 s is used as the input of the neural network, and action evaluation with the LSTM-type RNN neural network yields the atomic action within the 3 s.
In this embodiment, the micro intelligent server, the scene recognition module and the action evaluation module reside on the same host; every 15 s the micro intelligent server obtains an atomic action sequence of length 5 together with a scene label, and matches them against the corresponding sub-table of the human behavior database to obtain the behavior with the highest matching degree.
The micro intelligent server checks the warning level set for the behavior in the home monitoring system: if the behavior is identified as a fall, the server dials an alarm phone and sends an alarm SMS to notify the client; if it is identified as a safe behavior such as watching TV, the alarm function of the warning module is not triggered. Finally, the operation recording system records the atomic action sequence, the specific scene and the behavior recognition result of the 15 s to the database.
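A toy sketch of this warning dispatch follows; the behavior labels and warning levels are illustrative, not values defined by the patent:

```python
# A toy sketch of the warning dispatch described in this embodiment.
ALERT_LEVELS = {"falling": "critical", "watching TV": "safe"}

def dispatch(behavior: str) -> None:
    if ALERT_LEVELS.get(behavior, "safe") == "critical":
        # stand-in for the warning module: dial the alarm phone, send an SMS
        print("ALERT: dialing alarm phone and sending alarm SMS")
    # safe behaviors do not trigger the warning module
```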
FIG. 3 shows the concrete structure of each cell unit of the LSTM-RNN neural network; the core module of the model consists of an input gate, a forgetting gate, an output gate and a memory unit. At time point t the action feature sequence x_t is input, and the hidden layer h_t = o_t ⊙ tanh(c_t) and the cell state c_t are obtained by calculation and passed to the next LSTM cell. The LSTM-type RNN neural network has a chain form of repeating modules, and the model uses a tanh layer for linking.

Claims (7)

1. A human behavior recognition method based on a neural network is characterized in that a multi-sensor module is adopted, and the method comprises the following steps:
step 1) acquiring continuous image information centered on the wearer by activating an image sensing module worn by the monitored person, the image being processed into information on an n × m region, where n is the number of pixels in each row across the image and m is the number of pixels in each column down the image;
step 2) acquiring and recording x-axis, y-axis and z-axis acceleration information of the wearer's movement by activating a motion sensing module worn by the monitored person, the x-axis being perpendicular to the body's vertical axis with its positive direction pointing forward, the y-axis parallel to the body's vertical axis with its positive direction pointing toward the head, and the z-axis perpendicular to the body's vertical axis with its positive direction pointing to the left side of the body;
step 3) the scene recognition module receives the information from the image sensing module, performs graying and equalization on the received n × m pixel region image and the original image, reducing errors and disturbances in the transmitted image; the grayed and equalized n × m pixel region image information is taken as the input of a neural network, and scene classification is performed with a long short-term memory recurrent neural network LSTM-RNN;
step 4) the action evaluation module receives the information from the motion sensing module and stores the x-axis, y-axis and z-axis accelerations of the received wearer motion information as a three-dimensional vector group V = {(x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n)}; t_o is defined as the time granularity, the three-dimensional vector group within t_o is taken as the input of the neural network, and action evaluation is performed with the trained long short-term memory recurrent neural network LSTM-RNN on motion data sets, obtaining the atomic actions within the time granularity;
step 5) the behavior recognition module receives the information from the scene recognition module and the action evaluation module, integrates the two and performs behavior recognition; o_k successive atomic actions are taken, i.e. o_k·t_o is used as the recognition time unit; with the scene element as a label, the sub-table of the action information database is retrieved and fuzzy matching is performed; if the matching succeeds, the behavior recognition result is returned to the micro intelligent server;
step 6), the micro intelligent server module classifies the behavior recognition results of the behavior recognition module according to user settings, and recognizes whether the currently occurring action reaches a warning level according to the use requirements and settings of the user; if the event reaches the warning level, the micro intelligent server sends a warning command to the warning module; the warning module gives a warning to the user in one or more ways;
step 7) a system interface is the entrance through which the user adjusts and configures the system; through the interface the user sets the specific server configuration, binds the image sensing module and the motion sensing module to the system, monitors and observes the running state of the system, and sets the running mode of the system;
step 8) the operation recording module monitors and records all operation states of the micro intelligent server, records warning information of different levels in the operation process of the system, stores the operation records through a relational database, and a user inquires data of the operation recording module through a system interface to maintain the system;
wherein:
the step 3) adopts a multi-sensor module, which comprises the following specific steps:
step 31) the scene recognition module receives the n × m pixel region image information from the image sensing module; the low-level features of the scene are described through video scene preprocessing and scene recognition, and the scene image needs to be grayed and equalized to facilitate subsequent scene recognition;
step 32) graying is performed on the original scene image; for each pixel, because the human eye has different sensitivities to red, green and blue light, a weighted average method assigns different weights to the pixel, giving the pixel's gray value Gray = c_r·R + c_g·G + c_b·B, where c_r, c_g and c_b are the conversion weights of red, green and blue light respectively, and c_r + c_g + c_b = 1;
step 33) histogram equalization processing is performed on the original scene image; under insufficient light, scene recognition can incur large errors, so the image histogram is equalized to improve contrast and brightness; histogram equalization is a method in the image-processing field for adjusting contrast using the image histogram, and the equalization result of the ith gray level is defined as

T(r_i) = Σ_(j=0..i) p_j = Σ_(j=0..i) n_j / p

where T(r_i) is the transformation function for the ith gray level, L is the number of gray levels of the image, j is an enumeration variable running from 0 to i, r_j is the jth gray level, p_j is an approximation of the probability that gray level r_j occurs, n_j is the number of pixels with gray level r_j in the image, and p is the total number of pixels in the image;
step 34) graying and equalization of the scene image are achieved through the preprocessing of step 32) and step 33); the recurrent neural network RNN is then trained on multiple data sets, and the output of its fully connected layer is taken as the extracted scene feature vector;
step 35) scene feature classification is performed with the LSTM-type RNN neural network; first the candidate memory cell value at the current time is calculated, c̃_t = tanh(W_xc · x_t + W_hc · h_(t-1) + b_c), where tanh is the hyperbolic tangent function, x_t is the current input data, h_(t-1) is the LSTM cell output at the previous time, W_xc and W_hc are the weights of the current input data and the previous-time LSTM cell output respectively, and b_c is an offset;
step 36) an input gate is used to control the effect of the current data input on the memory cell state value; the input gate is

i_t = σ(W_xi · x_t + W_hi · h_(t-1) + W_ci · c_(t-1) + b_i)

where σ is the excitation function, c_(t-1) is the memory cell value at the previous time, W_xi, W_hi and W_ci are the weights of the input-gate input data, the previous-time LSTM cell output and the previous-time memory cell value respectively, and b_i is an offset;
step 37) a forgetting gate is used to control the influence of historical information on the current memory cell state value; the forgetting gate is

f_t = σ(W_xf · x_t + W_hf · h_(t-1) + W_cf · c_(t-1) + b_f)

where W_xf, W_hf and W_cf are the weights of the forgetting-gate input data, the previous-time LSTM cell output and the previous-time memory cell value respectively, and b_f is an offset;
step 38) the memory cell state value at the current time is calculated, c_t = f_t ⊙ c_(t-1) + i_t ⊙ c̃_t, where ⊙ denotes the element-wise (Hadamard) product; the memory cell state update depends on the previous state c_(t-1) and the current candidate cell value c̃_t, and the two factors are regulated by the input gate and the forgetting gate respectively;
step 39) an output gate is used to control the output of the memory cell state value, defined as

o_t = σ(W_xo · x_t + W_ho · h_(t-1) + W_co · c_t + b_o)

where W_xo, W_ho and W_co are the weights of the output-gate input data, the previous-time LSTM cell output and the current memory cell value respectively, and b_o is an offset;
step 310) the output of the LSTM cell is calculated, h_t = o_t ⊙ tanh(c_t); the LSTM network is trained with the backpropagation-through-time algorithm;
the step 4) adopts a multi-sensor module, which comprises the following specific steps:
step 41) the action evaluation module receives the three-dimensional information from the motion sensing module and preprocesses the sensor's x, y and z acceleration axes, which are treated as the equivalent of the three RGB channels of an image; t_m is defined as the time granularity, and the three-dimensional vector group within t_m is taken as the input of the neural network;
step 42) the recurrent neural network RNN is trained on multiple data sets, and the output of its fully connected layer is taken as the extracted action feature vector;
step 43) action evaluation is performed with the LSTM-type RNN neural network; steps 35) to 310) are executed, and the action evaluation within the t_m period is output.
2. The human behavior recognition method based on the neural network as claimed in claim 1, wherein the step 5) adopts a multi-sensor module, specifically as follows:
step 51) the micro intelligent server module receives the information from the scene recognition module and the action evaluation module; k_m·t_m is taken as the recognition time unit, within which the scene recognition result is unchanged, and the action sequence set Action = {Action_1, Action_2, ..., Action_km} is obtained, where Action_i is the ith atomic action within k_m·t_m;
step 52) with the scene element as a label, the sub-table of the action information database is retrieved and fuzzy matching is performed; if the matching succeeds, the behavior recognition result is returned to the micro intelligent server.
3. The method for recognizing human body behaviors based on the neural network as claimed in claim 1, wherein in step 1), n is empirically taken as 352 and m is empirically taken as 288.
4. The method for human behavior recognition based on neural network as claimed in claim 1, wherein in step 32), c_r is empirically taken as 0.30, c_g as 0.59 and c_b as 0.11.
5. The human body behavior recognition method based on the neural network as claimed in claim 1, wherein in step 36), σ is empirically chosen as the logistic sigmoid function.
6. The human body behavior recognition method based on the neural network as claimed in claim 1, wherein in step 4), t_m is empirically taken as 3.
7. The human body behavior recognition method based on the neural network as claimed in claim 1, wherein in step 5), k_m is empirically taken as 5.
CN201810422265.9A 2018-05-04 2018-05-04 Human behavior recognition method and system based on neural network Active CN108764059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810422265.9A CN108764059B (en) 2018-05-04 2018-05-04 Human behavior recognition method and system based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810422265.9A CN108764059B (en) 2018-05-04 2018-05-04 Human behavior recognition method and system based on neural network

Publications (2)

Publication Number Publication Date
CN108764059A CN108764059A (en) 2018-11-06
CN108764059B (en) 2021-01-01

Family

ID=64009051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810422265.9A Active CN108764059B (en) 2018-05-04 2018-05-04 Human behavior recognition method and system based on neural network

Country Status (1)

Country Link
CN (1) CN108764059B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109727270B (en) * 2018-12-10 2021-03-26 杭州帝视科技有限公司 Motion mechanism and texture feature analysis method and system of cardiac nuclear magnetic resonance image
CN109670548B (en) * 2018-12-20 2023-01-06 电子科技大学 Multi-size input HAR algorithm based on improved LSTM-CNN
CN109726662A (en) * 2018-12-24 2019-05-07 南京师范大学 Multi-class human posture recognition method based on convolution sum circulation combination neural net
CN111796980B (en) * 2019-04-09 2023-02-28 Oppo广东移动通信有限公司 Data processing method and device, electronic equipment and storage medium
CN110276380B (en) * 2019-05-22 2021-08-17 杭州电子科技大学 Real-time motion on-line guidance system based on depth model framework
CN110503684A (en) * 2019-08-09 2019-11-26 北京影谱科技股份有限公司 Camera position and orientation estimation method and device
CN114223139B (en) * 2019-10-29 2023-11-24 深圳市欢太科技有限公司 Interface switching method and device, wearable electronic equipment and storage medium
CN111050266B (en) * 2019-12-20 2021-07-30 朱凤邹 Method and system for performing function control based on earphone detection action
CN111203878B (en) * 2020-01-14 2021-10-01 北京航空航天大学 Robot sequence task learning method based on visual simulation
JP7436257B2 (en) * 2020-03-25 2024-02-21 株式会社日立製作所 Behavior recognition server, behavior recognition system, and behavior recognition method
CN111898524A (en) * 2020-07-29 2020-11-06 江苏艾什顿科技有限公司 5G edge computing gateway and application thereof
CN112732071B (en) * 2020-12-11 2023-04-07 浙江大学 Calibration-free eye movement tracking system and application
CN112926553B (en) * 2021-04-25 2021-08-13 北京芯盾时代科技有限公司 Training method and device for motion detection network
CN113673328B (en) * 2021-07-14 2023-08-18 南京邮电大学 Crowd area monitoring method based on feature aggregation network
CN116229581B (en) * 2023-03-23 2023-09-19 珠海市安克电子技术有限公司 Intelligent interconnection first-aid system based on big data
CN116756354A (en) * 2023-08-24 2023-09-15 北京电子科技学院 Photo archive analysis management system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850846A (en) * 2015-06-02 2015-08-19 深圳大学 Human behavior recognition method and human behavior recognition system based on depth neural network
CN106446876A (en) * 2016-11-17 2017-02-22 南方科技大学 Sensing behavior recognition method and device
CN106951852A (en) * 2017-03-15 2017-07-14 深圳汇创联合自动化控制有限公司 A kind of effective Human bodys' response system
CN107145878A (en) * 2017-06-01 2017-09-08 重庆邮电大学 Old man's anomaly detection method based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9235799B2 (en) * 2011-11-26 2016-01-12 Microsoft Technology Licensing, Llc Discriminative pretraining of deep neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850846A (en) * 2015-06-02 2015-08-19 深圳大学 Human behavior recognition method and human behavior recognition system based on depth neural network
CN106446876A (en) * 2016-11-17 2017-02-22 南方科技大学 Sensing behavior recognition method and device
CN106951852A (en) * 2017-03-15 2017-07-14 深圳汇创联合自动化控制有限公司 A kind of effective Human bodys' response system
CN107145878A (en) * 2017-06-01 2017-09-08 重庆邮电大学 Old man's anomaly detection method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Human Behavior Recognition Algorithms Based on Deep Learning (基于深度学习的人体行为识别算法综述); Zhu Yu et al.; Acta Automatica Sinica (自动化学报); 2016-04-25; Vol. 42, No. 6; pp. 848-857 *

Also Published As

Publication number Publication date
CN108764059A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108764059B (en) Human behavior recognition method and system based on neural network
Singh et al. A deeply coupled ConvNet for human activity recognition using dynamic and RGB images
CN111814661B (en) Human body behavior recognition method based on residual error-circulating neural network
CN107092894A (en) A kind of motor behavior recognition methods based on LSTM models
CN104063719A (en) Method and device for pedestrian detection based on depth convolutional network
CN110575663B (en) Physical education auxiliary training method based on artificial intelligence
CN104636751A (en) Crowd abnormity detection and positioning system and method based on time recurrent neural network
CN111612136B (en) Neural morphology visual target classification method and system
CN115100709B (en) Feature separation image face recognition and age estimation method
CN106971145A (en) A kind of various visual angles action identification method and device based on extreme learning machine
CN111967433A (en) Action identification method based on self-supervision learning network
CN111914643A (en) Human body action recognition method based on skeleton key point detection
CN108416795B (en) Video action identification method based on sorting pooling fusion space characteristics
KR20210067815A (en) Method for measuring health condition of user and apparatus therefor
Nale et al. Suspicious human activity detection using pose estimation and lstm
CN113627326B (en) Behavior recognition method based on wearable equipment and human skeleton
Li et al. Monitoring and alerting of crane operator fatigue using hybrid deep neural networks in the prefabricated products assembly process
CN113435335A (en) Microscopic expression recognition method and device, electronic equipment and storage medium
CN110705413B (en) Emotion prediction method and system based on sight direction and LSTM neural network
CN111291804A (en) Multi-sensor time series analysis model based on attention mechanism
Sravanthi et al. An efficient classifier using machine learning technique for individual action identification
CN111178134B (en) Tumble detection method based on deep learning and network compression
CN114170588A (en) Railway dispatcher bad state identification method based on eye features
Harrigan et al. Neural coding strategies for event-based vision data
CN112215162A (en) Multi-label multi-task face attribute prediction method based on MCNN (multi-core neural network)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 66, New Model Road, Gulou District, Nanjing City, Jiangsu Province, 210000

Applicant after: Nanjing University of Posts and Telecommunications

Address before: 210023, 9 Wen Yuan Road, Ya Dong new town, Nanjing, Jiangsu

Applicant before: Nanjing University of Posts and Telecommunications

GR01 Patent grant