CN110781961B - Accurate behavior recognition method based on decision tree classification algorithm

Info

Publication number: CN110781961B
Application number: CN201911025926.5A
Authority: CN
Inventors: 张玉成; 王振; 姚永康; 聂文都
Original assignee: Xijing University
Current assignee: Xijing University
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2024-02-23
Anticipated expiration: 2039-10-25
Also published as: CN110781961A

Abstract

The invention discloses an accurate behavior recognition method based on a decision tree classification algorithm, comprising the following steps: S1, data collection: solving the tag position using the TDOA-based Chan algorithm; S2, feature extraction: selecting feature values and partitioning the feature value intervals; S3, behavior recognition: establishing a user behavior recognition model. In this decision-tree-based behavior recognition method, sensors first acquire user data, the extracted feature values are then identified, and finally behavior classification is performed with the decision tree algorithm and verified experimentally. Experimental results show that the algorithm recognizes daily behaviors well under specific conditions.

Description

Accurate behavior recognition method based on decision tree classification algorithm

Technical Field

The invention relates to a behavior recognition method, in particular to an accurate behavior recognition method based on a decision tree classification algorithm.

Background

With the continuous progress of wireless technologies such as Bluetooth, ZigBee, UWB (ultra-wideband) and wireless networks, the rapid growth of wireless sensor networks for ad hoc networking and the Internet of Things has attracted extensive attention in academia. Behavior recognition is widely regarded as an indispensable key technology. Behavior recognition information supports early warning before network emergencies, decision-making during events, and handling after incidents. Behavior recognition technology plays a decisive role in the further development of wireless networks, so research into it is particularly important.

In indoor locations, it is difficult to accurately analyze parameters such as signal arrival time or angle of arrival because the signal propagation environment is more complex than outdoors. However, with the continued development of wireless sensor networks (WSNs), academic research is no longer limited to traditional indoor positioning and location awareness. Many fields have begun to use radio for position sensing, among which position sensing technology based on UWB (ultra-wideband) radar systems has developed and matured most rapidly. One reference proposes a UWB channel model that accounts for human body shielding and studies the influence of body shielding on TOA ranging error by measuring and analyzing that error. At present, new indoor positioning technology based on commercial Wi-Fi equipment is developing well in many applications, such as indoor intrusion detection, campus security, monitoring of people in shopping malls, patient monitoring, and real-time monitoring of elderly people and children at home.

The 2.4 GHz wireless network band is similar to that of Bluetooth, and positioning in this band is likewise affected by the environment, giving inaccurate data when obstacles or electromagnetic interference are encountered. Among the various positioning technologies, UWB has long been a research hotspot in radio frequency communication in China and abroad. Many existing behavior awareness methods rely on image processing: low-level features are extracted from image information, human movement is identified, and a human movement pattern is constructed. Their disadvantage is that the number of feature values to extract is large and the user's security and privacy are seriously threatened. Therefore, more and more technologies replace image processing with sensors that are small, inexpensive to deploy, and simple to operate. One existing algorithm uses big-data parallel classification to guide the power supply mode but does not consider energy consumption. Some methods refine the original classification data and propose a maximum attribute index algorithm for same-level concept refinement, allocating the privacy budget more reasonably between levels with a hierarchical geometric allocation mechanism; however, data distribution cannot be implemented in a dynamic data environment. Some methods improve the objective function of the decision tree generation algorithm so that inconsistent data can be classified and the influence factors of the function can be adjusted directly, making node splitting more accurate and the classification better. Some methods use a compression strategy selected on the basis of HBase data classification; however, the data processing procedure is relatively complex.

Disclosure of Invention

The invention mainly aims to provide an accurate behavior recognition method based on a decision tree classification algorithm.

The technical scheme adopted by the invention is as follows: an accurate behavior recognition method based on a decision tree classification algorithm comprises the following steps:

S1, data collection: solving the tag position using the TDOA-based Chan algorithm;

S2, feature extraction: selecting feature values and partitioning the feature value intervals;

S3, behavior recognition: establishing a user behavior recognition model.

Further, the step S1 specifically includes:

considering positioning accuracy and equipment cost, selecting 4 base stations is most appropriate; in the two-dimensional rectangular coordinate system, the coordinates of the i-th base station are $B_{uwb,i} = [x_i, y_i]^T$ (i = 1, 2, …, 5), the coordinates of the tag are $T_{uwb} = [x_0, y_0]^T$, and the distance between the i-th base station and the tag is $R_i = \|B_{uwb,i} - T_{uwb}\|_2$ (i = 1, 2, …, 5); taking the first base station as the common reference node, a group of TDOA observations $\Delta t_{i,1}$ (i = 2, 3, 4, 5) is obtained, each indicating the signal arrival time difference between the i-th base station and the first base station;

in this model, each observation $\Delta t_{i,1}$ is made up of its true value, the system measurement error $n_{i,1}$, and the NLOS error $n_{NLOS,i}$;

let the signal propagation speed be c; $R_{i,1}$, the difference between the distances from the tag to the i-th base station and to the first base station, is then:

$R_{i,1} = c \cdot \Delta t_{i,1}$ (i = 2, 3, 4, 5) (2)

from the defining property of the hyperbola, three hyperbolic equations $R_{i,1} = R_i - R_1$ (i = 2, 3, 4, 5) are established, and the system of equations in $T_{uwb}$ can be set up as shown in formula (3);

a tag positioning architecture with 4 base stations is adopted, with one base station acting as the master and the remaining 3 as slaves; when a tester carrying a positioning tag enters the test area, the signals sent by the tag are received by one or more sensors; the sensors decode the signals to obtain angle-of-arrival and timing information and then transmit these data to the master sensor; the master sensor collects all the information sent by the base stations and calculates the position of the tag, which completes data collection; the sensor transmits data once per second through the switch to the server in UDP packet format, and the server receives the UDP packets and obtains the specific X, Y coordinates of the tag.
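The position solve in step S1 can be illustrated with a minimal sketch. It implements only the linearised step that Chan-style TDOA solvers start from, not the full Chan algorithm with its weighted refinement, and it assumes exactly four base stations (one reference plus three others) and noise-free observations. The function names `solve_3x3` and `locate_tag`, the station layout, and the use of the speed of light for c are illustrative assumptions, not taken from the patent.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s; assumed value of the propagation speed c


def solve_3x3(a, b):
    """Solve the 3x3 linear system a @ x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate geometry: the linear system is singular")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det(m) / d)
    return solution


def locate_tag(stations, tdoa, c=SPEED_OF_LIGHT):
    """Estimate the tag position (x0, y0) from 4 stations and 3 TDOA values.

    stations[0] is the reference base station; tdoa[k] is the arrival-time
    difference (seconds) between stations[k + 1] and stations[0].
    """
    x1, y1 = stations[0]
    k1 = x1 * x1 + y1 * y1
    rows, rhs = [], []
    for (xi, yi), dt in zip(stations[1:], tdoa):
        r_i1 = c * dt  # formula (2): R_i1 = c * dt_i1
        ki = xi * xi + yi * yi
        # Squaring R_i = R_i1 + R_1 and subtracting the expression for R_1^2:
        # (xi - x1) x0 + (yi - y1) y0 + R_i1 R_1 = (Ki - K1 - R_i1^2) / 2
        rows.append([xi - x1, yi - y1, r_i1])
        rhs.append(0.5 * (ki - k1 - r_i1 * r_i1))
    x0, y0, _r1 = solve_3x3(rows, rhs)  # unknowns are x0, y0 and R_1
    return x0, y0


# Hypothetical layout: four stations at the corners of a 10 m x 8 m room.
stations = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
true_tag = (3.0, 2.0)
ranges = [math.dist(s, true_tag) for s in stations]
tdoa = [(r - ranges[0]) / SPEED_OF_LIGHT for r in ranges[1:]]
estimate = locate_tag(stations, tdoa)
```

With three range differences and three unknowns the system is exactly determined, so measurement noise passes straight into the estimate, and some symmetric tag positions make the system singular. A practical solver would add more stations or the weighted second stage of the Chan method.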

Further, the step S2 specifically includes:

feature value selection includes: location division; height processing of the head, shoulders, waist and knees; and the distance moved per unit time by the head, shoulders, waist and knee joints;

the location division includes:

in real life, a user's position is related to the user's behavior; spatial locations are divided into three categories: the first is the area where the user can sit or lie down to rest, denoted Ra; the second is the area within 0.1 to 0.3 meters of the first, the exact distance depending on the object, denoted Da; the remaining space is the third category, denoted La;

the head, shoulder, waist and knee height processing covers head height, shoulder height, waist height and knee height;

the z-axis data of the user's head, shoulders, waist and knee joints represent their heights in space and are read directly from the tag coordinates;

the distance moved per unit time covers the displacement of the head, shoulders, waist and knee joints;

directly calculating the distance moved by the user's head, shoulders, waist and knee joints per unit time is difficult, mainly because the unit time is hard to choose; too large a value distorts the displacement result, so user behavior cannot be described accurately and recognition accuracy drops; too small a value increases the computation and delay overhead; through multiple experiments, the optimal unit time balancing accuracy and operation delay is LS;
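As a minimal sketch of the displacement feature, the code below computes the straight-line distance a tag moves in each window of one unit time, assuming the tag positions arrive at a fixed sampling period. The function name `displacement_per_unit`, the sample data, and the window handling are illustrative assumptions; the patent does not specify how the displacement is computed.

```python
import math


def displacement_per_unit(positions, sample_period, unit_time):
    """Distance moved in each consecutive window of `unit_time` seconds.

    positions: list of (x, y, z) tag coordinates sampled every
    `sample_period` seconds. Each window's displacement is the
    straight-line distance between its first sample and the first
    sample of the next window.
    """
    step = max(1, round(unit_time / sample_period))
    return [math.dist(positions[k], positions[k + step])
            for k in range(0, len(positions) - step, step)]


# Hypothetical head-tag track sampled once per second (meters).
track = [(0.0, 0.0, 1.0), (3.0, 4.0, 1.0), (3.0, 4.0, 1.0)]
per_second = displacement_per_unit(track, sample_period=1.0, unit_time=1.0)
per_two_seconds = displacement_per_unit(track, sample_period=1.0, unit_time=2.0)
```

A longer unit time averages over more movement and a shorter one yields more windows to process, which is the trade-off the text describes.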

the partition characteristic value interval comprises:

after the above feature values are determined, classification boundaries must be determined so that data within a category are similar and data in different categories differ; based on experiments, a hierarchical classification method is adopted; the key to classification is how to determine the boundaries of each level; at present there are two algorithms for determining boundary values: the equal-length method and the equal-distribution method;

let the range of a feature value be $\Phi = [c_{min}, c_{max}]$, divided into N levels labeled 1 to N; from the range $\Phi = [c_{min}, c_{max}]$, the span of the sensor values is $R = c_{max} - c_{min}$; so that every interval in the range has the same length, the length of each interval is calculated as $r = R/N$; thus the value range of each interval can be determined.

Further, the step S3 specifically includes:

let D be the training tuples partitioned by class; the entropy of D is then:

$\mathrm{info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$ (4)

where m is the number of classes and $p_i$ is the probability that the i-th class appears in the training tuples, estimated by dividing the number of tuples belonging to that class by the total number of training tuples; in practical terms, the entropy is the average amount of information needed to identify the class label of a tuple in D;

now let the training tuples D be partitioned by attribute A into v subsets $D_1, \dots, D_v$; the expected information of this partition of D is:

$\mathrm{info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{info}(D_j)$ (5)

the information gain is the difference between the two:

$\mathrm{gain}(A) = \mathrm{info}(D) - \mathrm{info}_A(D)$ (6)

establishing a user behavior recognition model; the specific behavior recognition steps are as follows:

S31, collecting and classifying the various behaviors from the training data set, and computing the entropy info(D) of the training tuples partitioned by class;

S32, extracting the position and height feature values from the preprocessed data, calculating the feature value intervals, and partitioning the feature values;

S33, obtaining the information gain gain(A) for the feature values partitioned in step S32 from their expected information;

S34, the information gain being the difference between the entropy and the expected information;

when the output reaches its maximum, the maximum gain is recorded; the information gain at that point then gives the corresponding behavior information.

The invention has the advantages that:

In this decision-tree-based behavior recognition method, sensors first acquire user data, the extracted feature values are then identified, and finally behavior classification is performed with the decision tree algorithm and verified experimentally. Experimental results show that the algorithm recognizes daily behaviors well under specific conditions.

In addition to the objects, features and advantages described above, the present invention has other objects, features and advantages. The present invention will be described in further detail with reference to the drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.

FIG. 1 is a diagram of a behavior recognition framework of the present invention;

FIG. 2 shows the division of the body parts of the user under test in the present invention;

FIG. 3 is a schematic representation of an experimental environment of the present invention;

FIG. 4 is a comparison graph of behavior recognition error rates of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

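Referring to FIG. 1, an accurate behavior recognition method based on a decision tree classification algorithm includes the following steps:

S1, data collection: solving the tag position using the TDOA-based Chan algorithm;

S2, feature extraction: selecting feature values and partitioning the feature value intervals;

S3, behavior recognition: establishing a user behavior recognition model.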

The step S1 specifically includes:

$R_{i,1} = c \cdot \Delta t_{i,1}$ (i = 2, 3, 4, 5) (2)

from the defining property of the hyperbola, three hyperbolic equations $R_{i,1} = R_i - R_1$ (i = 2, 3, 4, 5) are established, and the system of equations in $T_{uwb}$ can be set up as shown in formula (3).

The step S2 specifically includes:

the location division includes:

in real life, a user's position is related to the user's behavior; for example, on a sofa, a user may sit or lie down; in a hallway, a user may walk or run and is more likely to fall than when near household objects; spatial locations are therefore divided into three categories: the first is the rest area where the user can sit or lie down, such as a sofa, a bed or a chair, denoted Ra (rest area); the second is the area within 0.1 to 0.3 meters of the first, the exact distance depending on the object, denoted Da (distance area); the remaining space is the third category, denoted La (last area); the division of the user's body parts for measurement is shown in FIG. 2;
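The three location categories can be sketched as a simple lookup, assuming each rest area is modeled as an axis-aligned rectangle on the floor plan and the Da band uses a single margin of 0.3 m (the upper end of the stated 0.1 to 0.3 m range). The rectangle model, the function names and the sample coordinates are illustrative assumptions, not details given in the patent.

```python
import math


def distance_to_rect(x, y, rect):
    """Shortest distance from point (x, y) to an axis-aligned rectangle.

    rect is (x_min, y_min, x_max, y_max); the distance is 0 inside it.
    """
    x_min, y_min, x_max, y_max = rect
    dx = max(x_min - x, 0.0, x - x_max)
    dy = max(y_min - y, 0.0, y - y_max)
    return math.hypot(dx, dy)


def location_category(x, y, rest_areas, margin=0.3):
    """Classify a tag position as 'Ra', 'Da' or 'La'.

    Ra: inside a rest area; Da: within `margin` meters of one;
    La: the remaining space.
    """
    nearest = min((distance_to_rect(x, y, r) for r in rest_areas),
                  default=math.inf)
    if nearest == 0.0:
        return "Ra"
    if nearest <= margin:
        return "Da"
    return "La"


# Hypothetical sofa occupying a 2 m x 1 m footprint.
sofa = (0.0, 0.0, 2.0, 1.0)
on_sofa = location_category(1.0, 0.5, [sofa])
beside_sofa = location_category(2.2, 0.5, [sofa])
in_hallway = location_category(4.0, 3.0, [sofa])
```

Because the text says the Da distance depends on the object, a fuller version might store a separate margin per rest area rather than one global value.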

the head, shoulder, waist and knee height processing covers head height, shoulder height, waist height and knee height;

the z-axis data of the user's head, shoulders, waist and knee joints represent their heights in space and can be read directly from the tag coordinates;

the distance moved per unit time covers the displacement of the head, shoulders, waist and knee joints;

the partition characteristic value interval comprises:

let the range of a feature value be $\Phi = [c_{min}, c_{max}]$, divided into N levels labeled 1 to N; from the range $\Phi = [c_{min}, c_{max}]$, the span of the sensor values is $R = c_{max} - c_{min}$; so that every interval in the range has the same length, the length of each interval is calculated as $r = R/N$; thus the value range of each interval can be determined. For example, the value range of the i-th interval is $[c_{min} + (i-1)r,\ c_{min} + ir]$.
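The equal-length partition can be sketched as follows. The function names `interval_bounds` and `level_of` are illustrative, and assigning a value that falls exactly on a boundary to the upper interval is an assumption, since the text does not say which side a boundary value belongs to.

```python
def interval_bounds(c_min, c_max, n):
    """Return the N equal-length intervals of [c_min, c_max] as (low, high)."""
    r = (c_max - c_min) / n  # interval length r = R / N
    return [(c_min + (i - 1) * r, c_min + i * r) for i in range(1, n + 1)]


def level_of(value, c_min, c_max, n):
    """Return the level label (1..N) of the interval containing `value`."""
    if not c_min <= value <= c_max:
        raise ValueError("value lies outside the feature range")
    r = (c_max - c_min) / n
    # Boundary values go to the upper interval; c_max is kept in level N.
    return min(int((value - c_min) / r) + 1, n)


# Hypothetical height feature ranging from 0 m to 2 m, split into 4 levels.
bounds = interval_bounds(0.0, 2.0, 4)
levels = [level_of(h, 0.0, 2.0, 4) for h in (0.0, 0.6, 1.0, 2.0)]
```

Here `bounds` lists the four half-meter intervals and `levels` gives the level label assigned to each sample height.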

The step S3 specifically includes:

the decision tree algorithm is a method for approximating discrete-valued functions and a typical classification method; its basic idea is to process the data first, then generate readable rules and a decision tree with an induction algorithm, and finally use the tree to analyze new data; in essence, a decision tree classifies data through a series of rules; common decision tree classification algorithms include ID3, C4.5 and CART; the smaller the expected information, the larger the information gain and the higher the purity, and the core idea of the ID3 algorithm is to select attributes by information gain; the ID3 algorithm is used here for recognition. Let D be the training tuples partitioned by class; the entropy of D is then:

$\mathrm{info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$ (4)

where m is the number of classes and $p_i$ is the probability that the i-th class appears in the training tuples;

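now suppose the training tuples D are partitioned by attribute A into v subsets $D_1, \dots, D_v$; the expected information of this partition of D is:

$\mathrm{info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{info}(D_j)$ (5)

the information gain is the difference between the two:

$\mathrm{gain}(A) = \mathrm{info}(D) - \mathrm{info}_A(D)$ (6)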

each time a split is needed, the ID3 algorithm calculates the information gain of every attribute and then splits on the attribute with the largest gain; therefore, finding the maximum information gain yields the best split;
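The attribute selection step can be sketched as follows, using the standard ID3 definitions of entropy and information gain. The function names, the dictionary row format and the tiny data set are hypothetical and serve only to show the calculation; they are not the patent's data.

```python
import math
from collections import Counter


def info(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_after_split(rows, labels, attribute):
    """Expected information after partitioning the rows by `attribute`."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    total = len(labels)
    return sum(len(part) / total * info(part) for part in partitions.values())


def gain(rows, labels, attribute):
    """Information gain: entropy minus expected information after the split."""
    return info(labels) - info_after_split(rows, labels, attribute)


def best_attribute(rows, labels, attributes):
    """Attribute with the largest information gain (the ID3 split choice)."""
    return max(attributes, key=lambda a: gain(rows, labels, a))


# Hypothetical discretised samples: location category and height level.
rows = [
    {"region": "Ra", "height": 1},
    {"region": "Ra", "height": 2},
    {"region": "La", "height": 1},
    {"region": "La", "height": 2},
]
labels = ["lying", "standing", "lying", "standing"]
chosen = best_attribute(rows, labels, ["region", "height"])
```

In this toy data the height level separates the two behaviors completely, so its gain equals the full entropy of the labels and it is chosen for the first split. A full ID3 tree would repeat the selection on each resulting subset.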

according to the analysis, the abstract description of the user behavior is a behavior classification model and is also the basis of behavior recognition; based on the decision tree classification algorithm, a user behavior recognition model is established; the specific behavior recognition steps are as follows:

To verify the performance of the algorithm in behavior recognition, a scenario experiment was performed in three scenes: an office, a conference room and a laboratory. The resulting behavior recognition accuracy was compared with that of other algorithms. In the experiment, researchers held the positioning tag and simulated six basic actions: sitting, standing, falling, lying, walking and running. The plan of the experimental environment is shown in FIG. 3.

During measurement, because the behaviors simulated with a hand-held tag are unstable and the sensor itself is unstable, much of the data may differ from the actual behavior and must be eliminated during preprocessing. The resulting valid data are shown in Table 1.

Table 1. Valid data set

The region feature takes one of three values, associated with sitting, lying and walking respectively. Y stands for the user behavior: 1 for sitting, 2 for standing, 3 for falling, 4 for lying, 5 for walking and 6 for running. Heights and distances are in meters.

The recognition accuracy of each behavior was analyzed in three cases. In the first, all feature values are used, and the results are shown in Table 2; in the second, the position feature value is excluded, and the results are shown in Table 3; in the third, the height feature values are excluded, and the results are shown in Table 4.

Table 2. Behavior recognition accuracy with all feature values (%)

Table 2 shows that standing and lying are the most easily recognized behaviors, because the tag data features of both states are distinctive: height and position do not change significantly, that is, the distance moved per unit time is 0. Sitting and walking are recognized relatively poorly, because position and height change only slightly and the feature values are relatively close. Behaviors with a larger range of variation are also distinguished better.

Table 3. Behavior recognition accuracy without the position feature value (%)

As Table 3 shows, the recognition rate of every behavior is lower than when all feature values are used. Standing and lying are still the easiest to recognize. Standing and sitting are hard to tell apart without the position feature value; however, the falling and running states are still recognized clearly.

Table 4. Behavior recognition accuracy without the height feature values (%)

Table 4 shows that standing and lying are still the most easily recognized behaviors, because both states are less affected by the height feature values, while the recognition rates for walking and sitting fall significantly.

Comparing Tables 3 and 4 shows that the height feature values affect recognition accuracy significantly more than the position feature value. To further demonstrate the performance of the decision tree algorithm, the invention compares it with the naive Bayesian network (NBN), random forest (RF) and K-nearest neighbor (KNN) algorithms. The behavior recognition error rates were compared for data set sizes D of 50, 100, 150 and 200, as shown in FIG. 4.

According to FIG. 4, the error rate of the decision tree algorithm DR is significantly lower than that of the other three algorithms; in particular, as the data set grows, its performance improves and its error rate falls. When the algorithms are strongly affected by the sensor, the KNN algorithm has the largest error rate; when D=100, the algorithm is similar to RF and the KNN algorithm has the largest error rate; when D=100, the RF algorithm has the largest error rate and NBN the second largest; when D=200, the RF algorithm has the largest error rate.

The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims

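1. An accurate behavior recognition method based on a decision tree classification algorithm, characterized by comprising the following steps:

S1, data collection: solving the tag position using the TDOA-based Chan algorithm;

S2, feature extraction: selecting feature values and partitioning the feature value intervals;

S3, behavior recognition: establishing a user behavior recognition model;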

the step S1 specifically includes:

selecting 4 base stations; in the two-dimensional rectangular coordinate system, the coordinates of the i-th base station are $B_{uwb,i} = [x_i, y_i]^T$ (i = 1, 2, …, 5), the coordinates of the tag are $T_{uwb} = [x_0, y_0]^T$, and the distance between the i-th base station and the tag is $R_i = \|B_{uwb,i} - T_{uwb}\|_2$ (i = 1, 2, …, 5); taking the first base station as the common reference node, a group of TDOA observations $\Delta t_{i,1}$ (i = 2, 3, 4, 5) is obtained, each indicating the signal arrival time difference between the i-th base station and the first base station;

$R_{i,1} = c \cdot \Delta t_{i,1}$ (i = 2, 3, 4, 5) (2)

a tag positioning architecture with 4 base stations is adopted, with one base station acting as the master and the remaining 3 as slaves; when a tester carrying a positioning tag enters the test area, the signals sent by the tag are received by one or more sensors; the sensors decode the signals to obtain angle-of-arrival and timing information and then transmit these data to the master sensor; the master sensor collects all the information sent by the base stations and calculates the position of the tag, which completes data collection; the sensor then transmits data once per second through the switch to the server in UDP packet format, and the server receives the UDP packets and obtains the specific X, Y coordinates of the tag;

the step S2 specifically includes:

the location division includes:

the partition characteristic value interval comprises:

let the range of a feature value be $\Phi = [c_{min}, c_{max}]$, divided into N levels labeled 1 to N; from the range $\Phi = [c_{min}, c_{max}]$, the span of the sensor values is $R = c_{max} - c_{min}$; so that every interval in the range has the same length, the length of each interval is calculated as $r = R/N$; thus the value range of each interval can be determined;

the step S3 specifically includes:

the information gain is the difference between the entropy and the expected information:

$\mathrm{gain}(A) = \mathrm{info}(D) - \mathrm{info}_A(D)$ (6)