CN110781961A - Accurate behavior identification method based on decision tree classification algorithm

Info

Publication number: CN110781961A
Application number: CN201911025926.5A
Authority: CN
Inventors: 张玉成; 王振; 姚永康; 聂文都
Original assignee: Xijing University
Current assignee: Xijing University
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2020-02-11
Anticipated expiration: 2039-10-25
Also published as: CN110781961B

Abstract

The invention discloses an accurate behavior identification method based on a decision tree classification algorithm, which comprises the following steps: S1, data collection: solving for the tag position using the TDOA-based Chan algorithm; S2, feature extraction: selecting feature values and partitioning the feature value intervals; S3, behavior recognition: establishing a user behavior recognition model. The method realizes behavior recognition based on a decision tree algorithm. Experimental results show that the algorithm has good recognition performance for daily behaviors under specific conditions.

Description

Accurate behavior identification method based on decision tree classification algorithm

Technical Field

The invention relates to a behavior recognition method, in particular to an accurate behavior recognition method based on a decision tree classification algorithm.

Background

At present, with the continuous progress of wireless technologies such as Bluetooth, ZigBee, UWB (ultra-wideband) and wireless networks, the vigorous development of ad hoc networks, wireless sensor networks and the Internet of Things has attracted extensive attention in academia. Behavior recognition is widely regarded as an indispensable key technology. In network emergencies, behavior recognition information supports early warning beforehand, decision-making during the event, and handling afterwards. Behavior recognition technology plays a decisive role in the further development of wireless networks. Therefore, research on behavior recognition technology is important.

In indoor positioning, the signal propagation environment is more complicated than outdoors, so it is difficult to accurately analyze parameters such as the signal arrival time or arrival angle. However, with the continuous progress of wireless sensor networks (WSNs), academic research is no longer limited to traditional indoor positioning and location awareness. Radio-based position sensing is now used in many fields, and position sensing technology based on UWB (ultra-wideband) radar systems has developed and matured most prominently. In the reference, a UWB channel model based on human body occlusion is proposed, and the influence of human occlusion on TOA ranging errors is studied by measuring and analyzing those errors. At present, novel indoor positioning technology based on commercial Wi-Fi equipment also has development advantages in many applications, such as indoor intrusion detection, campus security, staff detection in shopping malls, patient monitoring, and real-time monitoring of elderly people and children at home.

The 2.4 GHz wireless network band is similar to that of Bluetooth, and positioning in this band is also affected by the environment, so the data become inaccurate when obstacles or electromagnetic interference are encountered. Compared with other positioning technologies, UWB has long been a research hotspot in radio frequency communication at home and abroad. Many existing behavior awareness methods employ image processing: low-level features are extracted from image information, human motion is identified, and a human motion model is constructed. The disadvantages are that a large number of feature values must be extracted and that the user's safety and privacy are seriously threatened. Therefore, more and more technologies replace image processing with sensors that are small, cheap and simple to deploy, and resistant to interference. Among existing algorithms, one uses parallel classification of big data to guide the power supply mode, but does not consider energy consumption. Some methods refine the original classification data and propose a maximum attribute index algorithm for concept refinement within the same level, and use a hierarchical geometric distribution mechanism between levels to allocate the privacy budget more reasonably; however, data distribution cannot be implemented in a dynamic data environment. Some methods improve the objective function of the decision tree generation algorithm so that inconsistent data can be classified, and directly adjust the influence factors of the function so that node splitting in the decision tree is more accurate and the classification effect is better. Some methods adopt compression strategy selection based on HBase data classification; however, the data processing is relatively complex.

Disclosure of Invention

The invention mainly aims to provide an accurate behavior identification method based on a decision tree classification algorithm.

The technical scheme adopted by the invention is as follows: an accurate behavior identification method based on a decision tree classification algorithm comprises the following steps:

S1, data collection: solving for the tag position using the TDOA-based Chan algorithm;

S2, feature extraction: selecting feature values and partitioning the feature value intervals;

S3, behavior recognition: establishing a user behavior recognition model.

Further, the step S1 specifically includes:

considering the positioning accuracy and the equipment cost, 4 base stations is a suitable choice; in a two-dimensional rectangular plane coordinate system, the coordinate of the i-th base station is $B_{uwb,i} = [x_i, y_i]^T$ ($i = 1, 2, \ldots, 5$) and the tag coordinate is $T_{uwb} = [x_0, y_0]^T$; the non-line-of-sight distance between the base station and the tag is $R_i = \|B_{uwb,i} - T_{uwb}\|_2$ ($i = 1, 2, \ldots, 5$); using the first base station as the common reference node, a set of TDOA observations $\Delta t_{i,1}$ ($i = 2, 3, 4, 5$) is obtained, where $\Delta t_{i,1}$ is the signal arrival time difference between the i-th base station and the first base station;

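in this model,

$\Delta t_{i,1} = \Delta t^{0}_{i,1} + n_{i,1} + n_{NLOS,i}$ (1)

where $\Delta t^{0}_{i,1}$ is the true value of $\Delta t_{i,1}$, $n_{i,1}$ is the system measurement error, and $n_{NLOS,i}$ is the NLOS error;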

let the signal propagation speed be $c$; then $R_{i,1}$, the difference between the distance from the tag to the i-th base station and the distance from the tag to the first base station, is:

$R_{i,1} = c \cdot \Delta t_{i,1}$ ($i = 2, 3, 4, 5$) (2)

according to the properties of hyperbolas, three hyperbolic equations $R_{i,1} = R_i - R_1$ ($i = 2, 3, 4, 5$) are established, and the solution for $T_{uwb}$ can be established as shown in equation (3);

a tag positioning framework with 4 base stations is adopted, with one base station as the master and the remaining 3 as slaves; when a tester carrying the positioning tag enters the test area, the signal sent by the tag is received by one or more sensors; the slave sensors decode the signal to obtain angle-of-arrival and timing information and then transmit these data to the master sensor; the master sensor collects all the information sent by the base stations and calculates the position of the tag, thereby realizing data collection; the sensor then transmits data once per second through the switch to the server in UDP packet format, and the server receives the UDP packets, from which the X and Y coordinates of the specific tag can be obtained.
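As an illustration of how a tag position can be recovered from the range differences of equation (2), the following is a minimal sketch and not the full Chan algorithm: it performs only a linear solve for $(x_0, y_0, R_1)$ and omits the weighted least-squares and refinement stages. It assumes exactly 4 base stations, noise-free TDOA values, and a propagation speed equal to the speed of light. The function names `solve_3x3` and `locate_tag` are hypothetical. The linear system follows from squaring $R_i = R_{i,1} + R_1$ and subtracting the expression for $R_1^2$, which gives $(x_i - x_1)x_0 + (y_i - y_1)y_0 + R_{i,1}R_1 = \tfrac{1}{2}(x_i^2 + y_i^2 - x_1^2 - y_1^2 - R_{i,1}^2)$.

```python
import math

C = 299_792_458.0  # assumed propagation speed in m/s (speed of light)


def solve_3x3(a, b):
    """Solve the 3x3 linear system a*u = b by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate geometry: system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * u[k] for k in range(r + 1, n))
        u[r] = (m[r][n] - s) / m[r][r]
    return u


def locate_tag(stations, tdoa):
    """Estimate the tag position (x0, y0).

    stations: four (x, y) base-station coordinates; the first is the reference.
    tdoa: arrival-time differences (seconds) of stations 2..4 relative to station 1.
    """
    x1, y1 = stations[0]
    k1 = x1 ** 2 + y1 ** 2
    a, b = [], []
    for (xi, yi), dt in zip(stations[1:], tdoa):
        r_i1 = C * dt  # range difference, equation (2)
        a.append([xi - x1, yi - y1, r_i1])
        b.append(0.5 * (xi ** 2 + yi ** 2 - k1 - r_i1 ** 2))
    x0, y0, _r1 = solve_3x3(a, b)  # third unknown is R_1, the range to station 1
    return x0, y0


# Usage with synthetic, noise-free data (the layout below is made up for illustration).
stations = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
true_tag = (3.0, 4.0)
ranges = [math.dist(s, true_tag) for s in stations]
tdoa = [(r - ranges[0]) / C for r in ranges[1:]]
estimate = locate_tag(stations, tdoa)
```

With measurement noise or NLOS error, a single linear solve of this kind is sensitive to geometry (for example, it is singular for a tag at the exact center of a square layout), which is one reason a weighted and refined estimator is used in practice.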

Further, the step S2 specifically includes:

the feature value selection includes: location division; head, shoulder, waist and knee height processing; and the distance moved per unit time by the head, shoulder, waist and knee joint;

the location division includes:

in real life, the position of a user has a certain relationship with the user's behavior and activity; spatial locations are divided into three categories: the first is places where a user may sit or lie down to rest, denoted Ra; the second is the area within 0.1-0.3 meters of such an object, with the distance depending on the object, denoted Da; the remaining space is the third category, denoted La;

the head, shoulder, waist and knee height processing comprises the head height, shoulder height, waist height and knee height;

the Z-axis data of the tags at the user's head, shoulder, waist and knee joint represent their heights in space and are read directly from the tag coordinates;

the distance moved per unit time by the head, shoulder, waist and knee joint comprises the distances moved by the head, the shoulder, the waist and the knee joint;

directly calculating the distance moved per unit time by the user's head, shoulder, waist and knee joint is difficult, mainly because the unit time is hard to determine; if the unit time is too large, the displacement result cannot accurately describe the user's behavior, which reduces the recognition accuracy; if it is too small, the amount of computation grows and the delay overhead increases; through repeated experiments, the optimal unit time LS that balances accuracy against computation delay was obtained;
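The height and displacement features described above can be sketched as follows. This is only an illustrative sketch: the sample layout `(t, x, y, z)`, the function names `displacements` and `heights`, and the `unit_time` parameter are assumptions, and the unit time actually chosen in the experiments is not reproduced here.

```python
import math


def displacements(samples, unit_time):
    """Distance moved by one tag over consecutive windows of at least unit_time seconds.

    samples: list of (t, x, y, z) tuples for a single tag, sorted by time t.
    """
    if not samples:
        return []
    result = []
    start = samples[0]
    for sample in samples[1:]:
        if sample[0] - start[0] >= unit_time:
            # Straight-line distance between the window's first and last positions.
            result.append(math.dist(start[1:], sample[1:]))
            start = sample
    return result


def heights(latest):
    """Map each body part to its height, read directly from the Z coordinate.

    latest: dict such as {"head": (t, x, y, z), "waist": (t, x, y, z)}.
    """
    return {part: sample[3] for part, sample in latest.items()}


# Usage with made-up samples taken once per second.
head_track = [(0.0, 0.0, 0.0, 1.6), (1.0, 3.0, 4.0, 1.6), (2.0, 3.0, 4.0, 1.6)]
moved = displacements(head_track, unit_time=1.0)
head_height = heights({"head": head_track[-1]})
```

A larger `unit_time` yields fewer, coarser displacement values, while a smaller one yields more values to process, which mirrors the accuracy and delay trade-off discussed in the text.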

the partition eigenvalue interval includes:

after the above feature values are determined, the classification boundaries must be determined so as to ensure similarity of data within a class and difference between classes; based on experiments, the test samples are processed with a hierarchical classification method; the key is how to determine the boundaries of each level; currently, two algorithms are used to determine the boundary values: equal length and equal distribution;

let the range of feature values be $\phi = [c_{min}, c_{max}]$, divided into $N$ levels labeled 1 to $N$; from $\phi = [c_{min}, c_{max}]$, the span of sensor values $R = c_{max} - c_{min}$ can be found; to ensure that every interval has the same length, the length of each interval is calculated as $r = R/N$; thus, the value range of each interval can be determined.
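The equal-length partition can be sketched as below, with the i-th interval taken as $[c_{min} + (i-1)r,\ c_{min} + ir]$. The function names are hypothetical, and the convention that a value lying exactly on a shared boundary belongs to the higher level (except $c_{max}$, which belongs to level $N$) is an assumption made for this sketch.

```python
def interval_bounds(c_min, c_max, n_levels):
    """Return the (low, high) bounds of each of the N equal-length intervals."""
    r = (c_max - c_min) / n_levels  # interval length r = R / N
    return [(c_min + (i - 1) * r, c_min + i * r) for i in range(1, n_levels + 1)]


def level_of(value, c_min, c_max, n_levels):
    """Return the level label (1..N) of a feature value."""
    if not c_min <= value <= c_max:
        raise ValueError("value lies outside the feature range")
    r = (c_max - c_min) / n_levels
    level = int((value - c_min) / r) + 1
    return min(level, n_levels)  # c_max itself falls in the last level


# Usage: a height feature between 0 m and 2 m split into 4 levels (made-up numbers).
bounds = interval_bounds(0.0, 2.0, 4)
level = level_of(0.9, 0.0, 2.0, 4)
```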

Further, the step S3 specifically includes:

assuming that D is the set of training tuples divided by category, the entropy of D is expressed as:

$info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$ (4)

where $m$ is the number of classes and $p_i$ represents the probability that the i-th class appears in the entire set of training tuples, which can be estimated by dividing the number of elements belonging to that class by the total number of elements in the training tuples; the entropy represents the average amount of information needed to identify the class label of a tuple in D;

assuming that the training tuples D are divided by the attribute A into subsets $D_1, \ldots, D_v$, the expected information of this partition of D is:

$info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, info(D_j)$ (5)

the information gain is the difference between the two:

$gain(A) = info(D) - info_A(D)$ (6)

establishing a user behavior recognition model; the specific behavior identification steps are as follows:

S31, collecting and classifying the various behaviors in the training data set, and calculating the entropy info(D) of the training tuples;

S32, extracting the location and height feature values from the preprocessed data, calculating the feature value intervals and partitioning the feature values;

S33, obtaining the expected information of the feature values partitioned in step S32;

S34, obtaining the information gain gain(A) as the difference between the entropy and the expected information;

the maximum gain is recorded, and the attribute with the maximum gain is selected; the resulting classification gives the corresponding behavior information.
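The entropy, expected information and information gain used in these steps can be sketched as follows, using the standard ID3 definitions. The function names, the attribute names `area` and `head_level`, and the four toy samples are assumptions made for illustration and are not taken from the experimental data.

```python
import math
from collections import Counter


def info(labels):
    """Entropy of a list of class labels (standard ID3 definition)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_after_split(rows, labels, attr):
    """Expected information after partitioning the samples by attribute attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    return sum(len(g) / total * info(g) for g in groups.values())


def gain(rows, labels, attr):
    """Information gain of attr: entropy minus expected information."""
    return info(labels) - info_after_split(rows, labels, attr)


# Toy data: discretized feature levels and behavior labels (made up).
rows = [
    {"area": "Ra", "head_level": 2},
    {"area": "Ra", "head_level": 2},
    {"area": "La", "head_level": 3},
    {"area": "Ra", "head_level": 3},
]
labels = ["sitting", "sitting", "standing", "standing"]

gains = {attr: gain(rows, labels, attr) for attr in ("area", "head_level")}
best_attr = max(gains, key=gains.get)
```

In this toy set the head-height level separates the two behaviors completely, so it has the larger gain and would be chosen for the split.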

The invention has the advantages that:

the invention discloses a method for realizing behavior recognition based on a decision tree algorithm. Experimental results show that the algorithm has good daily behavior recognition performance under specific conditions.

In addition to the objects, features and advantages described above, other objects, features and advantages of the present invention are also provided. The present invention will be described in further detail below with reference to the drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention.

FIG. 1 is a diagram of a behavior recognition framework of the present invention;

FIG. 2 is a diagram of the division of the user's body parts to be measured in accordance with the present invention;

FIG. 3 is an experimental environment plan of the present invention;

FIG. 4 is a comparison graph of the behavior recognition error rates of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

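Referring to FIG. 1, an accurate behavior identification method based on a decision tree classification algorithm includes the following steps:

S1, data collection: solving for the tag position using the TDOA-based Chan algorithm;

S2, feature extraction: selecting feature values and partitioning the feature value intervals;

S3, behavior recognition: establishing a user behavior recognition model.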

The step S1 specifically includes:

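in this model,

$\Delta t_{i,1} = \Delta t^{0}_{i,1} + n_{i,1} + n_{NLOS,i}$ (1)

where $\Delta t^{0}_{i,1}$ is the true value of $\Delta t_{i,1}$, $n_{i,1}$ is the system measurement error, and $n_{NLOS,i}$ is the NLOS error;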

$R_{i,1} = c \cdot \Delta t_{i,1}$ ($i = 2, 3, 4, 5$) (2)

according to the properties of hyperbolas, three hyperbolic equations $R_{i,1} = R_i - R_1$ ($i = 2, 3, 4, 5$) are established, and the solution for $T_{uwb}$ can be established as shown in equation (3).

The step S2 specifically includes:

the location division includes:

in real life, the position of a user has a certain relationship with the user's behavior and activity; for example, on a sofa, a user may be sitting or lying; in a corridor, a user may be walking or running, and is more likely to fall there than near household furniture; therefore, spatial locations are divided into three categories: the first is places where a user may sit or lie down to rest, such as a sofa, bed or chair, denoted Ra (rest area); the second is the area within 0.1-0.3 meters of such an object, with the distance depending on the object, denoted Da (distance area); the remaining space is the third category, denoted La; the division of the user's body parts to be measured is shown in FIG. 2;
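The three-way location division can be sketched as a simple geometric test. This is an illustrative sketch only: modelling each rest object as an axis-aligned rectangle with its own margin, and the name `classify_location`, are assumptions; the source states only that the margin is 0.1-0.3 meters depending on the object.

```python
import math


def classify_location(x, y, rest_objects):
    """Classify a tag position as "Ra", "Da" or "La".

    rest_objects: list of (x_min, y_min, x_max, y_max, margin) tuples, one per
    rest object (for example a sofa), with margin in meters.
    """
    near_object = False
    for x_min, y_min, x_max, y_max, margin in rest_objects:
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return "Ra"  # on the rest object itself
        # Distance from the point to the rectangle (zero along an axis if inside its extent).
        dx = max(x_min - x, 0.0, x - x_max)
        dy = max(y_min - y, 0.0, y - y_max)
        if math.hypot(dx, dy) <= margin:
            near_object = True
    return "Da" if near_object else "La"


# Usage: a made-up sofa occupying 2 m x 1 m with a 0.3 m margin.
sofa = (0.0, 0.0, 2.0, 1.0, 0.3)
on_sofa = classify_location(1.0, 0.5, [sofa])
beside_sofa = classify_location(2.2, 0.5, [sofa])
open_space = classify_location(5.0, 5.0, [sofa])
```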

the head, shoulder, waist and knee height processing comprises the head height, shoulder height, waist height and knee height;

the Z-axis data of the tags at the user's head, shoulder, waist and knee joint represent their heights in space and can be read directly from the tag coordinates;

the distance moved per unit time comprises the distances moved by the head, the shoulder, the waist and the knee joint;

the partition eigenvalue interval includes:

let the range of feature values be $\phi = [c_{min}, c_{max}]$, divided into $N$ levels labeled 1 to $N$; from $\phi = [c_{min}, c_{max}]$, the span of sensor values $R = c_{max} - c_{min}$ can be found; to ensure that every interval has the same length, the length of each interval is calculated as $r = R/N$; thus, the value range of each interval can be determined. For example, the value range of the i-th interval is $[c_{min} + (i-1)r,\ c_{min} + ir]$.

The step S3 specifically includes:

the decision tree algorithm is a method for approximating discrete-valued functions and is a typical classification method; the basic idea is to process the data first, then use an induction algorithm to generate readable rules and a decision tree, and then use the decision tree to analyze new data; essentially, a decision tree classifies data through a series of rules; common decision tree classification algorithms include ID3, C4.5 and CART; the smaller the expected information, the larger the information gain and the higher the purity, and the core idea of the ID3 algorithm is to use the information gain as the attribute selection criterion; the identification is performed using the ID3 algorithm. Assuming that D is the set of training tuples divided by category, the entropy of D is expressed as:

$info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$ (4)

where $m$ is the number of classes and $p_i$ represents the probability that the i-th class appears in the entire set of training tuples, which can be estimated by dividing the number of elements belonging to that class by the total number of elements in the training tuples; the entropy represents the average amount of information needed to identify the class label of a tuple in D;

now assuming that the training tuples D are divided by the attribute A into subsets $D_1, \ldots, D_v$, the expected information of this partition of D is:

$info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, info(D_j)$ (5)

the information gain is the difference between the two:

$gain(A) = info(D) - info_A(D)$ (6)

each time a split is required, the ID3 algorithm calculates the information gain of each attribute and then selects the attribute with the largest gain for the split; therefore, as long as the maximum gain can be found, the best split is obtained;
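The split selection just described can be turned into a small recursive tree builder. The following is a generic ID3 sketch on discretized attributes, not the exact model of the invention; the names `build_tree` and `predict`, the fallback to the majority class for unseen attribute values, and the toy samples are assumptions.

```python
import math
from collections import Counter


def info(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def split(rows, labels, attr):
    """Group samples by the value of attr: value -> (rows, labels)."""
    parts = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = parts.setdefault(row[attr], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return parts


def gain(rows, labels, attr):
    """Information gain: entropy minus expected information after the split."""
    parts = split(rows, labels, attr)
    expected = sum(len(l) / len(labels) * info(l) for _, l in parts.values())
    return info(labels) - expected


def build_tree(rows, labels, attrs):
    """Recursively build an ID3 tree; leaves are behavior labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    rest = [a for a in attrs if a != best]
    children = {v: build_tree(r, l, rest) for v, (r, l) in split(rows, labels, best).items()}
    return {"attr": best, "children": children, "default": majority}


def predict(tree, row):
    """Walk the tree; fall back to the node's majority class for unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree


# Toy training data (made up): location category and head-height level.
rows = [
    {"area": "Ra", "head_level": 1},
    {"area": "Ra", "head_level": 2},
    {"area": "La", "head_level": 3},
    {"area": "La", "head_level": 1},
]
labels = ["lying", "sitting", "standing", "falling"]
tree = build_tree(rows, labels, ["area", "head_level"])
prediction = predict(tree, {"area": "La", "head_level": 1})
```

Here the head-height level is chosen at the root because it has the larger gain, and the location category then separates lying from falling among the low-height samples.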

according to the above analysis, an abstract description of user behavior constitutes a behavior classification model and is also the basis of behavior recognition; a user behavior recognition model is established based on the decision tree classification algorithm; the specific behavior identification steps are as follows:

To verify the performance of the algorithm in behavior recognition, a scenario experiment was performed in three selected scenarios: an office, a meeting room and a laboratory. The resulting behavior recognition accuracy is compared with that of other algorithms. In the experiment, the researcher holds the positioning tag and simulates six basic actions: sitting, standing, falling, lying, walking and running. The experimental environment plan is shown in FIG. 3.

Because the behaviors are simulated with a hand-held tag and the sensors are not perfectly stable, much of the measured data does not match the actual behavior and must be eliminated during preprocessing. The resulting valid data are shown in Table 1.

TABLE 1 Valid data set

The area feature takes three values, corresponding to the three location categories associated with sitting, lying and walking. Y represents the user behavior: 1 represents sitting, 2 standing, 3 falling, 4 lying, 5 walking, and 6 running. Height and distance are in meters.

Through algorithm analysis, the recognition accuracy of each behavior was evaluated in three cases. In the first case, all feature values are used, and the recognition results are shown in Table 2; in the second case, the location feature is excluded, and the results are shown in Table 3; in the third case, the height features are excluded, and the results are shown in Table 4.

TABLE 2 Behavior recognition accuracy (%)

As can be seen from Table 2, standing and lying are the most easily identified behaviors, because the tag data for these two states have obvious characteristics: there is no significant change in height or position, that is, the distance moved per unit time is 0. The recognition of sitting and walking is relatively poor, because the person's position and height change only slightly and the feature values are relatively close. Behaviors with a larger range of variation are also better differentiated.

TABLE 3 behavior recognition accuracy (%), excluding position feature value

As can be seen from Table 3, the recognition rate of each behavior is lower than when all feature values are used; standing and lying are still the most easily recognized behaviors; standing and sitting are not clearly distinguished, since their recognition depends on the location feature value; however, falling and running are still clearly recognized.

TABLE 4 behavior recognition accuracy (%), excluding height eigenvalue

Table 4 shows that standing and lying are still the most easily recognized behaviors, because these two states are less affected by the height feature values, whereas the recognition rates of walking and sitting are significantly reduced.

Comparing Tables 3 and 4 shows that the height feature values influence the accuracy of behavior recognition significantly more than the location feature value does. To further show the performance of the decision tree algorithm, the present invention compares it with the naive Bayesian network (NBN), random forest (RF) and K-nearest neighbor (KNN) algorithms. The behavior recognition error rates for data set sizes D of 50, 100, 150 and 200 are compared in FIG. 4:

According to FIG. 4, the error rate of the decision tree algorithm (DR) is significantly lower than the error rates of the other three algorithms; in particular, as the data set grows, its performance improves and its error rate falls. The KNN algorithm is strongly affected by the sensors, which in some cases gives it the largest error rate; when D is 100, its error rate is similar to that of RF, and the KNN error rate is the largest; when D is 100, the error rate of the RF algorithm is the largest and NBN is second; when D is 200, the error rate of the RF algorithm is the largest.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

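1. An accurate behavior identification method based on a decision tree classification algorithm, characterized by comprising the following steps:

S1, data collection: solving for the tag position using the TDOA-based Chan algorithm;

S2, feature extraction: selecting feature values and partitioning the feature value intervals;

S3, behavior recognition: establishing a user behavior recognition model.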

2. The method for identifying an accurate behavior based on a decision tree classification algorithm according to claim 1, wherein the step S1 specifically comprises:

selecting 4 base stations; in a two-dimensional rectangular plane coordinate system, the coordinate of the i-th base station is $B_{uwb,i} = [x_i, y_i]^T$ ($i = 1, 2, \ldots, 5$) and the tag coordinate is $T_{uwb} = [x_0, y_0]^T$; the non-line-of-sight distance between the base station and the tag is $R_i = \|B_{uwb,i} - T_{uwb}\|_2$ ($i = 1, 2, \ldots, 5$); using the first base station as the common reference node, a set of TDOA observations $\Delta t_{i,1}$ ($i = 2, 3, 4, 5$) is obtained, where $\Delta t_{i,1}$ is the signal arrival time difference between the i-th base station and the first base station;

in this model,

$R_{i,1} = c \cdot \Delta t_{i,1}$ ($i = 2, 3, 4, 5$) (2)

3. The method for identifying an accurate behavior based on a decision tree classification algorithm according to claim 1, wherein the step S2 specifically comprises:

the location division includes:

the partition eigenvalue interval includes:

4. The method for identifying an accurate behavior based on a decision tree classification algorithm according to claim 1, wherein the step S3 specifically comprises:

the information gain is the difference between the two:

$gain(A) = info(D) - info_A(D)$ (6)

S31, collecting and classifying the various behaviors in the training data set, and calculating the entropy info(D) of the training tuples;