CN108875597B

CN108875597B - Large-scale data set-oriented two-layer activity cluster identification method

Info

Publication number: CN108875597B
Application number: CN201810538902.9A
Authority: CN
Inventors: 郑增威; 杜俊杰; 孙霖; 霍梅梅; 陈垣毅
Original assignee: Zhejiang University City College ZUCC
Current assignee: Hangzhou City University
Priority date: 2018-05-30
Filing date: 2018-05-30
Publication date: 2021-03-30
Anticipated expiration: 2038-05-30
Also published as: CN108875597A

Abstract

The invention relates to a two-layer activity cluster identification method for large-scale data sets, comprising the following steps: 1) activity clustering based on sparse coding; 2) feature selection and training of a group classifier; 3) feature selection and training of intra-group classifiers. The beneficial effects of the invention are as follows: on a large-scale data set, activities are divided into different groups according to their similarity and more targeted features are selected, which improves the accuracy of activity recognition; compared with a single-layer classification model, the two-layer activity cluster recognition model classifies more accurately, and the features it selects are more targeted; the feature selection method selects more important features and reaches satisfactory classification accuracy with fewer features.

Description

Large-scale data set-oriented two-layer activity cluster identification method

Technical Field

The invention relates to a wearable sensor-based activity recognition method, in particular to a two-layer activity cluster recognition method for a large-scale data set.

Background

At present, wearable-sensor-based activity recognition achieves high recognition accuracy, but most research is carried out on small-scale data sets involving few subjects. In practical applications, however, the number of subjects involved in activity recognition is huge, and their activity data cannot be obtained in advance. Establishing an efficient subject-independent activity recognition method on large-scale data sets therefore remains a problem to be solved.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provide a two-layer activity cluster identification method for a large-scale data set.

The two-layer activity cluster identification method for large-scale data sets comprises the following steps:

1) activity clustering based on sparse coding

1.1) performing sparse coding on the sample data according to formula (1) and solving for the sparse coefficient α, where A is the training data, D is the dictionary, and α is the sparse coefficient to be solved;

A = Dα    (1)

1.2) calculating the distance between different activity categories according to formula (2) to obtain a matrix M of size n × n, where n is the number of activity categories;

where, in formula (2), Δ_{i,j} is the distance between activities A_i and A_j (the smaller the distance, the more similar the activities), f is the number of features, N_{i,k} is the number of non-zero coefficients on the k-th feature in the sparse coefficient α obtained in step 1.1) for activity A_i, and S_i is the number of samples of activity A_i;

1.3) according to the matrix M, clustering mutually selected activities into the same activity group G_k to obtain a preliminary activity group set G = {G_1, G_2, ..., G_k}, and removing A_i and A_j from the activity set A;

1.4) traversing the activity set A and, for each A_p ∈ A, looking up its most similar activity A_q in the matrix M; if A_q ∈ G, adding A_p to the activity group containing A_q and removing A_p from the activity set A;

1.5) repeating step 1.4) until the activity set A is empty or the number of activities in the activity set A no longer changes;

1.6) if A is not empty, clustering all the activities remaining in A into a new activity group G_m and adding G_m to G;

1.7) outputting an activity group set G to complete the grouping of activities;

2) feature selection and training of group classifiers

2.1) according to the activity group set obtained in step 1), treating the activities in the same activity group as the same type of activity and performing feature selection, where the feature selection method is given by formula (3):

where, in formula (3), W_k is the weight of the k-th feature, var(f_k) is the variance of the k-th feature, and var(f_{k,i}) is the variance of the k-th feature over activity i;

2.2) carrying out group classifier training according to the features selected in the step 2.1) to obtain a first-layer classifier, wherein the classifier is used for classifying activities into a certain activity group;

3) feature selection and training of intra-group classifiers

3.1) for each activity group, respectively selecting features in different groups by using the formula (3);

3.2) training intra-group classifiers on the features selected in each activity group to obtain a second-layer classifier, which classifies an activity into its final specific activity.

Preferably, in step 1.3), mutual selection means that the most similar activity of A_i is A_j and the most similar activity of A_j is A_i.
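Step 1.1) asks for a sparse coefficient α satisfying A = Dα but does not say which solver is used. The following is a minimal sketch under an assumption: it uses ISTA (iterative soft-thresholding) on the usual L1-regularised least-squares objective for a single sample. The function names, the regularisation weight `lam`, and the toy dictionary are all hypothetical and are not taken from the patent.

```python
# Sketch only: ISTA is an ASSUMED solver; the source does not name one.

def soft_threshold(x, t):
    """Shrink x toward zero by t (proximal operator of the L1 norm)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(a, D, lam=0.1, iters=500):
    """Approximately minimise 0.5*||a - D*alpha||^2 + lam*||alpha||_1.

    a: one sample, a list of m values.
    D: dictionary given as m rows, each holding one entry per atom (n atoms).
    """
    m, n = len(D), len(D[0])
    # The squared Frobenius norm of D upper-bounds the largest eigenvalue
    # of D^T D, so its inverse is a conservative, convergent step size.
    step = 1.0 / sum(v * v for row in D for v in row)
    alpha = [0.0] * n
    for _ in range(iters):
        # Residual r = D*alpha - a
        r = [sum(D[i][j] * alpha[j] for j in range(n)) - a[i] for i in range(m)]
        # Gradient of the least-squares term: g = D^T r
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft-thresholding
        alpha = [soft_threshold(alpha[j] - step * g[j], step * lam)
                 for j in range(n)]
    return alpha


# Toy dictionary with three atoms in two dimensions (made-up values).
D_toy = [[1.0, 0.0, 0.6],
         [0.0, 1.0, 0.8]]
alpha_toy = sparse_code([1.0, 0.0], D_toy, lam=0.1)
nonzero = sum(1 for v in alpha_toy if v != 0.0)
print(alpha_toy, nonzero)
```

Counting the non-zero entries of the resulting coefficients, as in the last lines, is the kind of quantity that N_{i,k} in step 1.2) is built from; the distance formula (2) itself is not reproduced here.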

The invention has the beneficial effects that:

1. According to the invention, activities on a large-scale data set are divided into different groups according to their similarity and more targeted features are selected, which improves the accuracy of activity recognition.

2. Compared with a single-layer classification model, the two-layer activity cluster recognition model classifies more accurately, and the features it selects are more targeted.

3. The feature selection method selects more important features and reaches satisfactory classification accuracy with fewer features.

Drawings

FIG. 1 is a general flow diagram of the method;

FIG. 2 shows the two-layer activity cluster recognition model constructed on the HASC-PAC2016 data set;

FIG. 3 is a comparison of different feature selection methods.

Detailed Description

The present invention will be further described with reference to the following examples. The following examples are set forth merely to aid in the understanding of the invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

The large-scale data set used in the experiment is HASC-PAC2016, which contains activity data from 510 different people. The activity types and their labels are: standing (1), walking (2), jogging (3), jumping (4), going upstairs (5) and going downstairs (6).

The overall training flow of the two-layer activity cluster recognition method for large-scale data sets is shown in FIG. 1; the specific steps are as follows:

Step one, activity clustering based on sparse coding

1) Sparse coding is performed on the HASC-PAC2016 data set, and the sparse coefficients are solved.

2) The distances between the different activity classes are calculated from the sparse coefficients to obtain a 6 × 6 matrix M, shown in Table 1; the lower the value, the more similar the activities.

TABLE 1 Distances between different activity classes

	Standing	Walking	Jogging	Jumping	Going upstairs	Going downstairs
Standing	0	6.9563	7.6925	7.6735	6.8843	6.9700
Walking	6.9563	0	1.3304	1.0204	0.3730	0.3739
Jogging	7.6925	1.3304	0	0.8013	1.3143	1.2329
Jumping	7.6735	1.0204	0.8013	0	1.0558	0.9272
Going upstairs	6.8843	0.3730	1.3143	1.0558	0	0.2886
Going downstairs	6.9700	0.3739	1.2329	0.9272	0.2886	0

3) The mutually selected activities are clustered according to the matrix M. Jogging and jumping select each other, and going upstairs and going downstairs select each other, so two preliminary activity groups are obtained, G1: {jogging, jumping} and G2: {going upstairs, going downstairs}, and these four activities are removed from the activity set.

4) Two activities, standing and walking, now remain in the activity set. The activity set is traversed: the activity most similar to standing is going upstairs, which belongs to activity group G2, so standing is added to G2 and removed from the activity set.

5) Step 4) is repeated until the activity set is empty or the number of activities in the activity set no longer changes.

6) The activity group set G is output, completing the grouping of activities.

The final grouping results of the experiment on the HASC-PAC2016 dataset were 2 different groups, as shown in table 2:

TABLE 2 Grouping results for the HASC-PAC2016 data set

Group	Activities
First group	Jogging, jumping
Second group	Standing, walking, going upstairs, going downstairs
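The grouping procedure of step one can be sketched in a few lines and checked against the distances in Table 1. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, and ties between equal distances are broken by list order, which the source does not specify.

```python
# Distances copied from Table 1; activity order follows the table.
ACTIVITIES = ["standing", "walking", "jogging", "jumping",
              "going upstairs", "going downstairs"]
M = [
    [0,      6.9563, 7.6925, 7.6735, 6.8843, 6.9700],
    [6.9563, 0,      1.3304, 1.0204, 0.3730, 0.3739],
    [7.6925, 1.3304, 0,      0.8013, 1.3143, 1.2329],
    [7.6735, 1.0204, 0.8013, 0,      1.0558, 0.9272],
    [6.8843, 0.3730, 1.3143, 1.0558, 0,      0.2886],
    [6.9700, 0.3739, 1.2329, 0.9272, 0.2886, 0],
]


def nearest(M, i):
    """Index of the activity closest to activity i, excluding i itself."""
    return min((j for j in range(len(M)) if j != i), key=lambda j: M[i][j])


def group_activities(M):
    """Group activity indices by mutual selection, then attach the rest."""
    n = len(M)
    nn = [nearest(M, i) for i in range(n)]
    remaining = set(range(n))
    groups, group_of = [], {}

    # Mutual selection: i and j are each other's most similar activity.
    for i in range(n):
        j = nn[i]
        if i < j and nn[j] == i:
            group_of[i] = group_of[j] = len(groups)
            groups.append([i, j])
            remaining -= {i, j}

    # Attach each remaining activity to the group of its most similar
    # activity; repeat until nothing changes or nothing remains.
    changed = True
    while remaining and changed:
        changed = False
        for p in sorted(remaining):
            q = nn[p]
            if q in group_of:
                group_of[p] = group_of[q]
                groups[group_of[q]].append(p)
                remaining.discard(p)
                changed = True

    # Any activities still left form one new group.
    if remaining:
        groups.append(sorted(remaining))
    return [sorted(g) for g in groups]


named_groups = [[ACTIVITIES[i] for i in g] for g in group_activities(M)]
print(named_groups)
```

Run on the Table 1 distances, the sketch yields the two groups listed in Table 2: walking, like standing, has going upstairs as its most similar activity (0.3730), so both join the second group.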

Step two, feature selection and training of group classifier

According to Table 2, the activities in the same group are treated as the same type of activity. Feature weights are learned and features are selected using formula (3), and the selected features are then used to train a group classifier, which classifies an activity into one of the groups.

Step three, feature selection and training of intra-group classifier

Feature weight learning and feature selection are performed separately on the 2 groups using formula (3); the features selected differ from group to group. An intra-group classifier is then trained on the features selected for each group, and is used to classify the activity to be recognized into a specific activity.

The resulting two-layer activity cluster recognition model trained on the HASC-PAC2016 data set is shown in FIG. 2.
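The two-layer structure described in steps two and three can be sketched as a group classifier followed by one classifier per group, each working on its own feature subset. Everything specific in the sketch is an assumption made for illustration: a nearest-centroid classifier stands in for the SVM and Random forest used in the experiments, the feature indices are supplied by hand rather than computed with formula (3), and the toy samples are made up.

```python
from collections import defaultdict


class NearestCentroid:
    """Stand-in classifier restricted to a chosen subset of features."""

    def __init__(self, feature_idx):
        self.feature_idx = feature_idx
        self.centroids = {}

    def _select(self, x):
        return [x[k] for k in self.feature_idx]

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            v = self._select(x)
            prev = sums.get(label, [0.0] * len(v))
            sums[label] = [s + e for s, e in zip(prev, v)]
            counts[label] += 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, x):
        v = self._select(x)
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(v, self.centroids[c])))


def train_two_layer(X, y, group_of, group_features, intra_features):
    """Layer 1 predicts the activity group; layer 2 holds one model per group."""
    group_labels = [group_of[label] for label in y]
    layer1 = NearestCentroid(group_features).fit(X, group_labels)
    layer2 = {}
    for g in set(group_labels):
        rows = [i for i, gl in enumerate(group_labels) if gl == g]
        layer2[g] = NearestCentroid(intra_features[g]).fit(
            [X[i] for i in rows], [y[i] for i in rows])
    return layer1, layer2


def predict_two_layer(x, layer1, layer2):
    """Route the sample to its group, then to the activity inside that group."""
    return layer2[layer1.predict(x)].predict(x)


# Made-up toy data: feature 0 separates the groups, feature 1 the activities.
X = [[5.0, 0.0], [5.2, 0.1], [5.1, 1.0], [4.9, 1.1],
     [1.0, 0.0], [1.1, 0.1], [0.9, 1.0], [1.0, 1.1]]
y = ["jogging", "jogging", "jumping", "jumping",
     "walking", "walking", "going upstairs", "going upstairs"]
group_of = {"jogging": "G1", "jumping": "G1",
            "walking": "G2", "going upstairs": "G2"}
layer1, layer2 = train_two_layer(X, y, group_of, [0], {"G1": [1], "G2": [1]})
print(predict_two_layer([5.0, 0.9], layer1, layer2))
print(predict_two_layer([1.0, 0.05], layer1, layer2))
```

Keeping a separate feature list per layer and per group mirrors the point made above that the features selected on different groups differ.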

Experiments and results:

The invention aims to divide activities on a large-scale data set into different groups according to their similarity, select more targeted features, and improve the accuracy of activity recognition. To measure the effectiveness of the method, experiments were performed on the large-scale data set HASC-PAC2016 in an impersonal (subject-independent) setting with 5-fold cross-validation, using SVM and Random forest as the base classifiers. The experimental results are shown in Table 3:

TABLE 3 Experimental results of the two-layer activity cluster recognition model

Method	HASC-PAC2016
Single-layer SVM	0.6037
Two-layer SVM	0.6379
Single-layer Random forest	0.7035
Two-layer Random forest	0.7441

As can be seen from Table 3, the two-layer activity cluster recognition model classifies better than the original single-layer model with both base classifiers (0.6037 to 0.6379 for SVM, 0.7035 to 0.7441 for Random forest), which indicates that the features selected by the two-layer classification model are more targeted. To verify the performance of the proposed feature selection method, three other common feature selection methods, namely Laplacian score, Relief-F and MCFS, were compared with it on the two-layer classification model. The experimental results are shown in FIG. 3, with the number of selected features set to 1, 5, 10, 15, 20, 25, 30, 35, 40, 45 and 50. The feature selection method of the invention selects more important features than the other three methods and reaches satisfactory classification accuracy with fewer features.
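The size of the improvement in Table 3 can be made explicit with a short calculation. The snippet below only restates the four values from the table; the helper name `gains` is hypothetical.

```python
# (single-layer, two-layer) scores copied from Table 3.
results = {
    "SVM": (0.6037, 0.6379),
    "Random forest": (0.7035, 0.7441),
}


def gains(single, two):
    """Return the absolute and the relative improvement of two-layer over single-layer."""
    absolute = two - single
    return absolute, absolute / single


for name, (single, two) in results.items():
    absolute, relative = gains(single, two)
    print(f"{name}: +{absolute:.4f} absolute, +{relative:.2%} relative")
```

For both base classifiers the absolute gain is between 0.03 and 0.05, which corresponds to a relative improvement of a little under 6 percent.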

Claims

1. A two-layer activity cluster identification method for large-scale data sets, characterized by comprising the following steps:

1) activity clustering based on sparse coding

A = Dα    (1)

1.3) according to the matrix M, clustering mutually selected activities into the same activity group G_k to obtain a preliminary activity group set G = {G_1, G_2, ..., G_k}, and removing A_i and A_j from the activity set A;

1.7) outputting an activity group set G to complete the grouping of activities;

2) feature selection and training of group classifiers

3) feature selection and training of intra-group classifiers

2. The two-layer activity cluster identification method for large-scale data sets according to claim 1, characterized in that in step 1.3), mutual selection means that the most similar activity of A_i is A_j and the most similar activity of A_j is A_i.