CN114118167A - Action sequence segmentation method for behavior recognition based on self-supervised few-shot learning - Google Patents

Action sequence segmentation method for behavior recognition based on self-supervised few-shot learning

Info

Publication number
CN114118167A
Authority
CN
China
Prior art keywords
sample
data
sensor
segmentation
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111471435.0A
Other languages
Chinese (zh)
Other versions
CN114118167B (en)
Inventor
肖春静
陈世名
韩艳会
康红霞
王一凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202111471435.0A priority Critical patent/CN114118167B/en
Publication of CN114118167A publication Critical patent/CN114118167A/en
Application granted
Publication of CN114118167B publication Critical patent/CN114118167B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/08 Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/12 Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of sensor-based action recognition, and discloses an action sequence segmentation method for behavior recognition based on self-supervised few-shot learning, comprising the following steps: constructing a self-supervised few-shot action sequence segmentation framework, SFTSeg; the framework is based on a twin neural network and takes a large number of labeled samples from source sensors, a small number of labeled samples from target sensors, and unlabeled samples from target sensors as input data; a cross-entropy loss function, a consistency regularization loss function, and a self-supervised loss function are respectively constructed to train the twin neural network; the trained SFTSeg is then used for state label prediction and activity segmentation. The method achieves good activity segmentation across different sensors and scenarios, requiring only a few labeled samples from the target sensor.

Description

Action sequence segmentation method for behavior recognition based on self-supervised few-shot learning
Technical Field
The invention belongs to the technical field of sensor-based action recognition, and particularly relates to an action sequence segmentation method for behavior recognition based on self-supervised few-shot learning.
Background
Human activity recognition is considered a key component of many emerging Internet of Things applications, such as smart homes and healthcare, where the quality of activity segmentation is crucial. Before activity classification, the continuously received sensor data is typically segmented into subsequences, each corresponding to a single activity. These segmentation results are then fed into the classification model for behavior recognition, so they have a significant impact on the performance of activity classification. Consequently, a great deal of research has been devoted to activity segmentation, covering both unsupervised methods and supervised models.
For unsupervised approaches in the activity segmentation task: both CPD (change point detection) and threshold-based approaches require a threshold to distinguish activity boundaries. However, choosing the optimal threshold demands considerable user experience and must be done according to the actual scenario. Furthermore, temporal-shape-based methods (such as FLOSS) require problem-specific information to determine time-constraint parameters, making them relatively environment-dependent.
For supervised methods in the activity segmentation task: although they can alleviate the problems of subjectivity and environment dependence, they require a large number of labeled samples from the target sensor to train the model, which is time-consuming, labor-intensive, and often impractical under real-world constraints.
Disclosure of Invention
To address the environment dependence of existing activity segmentation methods and their need for large numbers of labeled training samples, the invention provides an action sequence segmentation method for behavior recognition based on self-supervised few-shot learning, which achieves good activity segmentation across different sensors and scenarios and requires only a few labeled samples from the target sensor.
In order to achieve the purpose, the invention adopts the following technical scheme:
An action sequence segmentation method for behavior recognition based on self-supervised few-shot learning comprises the following steps:
Step 1: construct a self-supervised few-shot action sequence segmentation framework, SFTSeg; the framework is based on a twin neural network and takes a large number of labeled samples from source sensors, a small number of labeled samples from target sensors, and unlabeled samples from target sensors as input data; the labeled samples of the source and target sensors correspond to four state labels: static state, start state, motion state, and end state; a sample refers to an action sequence derived from sensor data;
Step 2: construct a cross-entropy loss function on the labeled samples of the source sensor to train the twin neural network;
Step 3: for the labeled samples of the target sensor, compress labeled samples of the source sensor into perturbations, inject them into the labeled samples of the target sensor as enhanced data, and construct a consistency regularization loss function to train the twin neural network;
Step 4: construct positive and negative sample pairs from the unlabeled samples of the target sensor, and construct a self-supervised loss function based on these pairs to train the twin neural network so that it can capture the characteristics of the unlabeled target samples;
Step 5: after steps 1-4 yield the trained SFTSeg, input a target sensor sample as a test sample into the trained SFTSeg, which predicts the state label of the test sample; the test sample is then segmented into activities according to the predicted state labels.
Further, step 3 comprises:
constructing the enhanced data according to the following rules:
A. the compressed labeled samples of the source sensor used as perturbations have the same class as the labeled samples of the target sensor;
B. the compressed labeled samples of the source sensor are added to the labeled samples of the target sensor along the warping path, which is generated by a dynamic time warping algorithm.
Further, step 4 comprises:
discretizing the action sequence into overlapping windows of fixed size w using a sliding window with sliding step l;
two windows are considered a positive sample pair if they satisfy the following constraints: the two windows are adjacent; the two windows contain the same number of change points, and the difference sets of the two windows do not contain any change point;
two windows are considered a negative sample pair if they satisfy the following constraints: the two windows are separated in time by more than a given minimum distance; the two windows contain different numbers of change points. A change point is a time point at which the behavior in the action sequence changes abruptly.
Further, step 4 also includes:
for positive sample pairs, first computing the SEP scores of the difference sets of each pair and then filtering the pairs according to these scores;
for negative sample pairs, dividing each sample of a pair into h disjoint parts, computing the SEP scores of all pairs of consecutive parts to obtain the highest SEP score of each sample, then computing the dissimilarity score of the pair from the two highest SEP scores, and eliminating negative pairs with lower dissimilarity scores.
Compared with the prior art, the invention has the following beneficial effects:
the invention proposes a self-supervised, sample-less motion sequence segmentation framework SFTSeg to segment the activity on motion sequence data and to implement sample-less learning and classification using a twin neural network. The conventional activity segmentation method is usually based on the same sensor, the method can enhance the identification accuracy of target sensor data by using source sensor data, and can realize good activity segmentation and identification effects by using few target sensor marking samples. The method realizes a few-sample activity segmentation technology, and adopts a twin neural network as a main realization method of few-sample learning. Aiming at three different data, the invention respectively designs different loss functions to enhance the training effect: aiming at the marked samples of the source sensor, constructing a cross entropy loss function to force the input samples to corresponding categories; in order to enhance the generalization capability of the target sensor data, a consistency regularization method is introduced, a labeled sample of a source sensor is used as disturbance, the disturbance is used as enhancement data and is injected into the labeled sample of the target sensor, and the limited labeled sample of the target sensor is utilized for model training; to mitigate the large amount of drift between the source domain and the target domain, an auto-supervised learning is introduced, and a twin neural network is trained by constructing a positive sample pair and a negative sample pair based on unlabeled samples of the target sensor, so that the twin neural network can capture the characteristics of the target data.
The invention addresses the environment dependence and designer subjectivity of unsupervised methods (such as change-point-based and threshold-based detection) in the activity segmentation task, achieving good activity segmentation across different sensors and scenarios. It also addresses the supervised methods' need for large amounts of labeled target sensor data (which is costly and constrained by many conditions), achieving good activity segmentation with only a few labeled target sensor samples.
Drawings
Fig. 1 is an example diagram of the four motion states extracted from one action sequence;
FIG. 2 is a flowchart of the action sequence segmentation method for behavior recognition based on self-supervised few-shot learning according to an embodiment of the present invention;
FIG. 3 is an example diagram of the difference between the warping path and the shortest path;
FIG. 4 is an example diagram of positive (negative) sample pairs taken from an action sequence;
FIG. 5 is an example diagram of activity start point detection;
FIG. 6 shows line graphs of segmentation performance (F1-score) for different amounts of labeled target data.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
Activity segmentation aims to determine the start and end times of an activity and is the first step in human activity recognition. Because it is difficult to collect large amounts of labeled data from target sensors, unsupervised methods such as CPD-based and threshold-based methods are widely used for activity segmentation. However, these methods suffer from experience and environment dependence. Therefore, we transform the activity segmentation task into a classification problem via the following steps: (1) first, the continuous time-series data is discretized into windows of equal size; (2) each window is then assigned to one of four state categories: static state, start state, motion state, and end state; (3) finally, the start and end points of activities are identified from the state labels. Here, the four state labels are defined as follows. Static state: the window is filled with time-series data containing no activity. Start state: the window contains the start of an activity. Motion state: the window is filled with time-series data of human activity. End state: the window contains the end of an activity. An example of these four states extracted from one activity is shown in Fig. 1, which plots the first-order difference of the WiFi Channel State Information (CSI) amplitude of one subcarrier over time. The vertical dashed lines are the actual start and end points of the activity.
Thus, the segmentation result depends largely on the state inference quality, and the activity segmentation problem becomes one of designing a suitable state inference model to predict the state labels of the discretized data from the target sensors. Since labeled samples are limited, we introduce few-shot learning for the state inference model. Given that collecting many labeled samples from source sensors is feasible, our goal becomes how to construct a robust few-shot learning model for state inference on sensor data using three types of input data: a large number of labeled samples from source sensors, a small number of labeled samples from target sensors, and unlabeled samples from target sensors. In practical applications, since target data may be collected in different scenarios (e.g., different people, environments, and sensor devices), their styles and characteristics can differ considerably. Therefore, the few-shot learning model should be able to handle large differences between the source and target data distributions.
Specifically, the action sequence segmentation method for behavior recognition based on self-supervised few-shot learning comprises the following steps:
Step 1: construct a self-supervised few-shot action sequence segmentation framework, SFTSeg; the framework is based on a twin neural network (Siamese network) and takes labeled samples of source sensors, labeled samples of target sensors, and unlabeled samples of target sensors as input data; the labeled samples of the source and target sensors correspond to four state labels: static state, start state, motion state, and end state; a sample refers to an action sequence derived from sensor data. Specifically, the framework is a state inference model based on a twin neural network;
Step 2: construct a cross-entropy loss function on the labeled samples of the source sensor to train the twin neural network;
Step 3: for the labeled samples of the target sensor, compress labeled samples of the source sensor into perturbations, inject them into the labeled samples of the target sensor as enhanced data, and construct a consistency regularization loss function to train the twin neural network;
Step 4: construct positive and negative sample pairs from the unlabeled samples of the target sensor, and build a self-supervised loss function on these pairs to train the twin neural network, so that it can capture the characteristics of the unlabeled target samples;
Step 5: after steps 1-4 yield the trained SFTSeg, input a target sensor sample as a test sample into the trained SFTSeg, which predicts the state label of the test sample; the test sample is then segmented into activities according to the predicted state labels.
On the basis of the above embodiment, the invention further provides another action sequence segmentation method for behavior recognition based on self-supervised few-shot learning, which specifically includes:
A. Overview of the state inference model
Specifically, to address the problems of subjectivity, environment dependence, and insufficient labeled target sensor data, we introduce a few-shot learning model to predict the state labels of discretized data and then use these labels to segment activities. However, unlike general few-shot learning, in the activity segmentation scenario there is a large shift between the source domain and the target domain, while the class labels are the same. To this end, we propose a self-supervised few-shot action sequence segmentation framework, SFTSeg, which is essentially a state inference model based on a twin neural network, as shown in Fig. 2.
Specifically, the framework is based on a twin neural network with three types of data as input: a large number of labeled samples from source sensors, a small number of labeled samples from target sensors, and unlabeled samples from target sensors, corresponding to the classification loss, the consistency regularization loss, and the self-supervised loss, respectively. First, since the labeled source samples and the labeled target samples share the same four categories, we construct a classification (cross-entropy) loss $L_{cl}$ based on the labeled source samples $D_{ls}$. Second, since a model trained on source sensor data alone may not accurately capture the features of the target sensor data, we exploit a consistency regularization loss $L_{cr}$ based on the limited labeled target samples $D_{lt}$ to enhance the smoothness of the model. Here, a labeled source sample $x^{ls}$ is shrunk and injected as a perturbation into a labeled target sample $x^{lt}$ to generate enhanced data $\tilde{x}^{lt}$, and the sample pair $(x^{lt}, \tilde{x}^{lt})$ is used to construct the consistency regularization loss $L_{cr}$. Third, to enhance generalization on the target sensor data, we design a self-supervised loss $L_{ss}$ based on sample pairs $(x_1^{ut}, x_2^{ut})$ extracted from the unlabeled target data by an auxiliary task that we design for time series. We further propose an adaptive weighting method to enhance this loss.
Specifically, we achieve few-shot learning through a twin neural network. A typical twin neural network employs a convolutional neural network trained on a large amount of labeled source sensor data to extract feature vectors, and performs few-shot classification by measuring the distance between a new sample and the samples of each class from the target sensor. A twin neural network consists of two branches with shared parameters, each using the same architecture, such as a convolutional neural network (CNN). Generally, the twin network is trained by minimizing a contrastive loss over sample pairs. Given an input sample pair $(x_1, x_2)$ and its feature vector pair $(f(x_1), f(x_2))$ obtained from the twin network, the distance between the feature vectors in the latent space is computed as

$$d_e = \|f(x_1) - f(x_2)\| \quad (1)$$

The contrastive loss function $L_{ct}$ is defined as:

$$L_{ct} = (1-y)\, d_e^2 + y\, \max(0,\; m - d_e)^2 \quad (2)$$

where y is the binary label assigned to the pair, i.e., y = 0 if $x_1$ and $x_2$ belong to the same class and y = 1 otherwise, and m is the margin.
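For illustration, a minimal PyTorch sketch of this contrastive objective over a batch of pairs might look as follows (the encoder producing f(x) is assumed to be any CNN branch of the twin network; names and shapes here are illustrative rather than taken from the patent):

```python
import torch
import torch.nn.functional as F

def pairwise_distance(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    # Eq. (1): Euclidean distance between feature vectors, per pair in the batch.
    return torch.norm(f1 - f2, dim=1)

def contrastive_loss(f1, f2, y, margin: float = 1.0) -> torch.Tensor:
    # Eq. (2): y = 0 for same-class pairs (pull together), y = 1 otherwise (push apart).
    d = pairwise_distance(f1, f2)
    return ((1 - y) * d.pow(2) + y * F.relu(margin - d).pow(2)).mean()
```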
B. Enhancing few-shot learning with consistency regularization
Here we strengthen the twin neural network model using the labeled data in two ways. First, for labeled source data, we use a cross-entropy (classification) loss for model training, since the source data has the same four classes as the target data. Second, for labeled target data, which is very scarce, we propose a data enhancement method for action sequences and design a consistency regularization loss that forces the enhanced data and the original data to have the same label distribution, strengthening model smoothness.
Classification loss. Unlike typical few-shot learning tasks, for activity segmentation the source sensor data and the target sensor data share the same four categories: static state, start state, motion state, and end state. Therefore, to take advantage of the labeled source sensor data, we train the network with a cross-entropy loss rather than the usual few-shot learning loss, which enhances the classification capability of the network. Let $D_{ls} = \{(x_i^{ls}, y_i)\}_{i=1}^{N}$ be the labeled sample set of the source sensors, where $y_i$ is the state label of $x_i^{ls}$. The classifier f is a function mapping the input feature space to the label space. Considering the labeled samples $D_{ls}$ of all source sensors, the cross-entropy (classification) loss is:

$$L_{cl} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j} y_{ij} \log f_j(x_i^{ls}; \theta) \quad (3)$$

where $\theta$ denotes the model parameters, $y_{ij}$ is the j-th element of the one-hot label of sample $x_i^{ls}$, and $f_j$ is the j-th element of f.
Consistency regularization. Due to the large shift between the source and target domains, a model trained only on source sensor data may not fully capture the features of the target sensor data and, accordingly, may not effectively predict the state labels of the target sensor data. We therefore introduce consistency regularization to enhance the generalization capability of the model using the limited labeled target sensor data. In other words, we design a data enhancement method for action sequence data that generates enhanced data used to build a consistency regularization loss.
Consistency regularization aims to ensure that the classifier assigns the same class label to a sample after a perturbation is injected. Although widely used perturbation methods such as random noise, Gaussian noise, and attenuation noise are effective for image and natural-language data, they are not suitable for time-series data due to its intrinsic properties. For example, image perturbation methods mainly generate pixel-level variations, whereas time-series data are waveforms varying over time and require temporally coherent variations. Furthermore, the enhanced data should have a style similar to the target sensor data, which helps the inference model learn the characteristics of the target sensor data.
To this end, we shrink a labeled sample of the source sensor as a perturbation and inject it into a labeled sample of the target sensor to generate enhanced data. The original sample (the labeled target sample) and the enhanced data are then input into the twin neural network, and the distance between their corresponding features is minimized to train the network.
Specifically, to generate enhanced data in the style of the target sensor data, we construct it according to two rules: (i) the labeled samples of the source sensor that are compressed as perturbations should be of the same class as the labeled samples of the target sensor; (ii) the compressed labeled samples of the source sensor are injected into the labeled samples of the target sensor along the warping path. Here, the warping path, generated with a dynamic time warping (DTW) algorithm, maps the elements of two data sequences so as to minimize the distance between them. An example of the warping path and the shortest path for two action sequence samples is shown in Fig. 3, where the solid black and gray lines represent the waveforms of two time-series samples, and the dashed gray lines represent the warping path in Fig. 3(a) and the shortest path in Fig. 3(b). When the solid black line is used as the perturbation, adding it to the solid gray line along the shortest path distorts the waveform sharply, as shown in Fig. 3(c). In contrast, the result obtained with the warping path in Fig. 3(d) preserves the basic shape while introducing some variation in the waveform.
Thus, the enhanced data $\tilde{x}^{lt}$ is computed as:

$$\tilde{x}^{lt} = \mathrm{Aggregate}\big(x^{lt},\, h(x^{ls})\big) \quad (4)$$

where $\mathrm{Aggregate}(x, x')$ sums the two sequences along the warping path, and $h(x)$ shrinks the amplitude of the sensor data, e.g., $h(x) = \gamma x$, where $\gamma \in (0,1)$ is a hyper-parameter that adjusts the degree of shrinkage.
Therefore, to penalize the feature distance between an original sample $x^{lt}$ and its enhanced data $\tilde{x}^{lt}$, the consistency regularization loss is computed as:

$$L_{cr} = \sum_{x^{lt} \in D_{lt}} \big\| f(x^{lt}) - f(\tilde{x}^{lt}) \big\|^2 \quad (5)$$

where $f(x)$ denotes the feature vector produced by the twin neural network and $D_{lt}$ is the labeled data set from the target sensors.
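Continuing the sketch, the consistency term then reduces to a squared feature distance between the two branches of the shared-weight network (illustrative, not the patent's exact code):

```python
import torch

def consistency_loss(encoder, x_lt: torch.Tensor, x_lt_aug: torch.Tensor) -> torch.Tensor:
    # Eq. (5): original and enhanced target samples should share feature representations.
    return (encoder(x_lt) - encoder(x_lt_aug)).pow(2).sum(dim=1).mean()
```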
C. Facilitating few-shot learning through self-supervision
To further enable the inference model to learn the characteristics of the target data, we incorporate a self-supervised technique into few-shot learning, which utilizes a large amount of unlabeled target data for model training. To this end, we propose an auxiliary task suited to time-series data to build the self-supervised loss, and we design adaptive weights that adjust the importance of each training sample pair to further enhance this loss.
Self-supervised loss. To train the twin network with a self-supervised method, an auxiliary task based on unlabeled data must be designed. Although effective auxiliary tasks exist in computer vision and natural language processing (e.g., image rotation, warping, and cropping), they are not applicable to continuous time-series data. For example, the widely used image rotation task in computer vision assigns the rotated image the same label as the original. However, for activity segmentation, when a sequence with a start-state label is rotated by 180°, it is easily confused with a sequence in the end state. As shown in Fig. 1, data with a start-state label rotated by 180° has a shape very similar to data with an end-state label.
To this end, we propose an auxiliary task suited to time-series data, which constructs many positive and negative sample pairs from unlabeled target data to train the twin network. Here, a positive sample pair means both samples have the same state label, and a negative sample pair the opposite. We regard two consecutive windows of similar shape as a positive pair and two well-separated windows of different shape as a negative pair. Specifically, we discretize the action sequence into overlapping windows of fixed size w using a sliding window with step l. Two windows are considered a positive pair if they satisfy the following constraints: (i) the two windows are adjacent; (ii) they contain the same number of change points, and the difference sets of the two windows do not contain any change point. Accordingly, two windows are considered a negative pair if: (i) the windows are sufficiently far apart, i.e., separated in time by more than a given minimum distance (e.g., 2 × w); (ii) they contain different numbers of change points, i.e., one window contains one change point and the other contains none. Here, a change point is a time point at which the behavior in the action sequence changes abruptly; for a time series containing motion data, an activity transition can be regarded as a change point. Thus, two consecutive windows containing the same number of change points should share the same state label and are treated as a positive pair, while two windows with different numbers of change points should have different state labels and are treated as a negative pair. We use the density-ratio-based method SEP [S. Aminikhanghahi, T. Wang, and D. J. Cook, "Real-time change point detection with application to smart home time series data," IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 5, pp. 1010-1023, 2019] to detect change points; it determines change points by comparing a probability metric and a change score with corresponding thresholds, achieving good performance. A sketch of this pair-construction procedure follows.
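The sketch assumes change-point indices for the whole sequence have already been detected (e.g., by SEP); the window size w, stride l, and the 2 × w separation rule follow the text, while all function names are illustrative:

```python
def count_points(change_points, start, end):
    """Number of detected change points falling inside [start, end)."""
    return sum(start <= p < end for p in change_points)

def build_pairs(seq_len, change_points, w=120, l=10):
    starts = list(range(0, seq_len - w + 1, l))
    positives, negatives = [], []
    for a, b in zip(starts, starts[1:]):  # adjacent, overlapping windows
        same_count = count_points(change_points, a, a + w) == count_points(change_points, b, b + w)
        # difference sets of the two windows: [a, b) and [a + w, b + w)
        diff_clean = (count_points(change_points, a, b) == 0
                      and count_points(change_points, a + w, b + w) == 0)
        if same_count and diff_clean:
            positives.append((a, b))
    for a in starts:
        for b in starts:
            if b - a > 2 * w:  # far enough apart in time
                ca = count_points(change_points, a, a + w)
                cb = count_points(change_points, b, b + w)
                if {ca, cb} == {0, 1}:  # one window has a change point, the other none
                    negatives.append((a, b))
    return positives, negatives
```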
Fig. 4 gives an example of positive and negative sample pairs, where the vertical dashed lines are the true start and end points of the activity. As shown in Fig. 4, the pair $(x_1^{ut}, x_2^{ut})$ is positive because the two windows are adjacent, each contains exactly one change point, and the two difference sets contain no change point. In this example, the vertical dashed lines are change points, since they are activity transition points. Accordingly, the pair $(x_3^{ut}, x_4^{ut})$ is negative because the two windows are far apart and contain different numbers of change points.
According to these rules, a large number of positive and negative pairs can be obtained from the unlabeled target data. To improve sample quality, we further eliminate pairs with low confidence. Specifically, a larger SEP score means a higher probability that a change point is present. For positive pairs, since the difference sets of a pair should contain no change point, we discard pairs whose difference sets have high SEP scores. To do this, we first compute the SEP scores of the difference sets of a pair and then filter the pairs according to these scores. A difference set is divided equally into two parts, $x_{t-1}$ and $x_t$, each of length s, and their density ratio is calculated as:

$$g(x) = \frac{f_{t-1}(x)}{f_t(x)} \quad (6)$$

where $f_{t-1}(x)$ and $f_t(x)$ are the estimated probability densities of the two parts, respectively. The SEP change-point score is then constructed as:

$$\mathrm{SEP} = \max\Big(0,\; 1 - \min_{x}\, g(x)\Big) \quad (7)$$
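One illustrative way to compute such a score from a difference set is sketched below; the histogram-based density estimate is an assumption, since the patent only refers to "probability estimation densities":

```python
import numpy as np

def sep_score(segment: np.ndarray, bins: int = 16) -> float:
    """Eqs. (6)-(7): split a segment into halves and score how separated their densities are."""
    half = len(segment) // 2
    lo, hi = float(segment.min()), float(segment.max()) + 1e-9
    f_prev, _ = np.histogram(segment[:half], bins=bins, range=(lo, hi), density=True)
    f_curr, _ = np.histogram(segment[half:], bins=bins, range=(lo, hi), density=True)
    mask = f_curr > 0
    if not mask.any():
        return 0.0
    g = f_prev[mask] / f_curr[mask]       # density ratio of Eq. (6)
    return max(0.0, 1.0 - float(g.min()))  # separation score of Eq. (7)
```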
In this way, the SEP values $s_1$ and $s_2$ of the two difference sets of a positive pair can be computed. Then, to ensure the quality of these training samples and avoid overfitting, we eliminate 10% of the positive pairs:

$$f_{\mathrm{drop}} = \begin{cases} 1, & \bar{s} > \varepsilon \\ 0, & \text{otherwise} \end{cases} \quad (8)$$

where $f_{\mathrm{drop}}$ indicates whether a pair is rejected, $\bar{s}$ is the average of the two scores, and $\varepsilon$ is a threshold determined by the ranking of the SEP values of all positive pairs and the culling rate.
For negative pairs, we expect one sample to contain one change point and the other to contain none. To meet this requirement, we filter out pairs whose change points are not sufficiently distinct. To this end, we design a dissimilarity score based on the SEP score to eliminate negative pairs with low confidence. Specifically, each sample of a negative pair is divided into h disjoint parts, and the SEP scores of all pairs of consecutive parts are calculated using equation (7). Let $s_j$ denote the SEP score of the j-th and (j+1)-th parts. The highest SEP score is then:

$$s^{\max} = \max_{j \in \{1, \ldots, h-1\}} s_j \quad (9)$$

Thus, each sample of a negative pair yields a maximum SEP score. Suppose $s_1^{\max}$ and $s_2^{\max}$ are the maximum SEP scores of the samples with one and zero change points, respectively. The dissimilarity score of the pair is calculated as:

$$d_s = s_1^{\max} - s_2^{\max} \quad (10)$$

Negative pairs with lower dissimilarity scores are removed; except that $d_s$ replaces $\bar{s}$ (and the pairs with the lowest scores are culled), formula (8) is still adopted as the filtering method.
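Both filters can then be sketched as simple rank-based culls at a fixed 10% rate (again illustrative; `diff_scores` holds the two difference-set SEP scores of each positive pair and `dissim_scores` the dissimilarity scores of the negative pairs):

```python
def filter_positive(pairs, diff_scores, drop_rate=0.10):
    """Eq. (8): drop the positive pairs whose difference sets have the highest mean SEP scores."""
    means = [0.5 * (s1 + s2) for s1, s2 in diff_scores]
    keep = sorted(range(len(pairs)), key=lambda k: means[k])[: int(len(pairs) * (1 - drop_rate))]
    return [pairs[k] for k in sorted(keep)]

def filter_negative(pairs, dissim_scores, drop_rate=0.10):
    """Eqs. (9)-(10): drop the negative pairs with the lowest dissimilarity scores."""
    keep = sorted(range(len(pairs)), key=lambda k: -dissim_scores[k])[: int(len(pairs) * (1 - drop_rate))]
    return [pairs[k] for k in sorted(keep)]
```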
After culling 10% of the low-confidence negative and positive pairs, the remaining pairs train the twin neural network with the following self-supervised loss:

$$L_{ss} = (1-y)\, d_e^2 + y\, \max(0,\; m - d_e)^2 \quad (11)$$

where $d_e$ is the distance between the feature vectors $f(x_1^{ut})$ and $f(x_2^{ut})$ of the input sample pair $(x_1^{ut}, x_2^{ut})$, i.e., $d_e = \|f(x_1^{ut}) - f(x_2^{ut})\|$; y is the label assigned to the pair, i.e., y = 0 if $x_1^{ut}$ and $x_2^{ut}$ have the same state label and y = 1 otherwise; and m is a hyper-parameter for the margin.
Adaptive weighting. After the sample pairs are screened, the remaining positive and negative pairs have high confidence. However, different pairs may provide different clues for learning the data representation. In general, a sample without any activity data contributes fewer clues to learning the data representation; accordingly, pairs containing activity data provide more clues and should play a more important role in model training. Fig. 4 shows an example of two sample pairs: the positive pair contains activity data while the negative pair does not, so the positive pair deserves more attention during model training.
Since the amplitude range of sensor data is much larger when activity occurs than when there is none, sample pairs with larger fluctuation ranges are likely to contain activity data and should be emphasized in model training. We use the amplitude variances of a sample pair to estimate its fluctuation amplitude:

$$V_{\mathrm{pair}} = V_1 + V_2 \quad (12)$$

where $V_1$ and $V_2$ are the amplitude variances of the samples $x_1^{ut}$ and $x_2^{ut}$ of the pair, respectively. $V_{\mathrm{pair}}$ is then used as a weight to adjust the importance of the pair during model training. Taking this weight into account, the self-supervised loss in equation (11) becomes:

$$L_{ss}^{w} = V_{\mathrm{pair}} \Big[ (1-y)\, d_e^2 + y\, \max(0,\; m - d_e)^2 \Big] \quad (13)$$
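Putting the self-supervised pieces together, a sketch of the weighted loss follows; note that the variance sum in Eq. (12) is a reconstruction, since the original formula survives only as an image:

```python
import torch
import torch.nn.functional as F

def weighted_self_supervised_loss(f1, f2, y, x1, x2, margin: float = 1.0) -> torch.Tensor:
    # Eq. (11): contrastive loss over unlabeled target pairs (y = 0 same state, y = 1 otherwise).
    d = torch.norm(f1 - f2, dim=1)
    base = (1 - y) * d.pow(2) + y * F.relu(margin - d).pow(2)
    # Eqs. (12)-(13): weight each pair by its amplitude variance, a proxy for activity content.
    v_pair = x1.var(dim=1) + x2.var(dim=1)
    return (v_pair * base).mean()
```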
finally, after combining the classification penalty in equation (3), the consistency penalty in equation (5), and the weighted unsupervised penalty in equation (13), the final penalty function is illustrated as follows: :
Figure BDA0003392651770000127
in this loss, the consistency loss is based on the labeled data of the target sensor, while the self-supervised contrast loss is based on the unlabeled data of the target sensor. Therefore, models trained with these losses can effectively capture the features of the target sensor data and apply to the target sensor. The model adopts an Adam algorithm with default hyper-parameters as an optimization method.
D. Activity segmentation
After obtaining the trained state inference model, we predict the state labels of a given action sequence from the target sensor: we compare the distance between the target sample's feature vector and the sample vectors of each class, and the target sample is assigned the class whose samples are closest. The activity is then segmented according to the inferred state labels. Specifically, the start and end points of an activity are detected as follows. First, the continuous action sequence (sensor data stream) is divided into overlapping windows using a sliding window, each window of length w, with a sliding step of 1. Second, the state label of each window is inferred using the state inference model. Finally, the start and end points of the activity are identified by observing changes in the mode of a set of window labels, where the mode is the most frequently occurring value in the set. In other words, if the mode changes from 1 (static state) to 2 (start state), the corresponding window is regarded as the start of an activity; if the mode changes from 4 (end state) to 1 (static state), the window is regarded as the end of an activity.
For a more intuitive illustration, Fig. 5 gives an example of detecting an activity start point by observing mode changes, where the length m of the window-label list used to compute the mode is set to 10. In the figure, there are 18 data points: points 1 to 13 are static data and points 14 to 18 are activity data. The data is first divided into 11 overlapping windows, each of length w = 8. Second, for each window, its state label is inferred using the trained state inference model: the state labels of w1 to w6 are the static state (1), and those of w7 to w11 are the start state (2). Finally, each window is traversed to detect the start and end points of the activity. When checking w10, the mode of the state-label list from w1 to w10 is 1. When checking w11, the index of the current data point is i = 18 and the mode from w2 to w11 becomes 2, meaning the mode changes from 1 (static state) to 2 (start state), which indicates an activity start point. When several values occur with the same frequency, the mode is set to the largest of them. The start point $t_{start}$ is set to $i - m/2 + 1$; thus, in this example, with i = 18 and m = 10, $t_{start}$ equals the actual start point 14. After human activities are segmented, the data can be used for activity classification.
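The following runnable sketch reproduces the worked example above; the window state labels are taken as given here, whereas in the full system they come from the twin-network state inference:

```python
from collections import Counter

def mode(labels):
    """Most frequent label; ties are broken by taking the largest value, as in the text."""
    counts = Counter(labels)
    best = max(counts.values())
    return max(v for v, c in counts.items() if c == best)

def detect_starts(window_labels, w=8, m=10):
    """A mode change from 1 (static) to 2 (start) marks an activity start at i - m/2 + 1."""
    starts, prev = [], None
    for k in range(m, len(window_labels) + 1):
        cur = mode(window_labels[k - m:k])
        i = k + w - 1  # index of the last (1-based) data point covered by window k
        if prev == 1 and cur == 2:
            starts.append(i - m // 2 + 1)
        prev = cur
    return starts

# Worked example: 11 windows of length w = 8; w1..w6 are static (1), w7..w11 start state (2).
print(detect_starts([1] * 6 + [2] * 5))  # -> [14], the actual activity start point
```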
To verify the effect of the present invention, the following experiments were performed:
we used four data sets and evaluated the effectiveness of SFTSeg based on different sensor devices, users and environments. In addition, the contribution of individual components and the impact of training data size on performance were also investigated.
A. Experimental data and settings
Experimental data. We perform experiments on four behavior recognition data sets collected from different types of sensors, including WiFi devices, smartphones, and RFID tags.
HandGesture: This data set includes twelve hand gesture activities performed by two experimental subjects and captured by an inertial measurement unit. The activities include opening and closing a window, drinking water, watering flowers, cutting, chopping, stirring, reading a book, tennis forehand, tennis backhand, and catching a ball, and the activities are continuous.
USC-HAD: This data set consists of twelve human activities recorded from 14 subjects using a 3-axis accelerometer and a 3-axis gyroscope. Each activity category is repeated five times and includes walking forward, walking left, walking right, going upstairs, going downstairs, running forward, jumping up, sitting down, standing, sleeping, riding the elevator up, and riding the elevator down. Since these data are discontinuous, activity sets are manually and randomly stitched together for segmentation.
RFID: This data set contains data from six people, each performing poses between a wall and an RFID antenna, with nine passive RFID tags placed on the wall. The RFID data is a non-continuous data set consisting of twelve poses for each of the six subjects, so the data is likewise manually stitched together for the experiments.
WiFiAction: This data set consists of ten activities performed by five people, collected with WiFi devices using the Channel State Information (CSI) collection tool [D. Halperin, W. Hu, A. Sheth, and D. Wetherall, "Tool release: Gathering 802.11n traces with channel state information," SIGCOMM Comput. Commun. Rev., vol. 41, no. 1, p. 53, 2011]. These activities are continuous and include 1500 samples covering 5 fine-grained activities (hand swipe, hand raise, push, draw O, and draw X) and 5 coarse-grained activities (boxing, picking up, running, squatting, and walking).
Since source and target data may be collected with different sensor devices, people, and environments, in the following experiments WiFiAction data is used as the source data when evaluating the other data sets, and HandGesture data is used as the source data when evaluating WiFiAction. Furthermore, by default we choose three labeled samples per category from the target data for model training, i.e., the following experiments are performed in a 3-shot setting. Of the target sensor data, 80% of the unlabeled data is used for model training and the remaining 20% serves as the test set.
Evaluation metrics. The proposed SFTSeg and the baseline models are evaluated using two metrics. (i) F1-score: the harmonic mean of precision and recall. A predicted segmentation point is considered a true positive when it lies within a specified time window of a real boundary, and a false positive when it falls outside the time windows of all real boundaries. According to the sensors' sampling rates, the specified time windows are set to 0.3 and 0.5 seconds for the WiFiAction and HandGesture data sets, respectively, and to 2 seconds for the other data sets. (ii) RMSE: the root mean square error computed from the deviation between the true and predicted boundary times; RMSE is normalized to [0, 1] by the duration of the time series.
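For concreteness, the boundary F1 under this tolerance rule might be computed as follows (a sketch; the greedy one-to-one matching is an assumption the patent does not spell out):

```python
def boundary_f1(pred, true, tol):
    """Match each true boundary to at most one prediction within +/- tol seconds."""
    matched, used = 0, set()
    for p in pred:
        for k, t in enumerate(true):
            if k not in used and abs(p - t) <= tol:
                matched += 1
                used.add(k)
                break
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(true) if true else 0.0
    return 2 * precision * recall / (precision + recall) if matched else 0.0
```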
Experimental details. The learning rate of the model is set to 0.001 and the mini-batch size to 60. Based on experimental experience, the shrinkage factor γ of h(x) in equation (4), the margin m in equation (11), and the window size w are set to 0.05, 1, and 120, respectively. The CNN architecture in the twin network is the same as in our previous work [C. Xiao, Y. Lei, Y. Ma, F. Zhou, and Z. Qin, "DeepSeg: Deep-learning-based activity segmentation framework for activity recognition using WiFi," IEEE Internet of Things Journal, vol. 8, no. 7, pp. 5669-5681, 2021].
B. Baseline methods
To demonstrate the effectiveness and superiority of SFTSeg, eight segmentation methods of different types are chosen as baselines, including the threshold-based WiAG and Wi-Multi, the CPD-based AR1seg, SEPseg, and IGTS, the temporal-shape-based FLOSS and ESPRESSO, and the supervised method DeepSeg.
WiAG: A typical threshold-based gesture extraction and segmentation method, which identifies the beginning and end of a gesture by comparing the amplitude of the principal components of the data stream with a given threshold.
Wi-Multi: An activity segmentation algorithm for complex multi-subject environments. It eliminates potential false detections by computing the maximum eigenvalues of the correlation matrices of the amplitude and calibrated phase, improving accuracy in noisy environments and/or scenarios with multiple subjects.
AR1seg: A change-point detection method typical of the statistics literature, which uses a first-order autoregressive process to infer change points [S. Chakar, E. Lebarbier, C. Lévy-Leduc, and S. Robin, "A robust approach for estimating change-points in the mean of an AR(1) process," Bernoulli, vol. 23, no. 2, pp. 1408-1447, 2017].
SEPseg: A method for detecting change points in time-series data. The algorithm has been used to efficiently identify activity boundaries and recognize daily human activities [S. Aminikhanghahi, T. Wang, and D. J. Cook, "Real-time change point detection with application to smart home time series data," IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 5, pp. 1010-1023, 2019].
IGTS: An information-gain-based segmentation method, which estimates activity boundaries by maximizing the information gain of segments using dynamic programming.
FLOSS: A shape-based segmentation method, which segments activity data based on the observation that similarly shaped patterns should belong to the same category and occur close to each other in time.
ESPRESSO: An entropy- and shape-aware time-series segmentation method, which exploits both the entropy and the temporal shape properties of a time series to segment multidimensional time series.
DeepSeg: An activity segmentation method based on supervised learning. The framework employs a CNN as the state inference model to predict the state labels of discretized data and then identifies activity boundaries from the state labels.
C. Performance of activity segmentation
Table 1 shows the segmentation performance of the different methods on the four data sets, with the best results highlighted in bold. From these results we make the following observations.
First, the proposed SFTSeg consistently outperforms the baseline segmentation methods on all four data sets. Specifically, SFTSeg improves the F1-score over the best baseline method, DeepSeg, by 2.45%, 5.82%, 8.23%, and 1.92% on the HandGesture, USC-HAD, RFID, and WiFiAction data sets, respectively. These results show that SFTSeg can capture the features of the target data through the proposed consistency regularization and self-supervised losses, and can thus segment activities in the target data accurately from only a few labeled samples.
Second, the supervised method DeepSeg shows no significant advantage over the unsupervised methods, especially on RFID data. The main reason is that DeepSeg is designed for scenarios where the source and target data share the same distribution; with only a few labeled target samples, its competitiveness drops. This also explains why most work uses unsupervised methods to segment activities when labeled data is lacking. SFTSeg addresses the problem of limited labeled target data and achieves better performance than both the supervised and unsupervised approaches.
Third, among the unsupervised baselines, different data sets require different methods to obtain better segmentation results. For example, IGTS outperforms the other unsupervised methods on RFID data, but its segmentation results on HandGesture data are significantly worse than ESPRESSO's. These results support our claim that unsupervised segmentation methods are often affected by environment-related problems. In contrast, the proposed SFTSeg consistently achieves better performance across all data sets.
Table 1: segmentation performance comparison
D. Ablation experiment
Here we study the contributions of the basic components designed for SFTSeg, i.e., the consistency regularization loss, the self-supervised loss, and the adaptive weighting. We investigate the effects of the different components as follows: (i) SFTSeg-Base is the basic twin (Siamese) network model, optimizing the classification loss on labeled source data given in equation (3). (ii) SFTSeg-Consis is the twin model with our consistency regularization loss, as in equation (5). (iii) SFTSeg-Self is the twin network model with the self-supervised loss but without the adaptive weights, as in equation (11). (iv) SFTSeg-Weight is the twin network model with the self-supervised loss and adaptive weights, as in equation (13). (v) SFTSeg-Full is the proposed model containing all components. The segmentation results on the four data sets are shown in Table 2, with the best results highlighted in bold. From this table we observe the following. First, SFTSeg-Full achieves the best performance while SFTSeg-Base performs the worst, indicating that the main components we design greatly improve segmentation performance. Second, when combined with consistency regularization, SFTSeg-Consis achieves better results than SFTSeg-Base; this is because our enhancement method augments the limited labeled samples from the target domain, helping the model improve its generalization to the target domain. Third, SFTSeg-Self and SFTSeg-Weight outperform SFTSeg-Base by an even larger margin. This result verifies the primary motivation for designing SFTSeg: the self-supervised loss based on unlabeled target data enables the model to capture the features of the target domain and further improves segmentation performance.
Table 2: performance when considering different components
E. Role of target data size
SFTSeg attempts to mitigate the large shift between the source and target domains using target data. Here we therefore discuss the effect of target data size on segmentation performance. Specifically, we examine the results for 1-shot, 3-shot, and 5-shot settings as the amount of unlabeled target data changes (n in n-shot refers to the number of labeled samples per action category).
FIG. 6 shows the F1-score results when selecting different proportions of unlabeled target data, where (a) is the result for HandGesture, (b) for USC-HAD, (c) for RFID, and (d) for WiFiAction. RMSE results are omitted because they show the same trend. Fig. 6 shows that the segmentation performance of SFTSeg gradually improves as the amount of unlabeled data increases on all four data sets. This indicates that the amount of unlabeled data plays an important role in segmentation performance and that the self-supervised task we design can effectively use unlabeled target data to enhance model performance. In addition, the 5-shot performance is clearly better than the 1-shot performance. The reason is that more labeled target samples not only benefit consistency regularization in the training stage but also improve the distance computation between test samples and labeled target samples in the testing stage. Overall, these results indicate that SFTSeg can efficiently exploit both labeled and unlabeled target data to improve segmentation performance.
In summary, the invention proposes a self-supervised few-shot action sequence segmentation framework, SFTSeg, to segment activities in action sequence data. Whereas traditional action segmentation methods usually target the same sensor, the invention uses source sensor data to enhance segmentation accuracy on target sensor data and achieves good activity segmentation and recognition with only a few labeled target sensor samples. A twin neural network is adopted as the main framework for few-shot learning, realizing a few-shot activity segmentation technique. For the three kinds of input data, different loss functions are designed to enhance training: for the labeled samples of the source sensor, a cross-entropy loss forces input samples into their corresponding categories; to enhance generalization on target sensor data, consistency regularization is introduced, in which labeled source samples are shrunk into perturbations and injected into labeled target samples as enhanced data, and training on the enhanced data improves the model's generalization capability; to address the large difference between the source and target data distributions, self-supervised learning is introduced, and positive and negative sample pairs constructed from unlabeled target samples train the twin neural network so that it captures the characteristics of the target data and improves inference performance.
The invention addresses the environment dependence and designer subjectivity of unsupervised methods (such as change-point-based and threshold-based detection) in the activity segmentation task, achieving good activity segmentation across different sensors and scenarios. It also addresses the supervised methods' need for large amounts of labeled target data (costly and constrained by many conditions), achieving good activity segmentation with only a few labeled target sensor samples.
The above are only preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.

Claims (4)

1. An action sequence segmentation method for behavior recognition based on self-supervised few-shot learning, characterized by comprising the following steps:
step 1: constructing an automatic supervision small sample action sequence segmentation framework SFTSeg; the framework is based on a twin neural network, and takes a marked sample of a source sensor, a marked sample of a target sensor and an unmarked sample of the target sensor as input data; the marking sample of the source sensor and the marking sample of the target sensor correspond to four state labels which are respectively in a static state, a starting state, a motion state and an ending state; the samples refer to a sequence of actions derived from sensor data;
step 2: constructing a cross entropy loss function for the labeled sample of the source sensor to carry out twin neural network training;
and step 3: for the labeled sample of the target sensor, taking the labeled sample of the source sensor as disturbance, injecting the disturbance into the labeled sample of the target sensor as enhancement data, and constructing a consistency regularization loss function to perform twin neural network training;
and 4, step 4: constructing a positive sample pair and a negative sample pair based on the unlabeled sample of the target sensor, and constructing an auto-supervision loss function based on the positive sample pair and the negative sample pair to train the twin neural network so that the twin neural network can capture the characteristics of the unlabeled sample of the target sensor;
and 5: and (3) obtaining the trained SFTSeg through the steps 1-4, inputting the sample of the target sensor serving as the test sample into the trained SFTSeg, predicting the state label of the test sample by the trained SFTSeg, and then performing activity segmentation on the test sample according to the predicted state label.
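As an illustration of step 5, the short sketch below converts a sequence of predicted per-window state labels into activity segments. It is a hypothetical post-processing step; the integer encoding (0 static, 1 start, 2 motion, 3 end) is an assumption for the example.

def labels_to_segments(states):
    """Turn a per-window state-label sequence into (start, end) activity segments."""
    segments, open_at = [], None
    for i, s in enumerate(states):
        if s == 1 and open_at is None:        # a start state opens a segment
            open_at = i
        elif s == 3 and open_at is not None:  # an end state closes it
            segments.append((open_at, i))
            open_at = None
    return segments

# Example: two activities embedded in static background.
print(labels_to_segments([0, 0, 1, 2, 2, 3, 0, 1, 2, 3, 0]))  # [(2, 5), (7, 9)]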
2. The action sequence segmentation method for behavior recognition based on self-supervised few-sample learning according to claim 1, wherein step 3 comprises:
constructing the enhanced data according to the following rules:
A. the compressed labeled sample of the source sensor used as the perturbation has the same class as the labeled sample of the target sensor;
B. the compressed labeled sample of the source sensor is added to the labeled sample of the target sensor along a warping path, the warping path being generated by a dynamic time warping algorithm.
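The sketch below illustrates the enhancement rule of claim 2 on 1-D sequences. The claim only fixes "compress the same-class source sample, then add it along the DTW warping path"; the compression factor and the averaging of many-to-one alignments are assumptions.

import numpy as np

def dtw_path(a, b):
    """Classic dynamic time warping on two 1-D sequences; returns the warp path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, (i, j) = [], (n, m)
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        i, j = [(i - 1, j - 1), (i - 1, j), (i, j - 1)][step]
    return path[::-1]

def enhance(target, source, compress=0.3):
    """Inject the compressed same-class source sample into the target sample."""
    pert = compress * np.asarray(source, dtype=float)   # rule A: compression
    out = np.asarray(target, dtype=float).copy()
    aligned = {}                                        # rule B: add along warp path
    for i, j in dtw_path(out, pert):
        aligned.setdefault(i, []).append(pert[j])
    for i, vals in aligned.items():
        out[i] += np.mean(vals)   # average when one target step aligns to several
    return out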
3. The action sequence segmentation method for behavior recognition based on self-supervised few-sample learning according to claim 1, wherein step 4 comprises:
discretizing the action sequence into overlapping windows of fixed size w by means of a sliding window with sliding step length l;
regarding two windows as a positive sample pair if they satisfy the following constraints: the two windows are adjacent; the two windows contain the same number of change points; and the set difference of the two windows contains no change point;
regarding two windows as a negative sample pair if they satisfy the following constraints: the two windows are separated in time by more than a given minimum distance; the two windows contain different numbers of change points; wherein a change point is a time point at which the behavior of the action sequence changes abruptly.
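The following sketch enumerates window pairs under the rules of claim 3, given a list of candidate change points (which, in practice, would come from a preliminary detector, since the target data are unlabeled). The window size, stride, minimum separation, and the "or" combination of the two negative-pair constraints are assumptions.

from itertools import combinations

def make_pairs(change_points, seq_len, w=50, l=10, min_gap=200):
    starts = list(range(0, seq_len - w + 1, l))
    def n_cp(s):                       # change points inside window [s, s + w)
        return sum(s <= c < s + w for c in change_points)
    pos, neg = [], []
    for a, b in zip(starts, starts[1:]):          # adjacent window pairs
        # set difference of windows [a, a+w) and [b, b+w): [a, b) and [a+w, b+w)
        diff_has_cp = any(a <= c < b or a + w <= c < b + w for c in change_points)
        if n_cp(a) == n_cp(b) and not diff_has_cp:
            pos.append((a, b))
    for a, b in combinations(starts, 2):
        # the claim lists two constraints; an "or" combination is assumed here
        if b - a > min_gap or n_cp(a) != n_cp(b):
            neg.append((a, b))
    return pos, neg

pos, neg = make_pairs(change_points=[120, 480], seq_len=1000)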
4. The action sequence segmentation method for behavior recognition based on self-supervised few-sample learning according to claim 3, wherein step 4 further comprises:
for each positive sample pair, first calculating the SEP score of the difference set of the pair, and then filtering the positive sample pairs according to the SEP scores;
for each negative sample pair, dividing each sample of the pair into h disjoint parts, calculating the SEP scores of all pairs of consecutive parts, and taking the highest SEP score of each sample; then calculating the dissimilarity score of the negative sample pair from the highest SEP scores of its two samples; and eliminating negative sample pairs with lower dissimilarity scores.
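Since the SEP score itself is not defined in the claims, the sketch below uses a stand-in separability measure to show the shape of the filtering logic in claim 4. The function sep_score, both thresholds, and the way the dissimilarity score combines the two highest SEP scores are all assumptions.

import numpy as np

def sep_score(x, y):
    """Stand-in separability measure between two signal chunks (assumption)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return abs(x.mean() - y.mean()) / (x.std() + y.std() + 1e-8)

def keep_positive(diff_left, diff_right, tau_pos=0.5):
    # one reading of claim 4: keep a positive pair when its difference set
    # looks homogeneous, i.e. its two halves separate poorly
    return sep_score(diff_left, diff_right) < tau_pos

def highest_sep(sample, h=4):
    parts = np.array_split(np.asarray(sample, dtype=float), h)
    return max(sep_score(parts[k], parts[k + 1]) for k in range(h - 1))

def keep_negative(sample_a, sample_b, tau_neg=0.3, h=4):
    # dissimilarity score from the two highest SEP scores (combination assumed)
    dissim = abs(highest_sep(sample_a, h) - highest_sep(sample_b, h))
    return dissim >= tau_neg   # eliminate pairs with lower dissimilarity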
CN202111471435.0A 2021-12-04 2021-12-04 Action sequence segmentation method aiming at behavior recognition and based on self-supervision less sample learning Active CN114118167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111471435.0A CN114118167B (en) 2021-12-04 2021-12-04 Action sequence segmentation method aiming at behavior recognition and based on self-supervision less sample learning

Publications (2)

Publication Number Publication Date
CN114118167A true CN114118167A (en) 2022-03-01
CN114118167B CN114118167B (en) 2024-02-27

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116415152A (en) * 2023-04-21 2023-07-11 河南大学 Diffusion model-based self-supervision contrast learning method for human motion recognition

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222690A (en) * 2019-04-29 2019-09-10 浙江大学 A kind of unsupervised domain adaptation semantic segmentation method multiplying loss based on maximum two
CN110837836A (en) * 2019-11-05 2020-02-25 中国科学技术大学 Semi-supervised semantic segmentation method based on maximized confidence
US20200082221A1 (en) * 2018-09-06 2020-03-12 Nec Laboratories America, Inc. Domain adaptation for instance detection and segmentation
CN111914709A (en) * 2020-07-23 2020-11-10 河南大学 Action segmentation framework construction method based on deep learning and aiming at WiFi signal behavior recognition
WO2021057427A1 (en) * 2019-09-25 2021-04-01 西安交通大学 Pu learning based cross-regional enterprise tax evasion recognition method and system
CN112861758A (en) * 2021-02-24 2021-05-28 中国矿业大学(北京) Behavior identification method based on weak supervised learning video segmentation
US20210174093A1 (en) * 2019-12-06 2021-06-10 Baidu Usa Llc Video action segmentation by mixed temporal domain adaption
CN113408328A (en) * 2020-03-16 2021-09-17 哈尔滨工业大学(威海) Gesture segmentation and recognition algorithm based on millimeter wave radar

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAO Minghai; HUANG Zhancong: "Research on semi-supervised domain adaptation methods based on active learning", High Technology Letters (高技术通讯), no. 08, 15 August 2020 (2020-08-15) *
WANG Bowei; PAN Zongxu; HU Yuxin; MA Wen: "SAR target recognition based on Siamese CNN with few samples", Radar Science and Technology (雷达科学与技术), no. 06, 15 December 2019 (2019-12-15) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant