CN114708609A - Domain-adaptive skeleton behavior identification method and system based on continuous learning - Google Patents


Info

Publication number
CN114708609A
Authority
CN
China
Prior art keywords: domain, training, loss, samples, sample set
Prior art date
Legal status
Granted
Application number
CN202111341029.2A
Other languages
Chinese (zh)
Other versions
CN114708609B (en)
Inventor
闫秋艳
王重秋
王志晓
袁冠
郭震
Current Assignee
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202111341029.2A priority Critical patent/CN114708609B/en
Publication of CN114708609A publication Critical patent/CN114708609A/en
Application granted granted Critical
Publication of CN114708609B publication Critical patent/CN114708609B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

A domain-adaptive skeleton behavior identification method and system based on continuous learning are disclosed. The method comprises the following steps: acquiring newly added skeleton behavior data of a source domain and a target domain, extracting skeleton behavior data sequences with a sliding window method, and generating a newly added sample set, which is taken as the training sample set; constructing a domain adaptive network model based on an attention mechanism, training the model on the training sample set, and calculating the overall loss of the model: if the overall loss is greater than a threshold, a portion of the samples is randomly extracted from the current training sample set and stored in an experience learning library, a portion of the samples extracted from the experience learning library together with the samples in the next round's newly added sample set form the next round's training sample set, and the next round of training continues; otherwise, training ends and the skeleton behavior identification model is obtained; and identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain the identification result.

Description

Domain-adaptive skeleton behavior identification method and system based on continuous learning
Technical Field
The invention relates to the technical field of computer vision behavior recognition, in particular to a domain adaptation skeleton behavior recognition method and system based on continuous learning.
Background
Human behavior recognition is an important branch of computer vision. A human behavior identification method observes a moving target in the input data, extracts its motion features, and classifies the target according to the acquired features of different behaviors and motions. Compared with traditional RGB images, human skeleton behavior data are more robust to problems such as illumination, occlusion and interference. In some special environments (such as a classroom), actions of the same type can differ greatly: raising the left hand and raising the right hand both belong to the hand-raising type, yet the movements differ considerably. Moreover, the same type of action changes over time; for example, the angle at which the left hand is raised varies as the class progresses, so action learning needs to be performed continuously. Existing skeleton behavior identification methods have low identification accuracy on similar actions with large differences.
Disclosure of Invention
In view of the foregoing analysis, embodiments of the present invention provide a domain-adaptive skeleton behavior recognition method and system based on continuous learning, so as to solve the problem that the existing skeleton behavior recognition method has low accuracy in recognizing similar actions with large differences.
In one aspect, an embodiment of the present invention provides a domain-adaptive skeleton behavior identification method based on continuous learning, including the following steps:
acquiring newly added skeleton behavior data of a source domain and a target domain, extracting a skeleton behavior data sequence by adopting a sliding window method, and generating a newly added sample set; taking the newly added sample set as a training sample set;
constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is greater than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing to perform the next round of training; otherwise, the training is finished to obtain a skeleton behavior recognition model;
and identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain an identification result.
The beneficial effects of the above technical scheme are: by adopting the domain adaptive network model, the label knowledge obtained in the source domain can be used to train the target domain, so that action data of the same type can be accurately identified even when their distributions differ. Meanwhile, by adopting continuous learning, the model can learn the knowledge in new samples without forgetting the knowledge of old samples, continuously learning highly diverse actions of the same type and thus accurately identifying them.
As a further improvement of the above technical scheme, extracting the skeleton behavior data sequence with the sliding window method to generate the newly added sample set comprises the following steps:
representing each frame of the skeleton behavior data by adopting an action vector, and extracting D frames of skeleton behavior data by adopting a sliding window method to form an action matrix;
splitting the action matrix of each window into three action matrix components, and respectively calculating a covariance matrix of each action matrix component; and merging the three covariance matrixes by adopting convolution operation to generate one sample data, wherein the plurality of sample data form a newly added sample set.
The beneficial effects of the above technical scheme are: extracting the skeleton behavior data sequence with a sliding window method expresses the time-dimension information of the skeleton behavior data well, and the covariance matrix expresses its spatial structure well; the spatio-temporal characteristics of the skeleton behavior data are thus obtained from the two dimensions of time and space, providing a data basis for subsequent accurate classification and identification.
Further, training the attention mechanism-based domain adaptation network model based on the training sample set includes:
extracting local features through a local feature extraction network, performing domain discrimination on the local features by adopting a first domain discriminator, calculating local migration loss and weights of the local features according to domain discrimination results, and obtaining migratable features based on the local features and corresponding weights; the number of the first domain discriminators is determined according to the size of the local feature matrix;
inputting the migratable features into a multi-head attention feature extraction network, and extracting representative migratable features based on a multi-head attention mechanism;
adopting a second domain discriminator to carry out domain discrimination on the representative migratable features, and calculating the overall migration loss according to domain discrimination results;
classifying and judging the representative migratable features of the source domain samples by adopting a classifier, and calculating the source domain classification and judgment loss; the classifier is a fully connected network;
and calculating the total loss of the sample based on the local migration loss, the overall migration loss and the source domain classification discrimination loss, and training the domain adaptive network model based on the attention mechanism by taking the minimum total loss as a training target.
The beneficial effects of the above technical scheme are: the weight of each local feature is calculated according to the discrimination result of the first domain discriminator, and the migratable features are obtained from the local features and their weights, so that features with mobility can be accurately extracted from the source domain and the target domain. The migratable features are then processed by the multi-head attention feature extraction network to extract the representative migratable features, i.e., features that are migratable between the source and target domains and have stronger discriminating capability, further improving the identification accuracy of the domain adaptive network model on skeleton behaviors.
Further, the local migration loss is calculated using the following formula:
$$L_l=\frac{1}{nK}\sum_{k=1}^{K}\sum_{x_i\in D_s\cup D_t}L_{entropy}\big(D_1^k(G_{lf}(x_i)),d_i\big)$$
wherein D_1^k denotes the kth first domain discriminator, G_lf represents the local feature extraction network, d_i represents the domain label of sample x_i, L_entropy represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, D_t represents the target domain, and K is the number of first domain discriminators.
The beneficial effects of the above technical scheme are: the local migration loss is calculated according to the discrimination result of the first domain discriminator, and maximizing the local migration loss is taken as the objective, so that the model is trained to extract migratable local features.
Further, calculating local migration loss and the weight of the local feature according to a domain discrimination result, and obtaining a migratable feature based on the local feature and the corresponding weight; the method comprises the following steps:
according to the formula
Figure BDA0003352316320000043
Calculating attention weights of local features, wherein
Figure BDA0003352316320000044
Represents the kth first domain discriminator pair sample xiK is 1,2, … K, where K is the number of first domain discriminators;
the local features are multiplied by the corresponding attention weights to obtain migratable features.
The beneficial effects of the above technical scheme are: the weights of the local features are calculated according to the discrimination result of the first domain discriminator, so that features for which the first domain discriminator cannot effectively tell whether the data come from the source domain or the target domain are given larger weight, and the migratable features are extracted quickly and accurately.
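For illustration only, the weighting step described above can be sketched as follows; taking the binary entropy of the discriminator output as the weight is an assumption consistent with the stated goal of giving larger weight to features the discriminator cannot classify confidently:

```python
import math

def attention_weight(d_hat, eps=1e-12):
    # Binary entropy of the first domain discriminator's output d_hat in (0, 1).
    # Outputs near 0.5 (domain indistinguishable, i.e. a migratable feature)
    # give the largest weight; confident outputs near 0 or 1 give weights near 0.
    p = min(max(d_hat, eps), 1.0 - eps)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def transferable_features(local_features, d_hats):
    # Multiply each local feature vector by its attention weight.
    return [[attention_weight(d) * v for v in feat]
            for feat, d in zip(local_features, d_hats)]
```

For example, a feature whose discriminator output is 0.5 receives weight log 2 ≈ 0.693, while one judged to belong to one domain with probability 0.99 receives roughly 0.056.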
Further, the overall migration loss is calculated using the following formula:
$$L_d=\frac{1}{n}\sum_{x_i\in D_s\cup D_t}L_{entropy}\big(D_2(G_{df}(g_i)),d_i\big)$$
wherein D_2 denotes the second domain discriminator, G_df represents the multi-head attention feature extraction network, g_i represents the migratable feature of sample x_i, d_i represents the domain label of sample x_i, L_entropy represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, and D_t represents the target domain.
The beneficial effects of the above technical scheme are: the overall migration loss is calculated according to the discrimination result of the second domain discriminator, with maximizing the overall migration loss as the objective, so that the model is trained to extract more representative migratable features.
Further, the total loss of the sample is calculated based on the local migration loss, the overall migration loss and the source domain classification discrimination loss using the following formula: L_θ = L_s - λL_l - βL_d, where λ and β are hyperparameters, L_l is the local migration loss, L_d is the overall migration loss, and L_s is the source domain classification discrimination loss.
The beneficial effects of the above technical scheme are: the method has the advantages that more representative migratable features are extracted by training the model through minimizing source domain classification loss, maximizing local migration loss and overall migration loss, namely the features which have the same distinguishing capability in the source domain and the target domain and can accurately identify the types of the source domain and the target domain are obtained, so that the trained model has higher identification accuracy, and the action types can be accurately identified for skeleton behavior data of the source domain and the target domain which are distributed differently.
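As a minimal numeric sketch of this loss combination (the hyperparameter values λ = β = 0.1 are assumptions, not values from the patent):

```python
def total_loss(l_s, l_l, l_d, lam=0.1, beta=0.1):
    # L_theta = L_s - lambda * L_l - beta * L_d: minimizing this quantity
    # minimizes the source classification loss while maximizing the local
    # and overall migration losses (the adversarial objectives).
    return l_s - lam * l_l - beta * l_d
```

A sample whose migration losses grow (features becoming domain-indistinguishable) lowers the total loss, which is exactly the training direction described above.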
Further, the overall loss of the attention-based domain adaptation model is calculated using the following formula:
$$L=\overline{L}_{new}+\sigma\,\overline{L}_{old},\qquad \overline{L}_{old}=\frac{1}{M}\sum_{i=1}^{M}L_\theta(x_i)$$
wherein σ is a hyperparameter, L̄_new is the average loss of the samples in the newly added sample set, L̄_old is the average loss of the samples extracted from the experience learning library in the current task, M is the number of samples extracted from the experience learning library, and L_θ(x_i) is the loss of sample x_i.
The beneficial effects of the above technical scheme are: the overall loss is calculated jointly from the newly added samples and the historical samples in the experience learning library, taking into account both the knowledge the model learns from new samples and the influence of historical samples on the model; when the overall loss is greater than the threshold, the next round of training is performed for continuous learning, thereby improving the identification accuracy of the model.
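A minimal sketch of this overall loss and the stopping criterion, assuming σ = 0.5 and simple averaging over each sample set:

```python
def overall_loss(new_losses, memory_losses, sigma=0.5):
    # Overall loss = average loss on the newly added samples
    #              + sigma * average loss on the M samples drawn from the
    #                experience learning library (sigma is a hyperparameter).
    l_new = sum(new_losses) / len(new_losses)
    l_mem = sum(memory_losses) / len(memory_losses) if memory_losses else 0.0
    return l_new + sigma * l_mem

def should_continue_training(new_losses, memory_losses, threshold, sigma=0.5):
    # Another round of continuous learning is triggered while the overall
    # loss stays above the threshold; otherwise training ends.
    return overall_loss(new_losses, memory_losses, sigma) > threshold
```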
Further, extracting a part of samples from the experience learning library and samples in a newly added sample set of a next round of training to form a training sample set of a next round of training, including:
calculating the loss prior probability of each sample in the empirical learning library:
$$p_i=L_\theta(x_i)^{\alpha},\qquad P_i=\frac{p_i}{\sum_{j=1}^{n_h}p_j}$$
wherein P_i denotes the loss prior probability of the ith sample, L_θ(x_i) is the loss of sample x_i, α is a constant, i = 1, 2, … n_h, and n_h is the number of samples in the experience learning library;
and extracting M samples with the maximum loss prior probability from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training.
The beneficial effects of the above technical scheme are: historical samples are extracted from the experience learning library according to the prior probability for continuous learning, and the samples with large losses in the experience learning library are re-learned together with the new samples, so that the model is continuously optimized and its classification accuracy further improved.
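A sketch of the prioritized extraction, assuming the prioritized-replay-style normalization P_i = L_θ(x_i)^α / Σ_j L_θ(x_j)^α over the experience learning library:

```python
def loss_priorities(losses, alpha=1.0):
    # P_i = L(x_i)**alpha / sum_j L(x_j)**alpha: the larger a sample's loss,
    # the larger its prior probability (alpha is the constant in the formula).
    powered = [l ** alpha for l in losses]
    total = sum(powered)
    return [p / total for p in powered]

def select_replay_samples(losses, m, alpha=1.0):
    # Indices of the M samples with the largest loss prior probability;
    # these are replayed together with the next round's new samples.
    pri = loss_priorities(losses, alpha)
    return sorted(range(len(pri)), key=lambda i: pri[i], reverse=True)[:m]
```

With losses [0.1, 2.0, 0.5, 1.5] and M = 2, the samples at indices 1 and 3 (the two largest losses) are selected for replay.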
On the other hand, the embodiment of the invention provides a domain-adaptive skeleton behavior recognition system based on continuous learning, which comprises the following modules:
the training set generation module is used for acquiring newly added skeleton behavior data of the source domain and the target domain, extracting a skeleton behavior data sequence by adopting a sliding window method and generating a newly added sample set; taking the newly added sample set as a training sample set;
the network model training module is used for constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is greater than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing to perform the next round of training; otherwise, ending the training to obtain a skeleton behavior recognition model;
and the skeleton behavior identification module is used for identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain an identification result.
In the invention, the technical schemes can be combined with each other to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a flowchart of a domain-adaptive skeleton behavior recognition method based on continuous learning according to an embodiment of the present invention;
fig. 2 is a block diagram of a domain-adaptive skeleton behavior recognition system based on continuous learning according to an embodiment of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
The embodiment of the invention discloses a domain-adaptive skeleton behavior identification method based on continuous learning, which comprises the following steps as shown in fig. 1:
s1, acquiring newly added skeleton behavior data of the source domain and the target domain, extracting a skeleton behavior data sequence by adopting a sliding window method, and generating a newly added sample set; and taking the newly added sample set as a training sample set.
S2, constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is larger than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing the next round of training; otherwise, the training is finished, and a skeleton behavior recognition model is obtained.
And S3, recognizing the skeleton behavior to be recognized of the target domain based on the skeleton behavior recognition model to obtain a recognition result.
The source domain skeleton behavior data are data with classification labels, while the target domain skeleton behavior data are unlabeled or sparsely labeled. By adopting the domain adaptive network model, the label knowledge obtained in the source domain can be used to train the target domain, so that action data of the same type can be accurately identified even when their distributions differ. Meanwhile, by adopting continuous learning, the model can learn the knowledge in new samples without forgetting the knowledge of old samples, so that an action can still be accurately identified when instances of the same action differ, for example in angle.
In implementation, the skeleton behavior data can be acquired by a Kinect depth camera. The Kinect depth camera can collect data of 25 skeleton nodes, and each node data comprises data of three coordinates of x, y and z. The Kinect depth camera collects data at a frame rate of 15fps, each frame of data comprises three-dimensional coordinate data of 25 skeleton nodes, and if the recording duration of an action is 10 seconds, the Kinect depth camera corresponds to skeleton behavior data of 150 frames.
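The frame and dimension arithmetic above can be sketched as follows (illustrative only; the constants are those stated in the text):

```python
FPS = 15        # Kinect depth camera frame rate
NODES = 25      # skeleton nodes collected per frame
COORDS = 3      # x, y, z coordinates per node

def frame_count(duration_seconds):
    # Number of skeleton frames recorded for an action of the given duration.
    return FPS * duration_seconds

def frame_vector_dim():
    # Dimension of the action vector representing one frame.
    return NODES * COORDS
```

For example, a 10-second action yields 150 frames, each represented by a 75-dimensional action vector.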
In order to obtain a training sample, a sliding window method is adopted to extract a skeleton behavior data sequence and generate a newly added sample set, and the method specifically comprises the following steps:
and S11, representing the skeleton behavior data of each frame by adopting a motion vector, and extracting D frames of skeleton behavior data by adopting a sliding window method to form a motion matrix.
In order to represent the time-dimension information of the skeleton behavior data, the skeleton behavior data is treated as a time series: each frame corresponds to an action vector at one time point, and the dimension of each action vector is the number of skeleton nodes multiplied by 3. D frames of skeleton behavior data are extracted with the sliding window method to form an action matrix. For example, if an action contains 150 frames, the corresponding skeleton behavior data forms a 150 × 75 matrix.
S12, splitting the action matrix of each window into three action matrix components, and respectively calculating the covariance matrix of each action matrix component; and merging the three covariance matrixes by adopting convolution operation to generate one sample data, wherein the plurality of sample data form a newly added sample set.
Symmetric positive definite matrices (SPD matrices) are widely used in the field of computer vision, such as face recognition, medical image processing, etc. Based on the Riemann geometric correlation theory of the non-Euclidean space, the SPD matrix is proved to be capable of better expressing the data distribution information of the objects in the non-Euclidean space.
Recent studies have shown that second-order statistics have stronger expressive power than first-order statistics. The covariance matrix is a commonly used second-order statistical expression, and it is itself a symmetric positive definite matrix. Therefore, the SPD matrix formed by the covariance matrix is adopted to express the spatial structure of the skeleton behavior data, with a good spatial expression effect.
The covariance matrix adopted by the invention is a typical representative of the SPD matrix, but the invention is not limited to representing skeleton data with the covariance matrix; any statistical expression satisfying the SPD matrix properties can be adopted. In practice, D ≥ 25 so that the covariance matrix is positive definite.
Specifically, the action matrix of each window is split into three action matrix components, namely the x-axis, y-axis and z-axis components, forming three component matrices of dimension D × d each, where d = 25 is the number of skeleton nodes. The covariance matrix of each component matrix is calculated respectively, yielding three covariance matrices of size d × d. The covariance matrices of the three axes are then merged into one d × d matrix by a convolution operation (convolution kernel size 1 × 3), denoted x_i, i.e., one sample datum. The plurality of sample data constitutes the sample set. Source domain samples have corresponding action labels, while target domain samples have no labels or only very few labels.
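A minimal sketch of this sample construction (a fixed averaging kernel stands in for the learned 1 × 3 convolution, which is an assumption for illustration):

```python
import numpy as np

def make_sample(window, kernel=(1 / 3, 1 / 3, 1 / 3)):
    # window: (D, 25, 3) array of D frames x 25 skeleton nodes x (x, y, z).
    # Split into three D x 25 axis components, compute each component's
    # 25 x 25 covariance matrix, then merge the three matrices with a
    # 1 x 3 convolution (here a fixed averaging kernel; in the model the
    # kernel weights would be learned).
    assert window.shape[0] >= 25, "D >= 25 keeps the covariance positive definite"
    covs = [np.cov(window[:, :, axis], rowvar=False) for axis in range(3)]
    return sum(w * c for w, c in zip(kernel, covs))
```

For a window of 30 frames the result is one symmetric 25 × 25 sample matrix; collecting such matrices over all windows yields the newly added sample set.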
After the sample set is generated, a domain adaptive network model based on an attention mechanism can be constructed. The constructed network model comprises a local feature extraction network, a first domain discriminator, a multi-head attention feature extraction network, a second domain discriminator and a classifier.
And the local feature extraction network is used for performing local feature extraction on the samples, and the ResNet50 network can be adopted by the local feature extraction network for example.
The first domain discriminator is used for discriminating, from the input features, which domain the data come from. The first domain discriminator is a binary classification network; illustratively, the source domain is labeled 0 and the target domain is labeled 1, and the output of the first domain discriminator is a probability value in [0, 1]: an output close to 0 indicates that the input data come from the source domain, and an output close to 1 indicates that the input data come from the target domain.
The multi-head attention feature extraction network is a network constructed based on a multi-head attention mechanism and is used for extracting representative migratable features.
The second domain discriminator is for discriminating a probability that the data is from the source domain based on the representative migratable feature. The second domain discriminator is also a two-class network and has the same structure as the first domain discriminator.
The classifier is used for classifying and identifying the action type represented by the data according to the representative migratable feature. For example, the classifier may comprise two Blocks, a fully connected layer and a softmax layer, where each Block comprises a fully connected layer, a normalization layer and an activation function layer.
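As an illustrative forward pass of the classifier structure just described (layer-style normalization and ReLU are assumed for the normalization and activation layers; the patent does not fix these choices):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the output logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def block(x, W, b):
    # One Block: fully connected layer -> normalization layer -> activation.
    h = W @ x + b
    h = (h - h.mean()) / (h.std() + 1e-6)
    return np.maximum(h, 0.0)

def classify(x, blocks, W_out, b_out):
    # Two Blocks, then a fully connected layer and a softmax layer.
    for W, b in blocks:
        x = block(x, W, b)
    return softmax(W_out @ x + b_out)
```

The output is a probability distribution over the action classes; the predicted action type is its argmax.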
Specifically, training the attention mechanism-based domain adaptive network model based on the training sample set includes:
s21, extracting local features through a local feature extraction network, performing domain discrimination on the local features by adopting a first domain discriminator, calculating local migration loss and weights of the local features according to domain discrimination results, and obtaining migratable features based on the local features and the corresponding weights; the number of the first domain discriminators is determined according to the size of the local feature matrix.
Illustratively, ResNet50 is used as the local feature extraction network, and the feature matrix output by its last convolutional layer has size 7 × 7 × 2048, where 2048 is the number of channels (convolution kernels). The number of first domain discriminators is therefore 7 × 7 = 49, and each first domain discriminator corresponds to one position in the feature matrix.
The input to each first domain discriminator is the 1 × 2048 vector at the corresponding position of sample x_i in the feature map, and the output is the domain discrimination result for sample x_i. The weight corresponding to each local feature and the local migration loss are calculated according to the output results of the discriminators. Specifically, the local migration loss is calculated by using the following formula:
L_l = \frac{1}{nK} \sum_{k=1}^{K} \sum_{x_i \in D_s \cup D_t} L_{entropy}\left(D_1^k\left(G_{lf}(x_i)\right), d_i\right)

wherein D_1^k denotes the kth first domain discriminator, G_{lf} represents the local feature extraction network, d_i represents the domain label of sample x_i, L_{entropy} represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, D_t represents the target domain, and K is the number of first domain discriminators. The domain label indicates whether the sample belongs to the source domain or the target domain. In order to learn the target domain data using the labeled source domain, the extracted features must apply not only to the source domain but also to the target domain, i.e., the features must be migratable; the extracted features should therefore make the first domain discriminator unable to tell whether a sample comes from the source domain or the target domain, i.e., the output of the first domain discriminator should be as close to 0.5 as possible. The higher L_{entropy} is, the more similarly the feature behaves on the two domains (source domain and target domain), i.e., the better the mobility of the extracted local feature; therefore, the local migration loss L_l should be as large as possible so as to extract more migratable features.
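The local migration loss can be sketched in plain Python as follows; the helper names, the toy discriminator outputs, and the use of binary cross entropy over those outputs are illustrative assumptions, not code from the patent:

```python
import math

def bce(p, d, eps=1e-12):
    # Binary cross entropy between a discriminator output p (predicted
    # probability of the target domain) and the domain label d (0 = source,
    # 1 = target). Clamping avoids log(0).
    p = min(max(p, eps), 1.0 - eps)
    return -(d * math.log(p) + (1 - d) * math.log(1 - p))

def local_migration_loss(outputs, labels):
    # outputs[k][i]: output of the k-th first domain discriminator on
    # sample i; labels[i]: domain label of sample i. The loss averages the
    # cross entropy over K discriminators and n samples.
    K, n = len(outputs), len(labels)
    total = sum(bce(outputs[k][i], labels[i]) for k in range(K) for i in range(n))
    return total / (K * n)

# Outputs near 0.5 (domains indistinguishable) give a high loss, which the
# adversarial objective maximizes; confident outputs give a low loss.
confused = local_migration_loss([[0.5, 0.5]], [0, 1])
confident = local_migration_loss([[0.05, 0.95]], [0, 1])
```

The sketch confirms the text above: a discriminator stuck at 0.5 yields the cross-entropy value log 2 per sample, larger than any confident discrimination.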
To obtain the migratable features, a weight for each local feature is calculated based on the discrimination result of the first domain discriminator. In particular, according to the formula
w_i^k = H(\hat{d}_i^k) = -\hat{d}_i^k \log \hat{d}_i^k - \left(1 - \hat{d}_i^k\right) \log\left(1 - \hat{d}_i^k\right)

the attention weight of each local feature is calculated, wherein \hat{d}_i^k = D_1^k(G_{lf}(x_i)) represents the discrimination result of the kth first domain discriminator on sample x_i, k = 1, 2, …, K, and K is the number of first domain discriminators. In order to obtain features that the discriminator cannot effectively distinguish as belonging to the source domain or the target domain, \hat{d}_i^k should be as close to 0.5 as possible, which corresponds to a value of H(\hat{d}_i^k) that is as large as possible. The larger H(\hat{d}_i^k) is, the better the mobility of the local feature, and the higher the weight it is given.
The local features are multiplied by the corresponding attention weights to obtain migratable features.
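The weighting step can be illustrated with a short Python sketch; the binary entropy function H and the sample discriminator outputs are illustrative assumptions:

```python
import math

def binary_entropy(p, eps=1e-12):
    # H(p) = -p*log(p) - (1-p)*log(1-p); maximal at p = 0.5, i.e. when the
    # discriminator cannot tell source from target.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

# Discriminator outputs near 0.5 (domain-indistinguishable, hence more
# transferable) receive the largest attention weights.
weights = [binary_entropy(p) for p in (0.5, 0.9, 0.99)]

# Multiply each local feature by its weight to obtain migratable features
# (unit features used here so the result equals the weights).
migratable = [w * f for w, f in zip(weights, [1.0, 1.0, 1.0])]
```

An output of exactly 0.5 yields the maximum weight log 2, while near-certain outputs are weighted down toward zero.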
And S22, inputting the migratable features into a multi-head attention feature extraction network, and extracting representative migratable features based on a multi-head attention mechanism.
In order to further obtain representative migratable features with higher distinguishability, the migratable features obtained in step S21 are input into the multi-head attention feature extraction network, and the distinguishability of the migratable features is scored through the multi-head attention mechanism, so as to obtain representative migratable features that express this distinguishability. The multi-head attention feature extraction network is known in the art, and reference may be made to the prior art for its specific structure.
And S23, performing domain discrimination on the representative migratable features by adopting a second domain discriminator, and calculating the overall migration loss according to the domain discrimination result.
After the representative migratable features are obtained, the second domain discriminator performs domain discrimination on them: the representative migratable feature matrix is input into the second domain discriminator to obtain the discrimination result.
Specifically, the overall migration loss is calculated by the following formula:
L_d = \frac{1}{n} \sum_{x_i \in D_s \cup D_t} L_{entropy}\left(D_2\left(G_{df}(g_i)\right), d_i\right)

wherein D_2 denotes the second domain discriminator, G_{df} represents the multi-head attention feature extraction network, g_i represents the migratable feature of sample x_i, d_i represents the domain label of sample x_i, L_{entropy} represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, and D_t represents the target domain.
The higher L_{entropy} is, the more similarly the representative feature behaves on the two domains (source domain and target domain), i.e., the better the mobility of the selected representative feature; therefore, the overall migration loss L_d should be as large as possible so as to extract more migratable features.
S24, classifying and judging the representative migratable features of the source domain samples by adopting a classifier, and calculating the source domain classification and judgment loss; the classifier is a fully connected network.
The extracted representative migratable features are required to be migratable from the source domain to the target domain, and also to be applicable to classification identification on both the source domain and the target domain. Since the samples of the target domain carry no classification labels (or only a very small number of them are labeled), the classification loss is calculated on the representative migratable features of the source domain samples.
Specifically, the following formula is adopted to calculate the source domain classification loss:
L_s = \frac{1}{n_s} \sum_{x_i \in D_s} L_{entropy}\left(G_y\left(G_{df}(g_i)\right), y_i\right)

wherein G_y represents the classifier network, G_{df} represents the multi-head attention feature extraction network, g_i represents the migratable feature of sample x_i, y_i represents the class label of sample x_i, L_{entropy} represents the cross entropy loss function, n_s is the number of source domain training samples, and x_i ∈ D_s denotes the source domain samples. The source domain classification loss should be as small as possible to improve the classification accuracy.
And S25, calculating the total loss of the sample based on the local migration loss, the overall migration loss and the source domain classification discrimination loss, and training the domain adaptive network model based on the attention mechanism by taking the minimum total loss as a training target.
The training objective for training the attention mechanism-based domain adaptive network model is to minimize the total loss, wherein the total loss of the sample is calculated by using the following formula:

L_\theta = L_s - \lambda L_l - \beta L_d

where λ and β are hyperparameters, L_l is the local migration loss, L_d is the overall migration loss, and L_s is the source domain classification discrimination loss.

Since the training objective is to maximize the local migration loss L_l, maximize the overall migration loss L_d, and minimize the source domain classification loss, the total loss of training is calculated as L_\theta = L_s - \lambda L_l - \beta L_d, with the minimum total loss as the training target.
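A minimal Python sketch of this total-loss combination (the hyperparameter values λ = β = 0.1 and the loss values are illustrative, not values disclosed in the patent):

```python
def total_loss(l_s, l_l, l_d, lam=0.1, beta=0.1):
    # L_theta = L_s - lambda * L_l - beta * L_d: minimizing L_theta
    # simultaneously minimizes the source classification loss L_s and
    # maximizes the two migration losses L_l and L_d.
    return l_s - lam * l_l - beta * l_d

# Larger migration losses (more transferable features) lower the total loss,
# so gradient descent on L_theta pushes the feature extractor toward
# domain-confusing features.
baseline = total_loss(1.0, 0.5, 0.5)
transferable = total_loss(1.0, 2.0, 2.0)
```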
The current round of training is stopped when the training reaches a preset number of iterations or the loss reaches a preset precision.
Over time, the same human action may show deviations; for example, the angle of a raised hand may differ, or a sitting posture may drift from upright to slightly skewed. Continuous learning is therefore adopted to train the model, so that the model can accurately recognize actions of the same type even when they differ considerably. After the current round of training is finished, whether another round of learning is needed is judged according to the overall loss of the current training samples.
Specifically, the overall loss of the attention-based domain adaptation model is calculated by using the following formula:
L = \frac{1}{n_{new}} \sum_{x_i \in D_{new}} L_\theta(x_i) + \sigma \cdot \frac{1}{M} \sum_{x_j \in Mem} L_\theta(x_j)

wherein σ is a hyperparameter, the first term \frac{1}{n_{new}} \sum_{x_i \in D_{new}} L_\theta(x_i) is the loss of the samples in the newly added sample set, the second term \frac{1}{M} \sum_{x_j \in Mem} L_\theta(x_j) is the average loss of the samples extracted from the experience learning library in the current task, M is the number of samples extracted from the experience learning library, and L_\theta(x_i) is the total loss of sample x_i.

If the current training is the first round, the experience learning library stores no historical training data and the current training samples all come from the newly added sample set, so the overall loss reduces to

L = \frac{1}{n_{new}} \sum_{x_i \in D_{new}} L_\theta(x_i)

since x_j ∈ Mem has no samples, and therefore \frac{1}{M} \sum_{x_j \in Mem} L_\theta(x_j) = 0.
When the training is not the first round, the overall loss needs to be calculated according to the loss of the newly added samples in the current training sample set and the loss of the historical samples extracted from the experience learning base. Thereby ensuring that a priori knowledge of historical samples is still retained while learning new samples.
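This replay-style overall loss can be sketched in plain Python; the function and argument names are hypothetical:

```python
def overall_loss(new_losses, memory_losses, sigma=1.0):
    # Overall loss = mean loss over the newly added samples, plus sigma
    # times the mean loss over the samples replayed from the experience
    # learning library. In the first round the library is empty and the
    # replay term vanishes, leaving only the new-sample term.
    new_term = sum(new_losses) / len(new_losses)
    if not memory_losses:
        return new_term
    return new_term + sigma * sum(memory_losses) / len(memory_losses)

first_round = overall_loss([1.0, 3.0], [])                    # no replay term
later_round = overall_loss([1.0, 3.0], [2.0, 4.0], sigma=0.5)  # with replay
```

The replay term is what keeps the prior knowledge of historical samples in play while new samples are learned.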
If the overall loss is less than the threshold, the model precision meets the requirement; no further round of training is performed, and the training ends to obtain the skeleton behavior recognition model. The threshold is set according to the accuracy requirement of the implementation.
Otherwise, randomly extracting part of samples from the current training sample set and storing the part of samples into the experience learning base, when a next round of newly added samples comes, extracting part of samples from the experience learning base and samples in the newly added sample set of the next round of training to form a training sample set of the next round of training, continuing the next round of training until the overall loss is less than a threshold value, ending the training, and obtaining the skeleton behavior recognition model. Therefore, the model can continuously learn new knowledge from a new sample and can store most of the learned prior knowledge, and the accuracy of the model identification action is improved.
Specifically, the extracting of a part of samples from the experience learning library and the samples in the newly added sample set of the next round of training to form the training sample set of the next round of training includes:
calculating the loss prior probability of each sample in the empirical learning library:
P_i = \frac{L_\theta(x_i) + \varepsilon}{\sum_{j=1}^{n_h} \left(L_\theta(x_j) + \varepsilon\right)}

wherein P_i denotes the loss prior probability of the ith sample, L_\theta(x_i) is the total loss of sample x_i, ε is a constant, i = 1, 2, …, n_h, and n_h is the number of samples in the experience learning library. The constant ε ensures that every sample in the experience learning library has a chance of being drawn.
M samples with the maximum loss prior probability are extracted from the experience learning library and, together with the samples in the newly added sample set of the next round, form the training sample set of the next round of training. The loss prior probability reflects the loss of a sample: the greater the loss, the poorer the model's learning effect on that sample. Adding such samples to the next round of model training allows the model to learn new samples while being corrected on the old samples it learned poorly, thereby continuously optimizing the model and ensuring more accurate classification and identification.
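The loss-prior computation and the top-M selection can be sketched in plain Python; the helper names are illustrative assumptions:

```python
def loss_priors(losses, eps=0.01):
    # P_i proportional to L_theta(x_i) + eps; the constant eps keeps every
    # stored sample drawable even when its loss is (near) zero.
    weights = [l + eps for l in losses]
    total = sum(weights)
    return [w / total for w in weights]

def top_m(losses, m, eps=0.01):
    # Indices of the M samples with the largest loss prior probability,
    # i.e. the samples the model currently learns worst.
    priors = loss_priors(losses, eps)
    return sorted(range(len(priors)), key=lambda i: priors[i], reverse=True)[:m]
```

For example, `top_m([0.1, 2.0, 1.0], 2)` selects the two highest-loss samples for replay in the next round.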
After the trained skeleton behavior recognition model is obtained, the skeleton behavior to be recognized of the target domain is recognized based on the skeleton behavior recognition model to obtain a recognition result. Specifically, after the to-be-recognized skeleton behavior data of the target domain is obtained, it is processed according to the method of steps S11-S12 to obtain to-be-recognized data in the same format as the sample data; the to-be-recognized data is input into the trained skeleton behavior recognition model, and the classification result output by the classifier is the recognition result of the to-be-recognized data.
Through the domain adaptation network model based on continuous learning, the label knowledge obtained in the source domain is used for training on the target domain, so that action data of the same type can be accurately recognized even when its distribution differs or the actions themselves vary (for example, lifting the left hand versus the right hand, or different hand-lifting angles). This realizes accurate recognition of same-type actions with higher diversity and improves the recognition accuracy of skeleton behaviors.
A specific embodiment of the present invention discloses a domain-adaptive skeleton behavior recognition system based on continuous learning, as shown in fig. 2, including the following modules:
the training set generation module is used for acquiring newly added skeleton behavior data of the source domain and the target domain, extracting a skeleton behavior data sequence by adopting a sliding window method and generating a newly added sample set; taking the newly added sample set as a training sample set;
the network model training module is used for constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is greater than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing to perform the next round of training; otherwise, the training is finished to obtain a skeleton behavior recognition model;
and the skeleton behavior identification module is used for identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain an identification result.
The method embodiment and the system embodiment are based on the same principle, and related parts can be referenced mutually, and the same technical effect can be achieved. For a specific implementation process, reference is made to the foregoing embodiments, which are not described herein again.
Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A domain adaptation skeleton behavior identification method based on continuous learning is characterized by comprising the following steps:
acquiring newly added skeleton behavior data of a source domain and a target domain, extracting a skeleton behavior data sequence by adopting a sliding window method, and generating a newly added sample set; taking the newly added sample set as a training sample set;
constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, and calculating the overall loss of the domain adaptive network model based on the attention mechanism; if the overall loss is larger than the threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next training to form a training sample set of the next training, and continuing to perform the next training; otherwise, the training is finished to obtain a skeleton behavior recognition model;
and identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain an identification result.
2. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 1, wherein extracting the skeleton behavior data sequence by using a sliding window method to generate a newly added sample set comprises:
expressing each frame of the skeleton behavior data by adopting an action vector, and extracting D frames of skeleton behavior data by adopting a sliding window method to form an action matrix;
splitting the action matrix of each window into three action matrix components, and respectively calculating a covariance matrix of each action matrix component; and merging the three covariance matrixes by adopting convolution operation to generate one sample data, wherein the plurality of sample data form a newly added sample set.
3. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 1, wherein training the attention mechanism-based domain adaptive network model based on the training sample set comprises:
extracting local features through a local feature extraction network, performing domain discrimination on the local features by adopting a first domain discriminator, calculating local migration loss and weights of the local features according to domain discrimination results, and obtaining migratable features based on the local features and corresponding weights; the number of the first domain discriminators is determined according to the size of the local feature matrix;
inputting the migratable features into a multi-head attention feature extraction network, and extracting representative migratable features based on a multi-head attention mechanism;
adopting a second domain discriminator to carry out domain discrimination on the representative migratable features, and calculating the overall migration loss according to domain discrimination results;
classifying and judging the representative migratable features of the source domain samples by adopting a classifier, and calculating the source domain classification and judgment loss; the classifier is a fully connected network;
and calculating the total loss of the sample based on the local migration loss, the overall migration loss and the source domain classification discrimination loss, and training the domain adaptive network model based on the attention mechanism by taking the minimum total loss as a training target.
4. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 3, wherein the local migration loss is calculated by adopting the following formula:

L_l = \frac{1}{nK} \sum_{k=1}^{K} \sum_{x_i \in D_s \cup D_t} L_{entropy}\left(D_1^k\left(G_{lf}(x_i)\right), d_i\right)

wherein D_1^k denotes the kth first domain discriminator, G_{lf} represents the local feature extraction network, d_i represents the domain label of sample x_i, L_{entropy} represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, D_t represents the target domain, and K is the number of first domain discriminators.
5. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 3, wherein calculating the local migration loss and the weight of each local feature according to the domain discrimination result, and obtaining the migratable features based on the local features and the corresponding weights, comprises:

according to the formula

w_i^k = H(\hat{d}_i^k) = -\hat{d}_i^k \log \hat{d}_i^k - \left(1 - \hat{d}_i^k\right) \log\left(1 - \hat{d}_i^k\right)

calculating the attention weight of each local feature, wherein \hat{d}_i^k represents the discrimination result of the kth first domain discriminator on sample x_i, k = 1, 2, …, K, and K is the number of first domain discriminators;
the local features are multiplied by the corresponding attention weights to obtain migratable features.
6. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 3, wherein the overall migration loss is calculated by adopting the following formula:

L_d = \frac{1}{n} \sum_{x_i \in D_s \cup D_t} L_{entropy}\left(D_2\left(G_{df}(g_i)\right), d_i\right)

wherein D_2 denotes the second domain discriminator, G_{df} represents the multi-head attention feature extraction network, g_i represents the migratable feature of sample x_i, d_i represents the domain label of sample x_i, L_{entropy} represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, and D_t represents the target domain.
7. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 3, wherein calculating the total loss of the sample based on the local migration loss, the overall migration loss and the source domain classification discrimination loss comprises calculating the total loss of the sample by adopting the following formula: L_\theta = L_s - \lambda L_l - \beta L_d, where λ and β are hyperparameters, L_l is the local migration loss, L_d is the overall migration loss, and L_s is the source domain classification discrimination loss.
8. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 1, wherein the overall loss of the attention mechanism-based domain adaptive model is calculated by adopting the following formula:

L = \frac{1}{n_{new}} \sum_{x_i \in D_{new}} L_\theta(x_i) + \sigma \cdot \frac{1}{M} \sum_{x_j \in Mem} L_\theta(x_j)

wherein σ is a hyperparameter, the first term is the loss of the samples in the newly added sample set, the second term is the average loss of the samples extracted from the experience learning library in the current task, M is the number of samples extracted from the experience learning library, and L_\theta(x_i) is the total loss of sample x_i.
9. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 1, wherein extracting part of the samples from the experience learning library and combining them with the samples in the newly added sample set of the next round of training to form the training sample set of the next round of training comprises:

calculating the loss prior probability of each sample in the experience learning library:

P_i = \frac{L_\theta(x_i) + \varepsilon}{\sum_{j=1}^{n_h} \left(L_\theta(x_j) + \varepsilon\right)}

wherein P_i denotes the loss prior probability of the ith sample, L_\theta(x_i) is the total loss of sample x_i, ε is a constant, i = 1, 2, …, n_h, and n_h is the number of samples in the experience learning library;
and extracting M samples with the maximum loss prior probability from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training.
10. A domain-adaptive skeleton behavior recognition system based on continuous learning is characterized by comprising the following modules:
the training set generation module is used for acquiring newly added skeleton behavior data of the source domain and the target domain, extracting a skeleton behavior data sequence by adopting a sliding window method and generating a newly added sample set; taking the newly added sample set as a training sample set;
the network model training module is used for constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is greater than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing to perform the next round of training; otherwise, the training is finished to obtain a skeleton behavior recognition model;
and the skeleton behavior recognition module is used for recognizing the skeleton behavior to be recognized of the target domain based on the skeleton behavior recognition model to obtain a recognition result.
CN202111341029.2A 2021-11-12 2021-11-12 Domain adaptive skeleton behavior recognition method and system based on continuous learning Active CN114708609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111341029.2A CN114708609B (en) 2021-11-12 2021-11-12 Domain adaptive skeleton behavior recognition method and system based on continuous learning


Publications (2)

Publication Number Publication Date
CN114708609A true CN114708609A (en) 2022-07-05
CN114708609B CN114708609B (en) 2023-08-18

Family

ID=82167268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111341029.2A Active CN114708609B (en) 2021-11-12 2021-11-12 Domain adaptive skeleton behavior recognition method and system based on continuous learning

Country Status (1)

Country Link
CN (1) CN114708609B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115455247A (en) * 2022-09-26 2022-12-09 中国矿业大学 Classroom collaborative learning role determination method
CN115859122A (en) * 2023-02-02 2023-03-28 中国电子科技集团公司第十五研究所 Data identification method, automatic continuous learning model, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160462A (en) * 2019-12-30 2020-05-15 浙江大学 Unsupervised personalized human activity recognition method based on multi-sensor data alignment
CN111191709A (en) * 2019-12-25 2020-05-22 清华大学 Continuous learning framework and continuous learning method of deep neural network
CN112489689A (en) * 2020-11-30 2021-03-12 东南大学 Cross-database voice emotion recognition method and device based on multi-scale difference confrontation
US20210192363A1 (en) * 2019-12-23 2021-06-24 Hrl Laboratories, Llc Systems and methods for unsupervised continual learning
CN113343804A (en) * 2021-05-26 2021-09-03 武汉大学 Integrated migration learning classification method and system for single-view fully-polarized SAR data


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUNTING ZHANG ET AL: "Class-incremental Learning via Deep Model Consolidation", arXiv:1903.07864, pages 1-10
XIMEI WANG ET AL: "Transferable Attention for Domain Adaptation", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, pages 5345-5352
CHEN CHENG ET AL: "Adversarial Domain Adaptation Image Classification Method Combined with Self-attention" (in Chinese), Computer Engineering and Science, vol. 42, no. 2, pages 259-265
GAO YAN ET AL: "Classroom Interaction Group Discovery Based on Skeleton Trajectory Aggregation Model" (in Chinese), Computer Science, vol. 48, no. 8, pages 334-339


Also Published As

Publication number Publication date
CN114708609B (en) 2023-08-18


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant