CN114708609A - Domain-adaptive skeleton behavior identification method and system based on continuous learning - Google Patents


Info

Publication number
CN114708609A
Authority
CN
China
Prior art keywords: domain, training, loss, samples, sample set
Prior art date
Legal status
Granted
Application number
CN202111341029.2A
Other languages
Chinese (zh)
Other versions
CN114708609B (en)
Inventor
闫秋艳
王重秋
王志晓
袁冠
郭震
Current Assignee
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202111341029.2A priority Critical patent/CN114708609B/en
Publication of CN114708609A publication Critical patent/CN114708609A/en
Application granted granted Critical
Publication of CN114708609B publication Critical patent/CN114708609B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

A domain-adaptive skeleton behavior identification method and system based on continuous learning are disclosed. The method comprises the following steps: acquiring newly added skeleton behavior data of a source domain and a target domain, extracting skeleton behavior data sequences with a sliding window method, and generating a newly added sample set, which is taken as the training sample set; constructing a domain adaptive network model based on an attention mechanism, training the model on the training sample set, and calculating the overall loss of the model: if the overall loss is greater than a threshold, a portion of the samples is randomly extracted from the current training sample set and stored in an experience learning library, a portion of the samples extracted from the experience learning library together with the samples in the next round's newly added sample set form the next round's training sample set, and the next round of training continues; otherwise, training ends and the skeleton behavior identification model is obtained; and identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain the identification result.

Description

Domain-adaptive skeleton behavior identification method and system based on continuous learning
Technical Field
The invention relates to the technical field of computer vision behavior recognition, in particular to a domain adaptation skeleton behavior recognition method and system based on continuous learning.
Background
Human behavior recognition is an important branch of computer vision. A human behavior identification method observes a moving target in the input data, extracts its motion features, and classifies the target according to the acquired features of different behaviors and motions. Compared with traditional RGB images, human skeleton behavior data are more robust to problems such as illumination, occlusion and interference. In some special environments (such as a classroom), actions of the same type can differ greatly: raising the left hand and raising the right hand both belong to the hand-raising type, yet the movements differ considerably. Moreover, the same type of action changes over time; for example, the angle at which the left hand is raised varies as the class progresses, so action learning needs to be performed continuously. Existing skeleton behavior identification methods have low identification accuracy on similar actions with large differences.
Disclosure of Invention
In view of the foregoing analysis, embodiments of the present invention provide a domain-adaptive skeleton behavior recognition method and system based on continuous learning, so as to solve the problem that the existing skeleton behavior recognition method has low accuracy in recognizing similar actions with large differences.
In one aspect, an embodiment of the present invention provides a domain-adaptive skeleton behavior identification method based on continuous learning, including the following steps:
acquiring newly added skeleton behavior data of a source domain and a target domain, extracting a skeleton behavior data sequence by adopting a sliding window method, and generating a newly added sample set; taking the newly added sample set as a training sample set;
constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is greater than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing to perform the next round of training; otherwise, the training is finished to obtain a skeleton behavior recognition model;
and identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain an identification result.
The beneficial effects of the above technical scheme are: by adopting the domain adaptive network model, the label knowledge obtained in the source domain can be used to train the target domain, so that action data of the same type can be accurately identified even when their distributions differ. Meanwhile, by adopting continuous learning, the model can learn the knowledge in new samples without forgetting the knowledge of old samples, continuously learning highly diverse actions of the same type and thus accurately identifying them.
As a further improvement of the above technical scheme, extracting the skeleton behavior data sequence with the sliding window method to generate the newly added sample set comprises the following steps:
representing each frame of the skeleton behavior data by adopting an action vector, and extracting D frames of skeleton behavior data by adopting a sliding window method to form an action matrix;
splitting the action matrix of each window into three action matrix components, and respectively calculating a covariance matrix of each action matrix component; and merging the three covariance matrixes by adopting convolution operation to generate one sample data, wherein the plurality of sample data form a newly added sample set.
The beneficial effects of the above technical scheme are: extracting the skeleton behavior data sequence with a sliding window method expresses the time-dimension information of the skeleton behavior data well, and the covariance matrix expresses its spatial structure well; the spatio-temporal characteristics of the skeleton behavior data are thus obtained from the two dimensions of time and space, providing a data basis for subsequent accurate classification and identification.
Further, training the attention mechanism-based domain adaptation network model based on the training sample set includes:
extracting local features through a local feature extraction network, performing domain discrimination on the local features by adopting a first domain discriminator, calculating local migration loss and weights of the local features according to domain discrimination results, and obtaining migratable features based on the local features and corresponding weights; the number of the first domain discriminators is determined according to the size of the local feature matrix;
inputting the migratable features into a multi-head attention feature extraction network, and extracting representative migratable features based on a multi-head attention mechanism;
adopting a second domain discriminator to carry out domain discrimination on the representative migratable features, and calculating the overall migration loss according to domain discrimination results;
classifying and judging the representative migratable features of the source domain samples by adopting a classifier, and calculating the source domain classification and judgment loss; the classifier is a fully connected network;
and calculating the total loss of the sample based on the local migration loss, the overall migration loss and the source domain classification discrimination loss, and training the domain adaptive network model based on the attention mechanism by taking the minimum total loss as a training target.
The beneficial effects of the above technical scheme are: the weight of each local feature is calculated according to the discrimination result of the first domain discriminator, and the migratable features are obtained from the local features and their weights, so that features with mobility can be accurately extracted from the source domain and the target domain. The migratable features are then processed by the multi-head attention feature extraction network to extract the representative migratable features, i.e., features that are migratable between the source and target domains and have stronger discriminating capability, further improving the identification accuracy of the domain adaptive network model on skeleton behaviors.
Further, the local migration loss is calculated using the following formula:
$$L_l=\frac{1}{nK}\sum_{k=1}^{K}\sum_{x_i\in D_s\cup D_t}L_{entropy}\big(D_1^k(G_{lf}(x_i)),d_i\big)$$
wherein D_1^k denotes the kth first domain discriminator, G_lf represents the local feature extraction network, d_i represents the domain label of sample x_i, L_entropy represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, D_t represents the target domain, and K is the number of first domain discriminators.
The beneficial effects of the above technical scheme are: the local migration loss is calculated according to the discrimination result of the first domain discriminator, and maximizing the local migration loss is taken as the objective, so that the model is trained to extract migratable local features.
Further, calculating local migration loss and the weight of the local feature according to a domain discrimination result, and obtaining a migratable feature based on the local feature and the corresponding weight; the method comprises the following steps:
according to the formula
Figure BDA0003352316320000043
Calculating attention weights of local features, wherein
Figure BDA0003352316320000044
Represents the kth first domain discriminator pair sample xiK is 1,2, … K, where K is the number of first domain discriminators;
the local features are multiplied by the corresponding attention weights to obtain migratable features.
The beneficial effects of the above technical scheme are: the weights of the local features are calculated according to the discrimination result of the first domain discriminator, so that features for which the first domain discriminator cannot effectively tell whether the data come from the source domain or the target domain are given larger weight, and the migratable features are extracted quickly and accurately.
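For illustration only, the weighting step described above can be sketched as follows; taking the binary entropy of the discriminator output as the weight is an assumption consistent with the stated goal of giving larger weight to features the discriminator cannot classify confidently:

```python
import math

def attention_weight(d_hat, eps=1e-12):
    # Binary entropy of the first domain discriminator's output d_hat in (0, 1).
    # Outputs near 0.5 (domain indistinguishable, i.e. a migratable feature)
    # give the largest weight; confident outputs near 0 or 1 give weights near 0.
    p = min(max(d_hat, eps), 1.0 - eps)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def transferable_features(local_features, d_hats):
    # Multiply each local feature vector by its attention weight.
    return [[attention_weight(d) * v for v in feat]
            for feat, d in zip(local_features, d_hats)]
```

For example, a feature whose discriminator output is 0.5 receives weight log 2 ≈ 0.693, while one judged to belong to one domain with probability 0.99 receives roughly 0.056.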
Further, the overall migration loss is calculated using the following formula:
$$L_d=\frac{1}{n}\sum_{x_i\in D_s\cup D_t}L_{entropy}\big(D_2(G_{df}(g_i)),d_i\big)$$
wherein D_2 denotes the second domain discriminator, G_df represents the multi-head attention feature extraction network, g_i represents the migratable feature of sample x_i, d_i represents the domain label of sample x_i, L_entropy represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, and D_t represents the target domain.
The beneficial effects of the above technical scheme are: the overall migration loss is calculated according to the discrimination result of the second domain discriminator, with maximizing the overall migration loss as the objective, so that the model is trained to extract more representative migratable features.
Further, the total loss of the sample is calculated based on the local migration loss, the overall migration loss and the source domain classification discrimination loss using the following formula: L_θ = L_s - λL_l - βL_d, where λ and β are hyperparameters, L_l is the local migration loss, L_d is the overall migration loss, and L_s is the source domain classification discrimination loss.
The beneficial effects of the above technical scheme are: the method has the advantages that more representative migratable features are extracted by training the model through minimizing source domain classification loss, maximizing local migration loss and overall migration loss, namely the features which have the same distinguishing capability in the source domain and the target domain and can accurately identify the types of the source domain and the target domain are obtained, so that the trained model has higher identification accuracy, and the action types can be accurately identified for skeleton behavior data of the source domain and the target domain which are distributed differently.
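As a minimal numeric sketch of this loss combination (the hyperparameter values λ = β = 0.1 are assumptions, not values from the patent):

```python
def total_loss(l_s, l_l, l_d, lam=0.1, beta=0.1):
    # L_theta = L_s - lambda * L_l - beta * L_d: minimizing this quantity
    # minimizes the source classification loss while maximizing the local
    # and overall migration losses (the adversarial objectives).
    return l_s - lam * l_l - beta * l_d
```

A sample whose migration losses grow (features becoming domain-indistinguishable) lowers the total loss, which is exactly the training direction described above.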
Further, the overall loss of the attention-based domain adaptation model is calculated using the following formula:
$$L=\overline{L}_{new}+\sigma\,\overline{L}_{old},\qquad \overline{L}_{old}=\frac{1}{M}\sum_{i=1}^{M}L_\theta(x_i)$$
wherein σ is a hyperparameter, L̄_new is the average loss of the samples in the newly added sample set, L̄_old is the average loss of the samples extracted from the experience learning library in the current task, M is the number of samples extracted from the experience learning library, and L_θ(x_i) is the loss of sample x_i.
The beneficial effects of the above technical scheme are: the overall loss is calculated jointly from the newly added samples and the historical samples in the experience learning library, taking into account both the knowledge the model learns from new samples and the influence of historical samples on the model; when the overall loss is greater than the threshold, the next round of training is performed for continuous learning, thereby improving the identification accuracy of the model.
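A minimal sketch of this overall loss and the stopping criterion, assuming σ = 0.5 and simple averaging over each sample set:

```python
def overall_loss(new_losses, memory_losses, sigma=0.5):
    # Overall loss = average loss on the newly added samples
    #              + sigma * average loss on the M samples drawn from the
    #                experience learning library (sigma is a hyperparameter).
    l_new = sum(new_losses) / len(new_losses)
    l_mem = sum(memory_losses) / len(memory_losses) if memory_losses else 0.0
    return l_new + sigma * l_mem

def should_continue_training(new_losses, memory_losses, threshold, sigma=0.5):
    # Another round of continuous learning is triggered while the overall
    # loss stays above the threshold; otherwise training ends.
    return overall_loss(new_losses, memory_losses, sigma) > threshold
```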
Further, extracting a part of samples from the experience learning library and samples in a newly added sample set of a next round of training to form a training sample set of a next round of training, including:
calculating the loss prior probability of each sample in the empirical learning library:
$$p_i=L_\theta(x_i)^{\alpha},\qquad P_i=\frac{p_i}{\sum_{j=1}^{n_h}p_j}$$
wherein P_i denotes the loss prior probability of the ith sample, L_θ(x_i) is the loss of sample x_i, α is a constant, i = 1, 2, … n_h, and n_h is the number of samples in the experience learning library;
and extracting M samples with the maximum loss prior probability from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training.
The beneficial effects of the above technical scheme are: historical samples are extracted from the experience learning library according to the prior probability for continuous learning, and the samples with large losses in the experience learning library are re-learned together with the new samples, so that the model is continuously optimized and its classification accuracy further improved.
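A sketch of the prioritized extraction, assuming the prioritized-replay-style normalization P_i = L_θ(x_i)^α / Σ_j L_θ(x_j)^α over the experience learning library:

```python
def loss_priorities(losses, alpha=1.0):
    # P_i = L(x_i)**alpha / sum_j L(x_j)**alpha: the larger a sample's loss,
    # the larger its prior probability (alpha is the constant in the formula).
    powered = [l ** alpha for l in losses]
    total = sum(powered)
    return [p / total for p in powered]

def select_replay_samples(losses, m, alpha=1.0):
    # Indices of the M samples with the largest loss prior probability;
    # these are replayed together with the next round's new samples.
    pri = loss_priorities(losses, alpha)
    return sorted(range(len(pri)), key=lambda i: pri[i], reverse=True)[:m]
```

With losses [0.1, 2.0, 0.5, 1.5] and M = 2, the samples at indices 1 and 3 (the two largest losses) are selected for replay.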
On the other hand, the embodiment of the invention provides a domain-adaptive skeleton behavior recognition system based on continuous learning, which comprises the following modules:
the training set generation module is used for acquiring newly added skeleton behavior data of the source domain and the target domain, extracting a skeleton behavior data sequence by adopting a sliding window method and generating a newly added sample set; taking the newly added sample set as a training sample set;
the network model training module is used for constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is greater than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing to perform the next round of training; otherwise, ending the training to obtain a skeleton behavior recognition model;
and the skeleton behavior identification module is used for identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain an identification result.
In the invention, the technical schemes can be combined with each other to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
FIG. 1 is a flowchart of a domain-adaptive skeleton behavior recognition method based on continuous learning according to an embodiment of the present invention;
fig. 2 is a block diagram of a domain-adaptive skeleton behavior recognition system based on continuous learning according to an embodiment of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
The embodiment of the invention discloses a domain-adaptive skeleton behavior identification method based on continuous learning, which comprises the following steps as shown in fig. 1:
s1, acquiring newly added skeleton behavior data of the source domain and the target domain, extracting a skeleton behavior data sequence by adopting a sliding window method, and generating a newly added sample set; and taking the newly added sample set as a training sample set.
S2, constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is larger than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing the next round of training; otherwise, the training is finished, and a skeleton behavior recognition model is obtained.
And S3, recognizing the skeleton behavior to be recognized of the target domain based on the skeleton behavior recognition model to obtain a recognition result.
The source domain skeleton behavior data are data with classification labels, while the target domain skeleton behavior data are unlabeled or sparsely labeled. By adopting the domain adaptive network model, the label knowledge obtained in the source domain can be used to train the target domain, so that action data of the same type can be accurately identified even when their distributions differ. Meanwhile, by adopting continuous learning, the model can learn the knowledge in new samples without forgetting the knowledge of old samples, so that an action can still be accurately identified when instances of the same action differ, for example in angle.
In implementation, the skeleton behavior data can be acquired by a Kinect depth camera. The Kinect depth camera can collect data of 25 skeleton nodes, and each node data comprises data of three coordinates of x, y and z. The Kinect depth camera collects data at a frame rate of 15fps, each frame of data comprises three-dimensional coordinate data of 25 skeleton nodes, and if the recording duration of an action is 10 seconds, the Kinect depth camera corresponds to skeleton behavior data of 150 frames.
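The frame and dimension arithmetic above can be sketched as follows (illustrative only; the constants are those stated in the text):

```python
FPS = 15        # Kinect depth camera frame rate
NODES = 25      # skeleton nodes collected per frame
COORDS = 3      # x, y, z coordinates per node

def frame_count(duration_seconds):
    # Number of skeleton frames recorded for an action of the given duration.
    return FPS * duration_seconds

def frame_vector_dim():
    # Dimension of the action vector representing one frame.
    return NODES * COORDS
```

For example, a 10-second action yields 150 frames, each represented by a 75-dimensional action vector.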
In order to obtain a training sample, a sliding window method is adopted to extract a skeleton behavior data sequence and generate a newly added sample set, and the method specifically comprises the following steps:
and S11, representing the skeleton behavior data of each frame by adopting a motion vector, and extracting D frames of skeleton behavior data by adopting a sliding window method to form a motion matrix.
In order to represent the time-dimension information of the skeleton behavior data, the skeleton behavior data is treated as a time series: each frame corresponds to an action vector at one time point, and the dimension of each action vector is the number of skeleton nodes multiplied by 3. D frames of skeleton behavior data are extracted with the sliding window method to form an action matrix. For example, if an action contains 150 frames, the corresponding skeleton behavior data forms a 150 × 75 matrix.
S12, splitting the action matrix of each window into three action matrix components, and respectively calculating the covariance matrix of each action matrix component; and merging the three covariance matrixes by adopting convolution operation to generate one sample data, wherein the plurality of sample data form a newly added sample set.
Symmetric positive definite matrices (SPD matrices) are widely used in the field of computer vision, such as face recognition, medical image processing, etc. Based on the Riemann geometric correlation theory of the non-Euclidean space, the SPD matrix is proved to be capable of better expressing the data distribution information of the objects in the non-Euclidean space.
Recent studies have shown that second-order statistics have stronger expressive power than first-order statistics. The covariance matrix is a commonly used second-order statistical expression, and it is itself a symmetric positive definite matrix. Therefore, the SPD matrix formed by the covariance matrix is adopted to express the spatial structure of the skeleton behavior data, with a good spatial expression effect.
The covariance matrix adopted by the invention is a typical representative of the SPD matrix, but the invention is not limited to representing skeleton data with the covariance matrix; any statistical expression satisfying the SPD matrix properties can be adopted. In practice, D ≥ 25 so that the covariance matrix is positive definite.
Specifically, the action matrix of each window is split into three action matrix components, namely the x-axis, y-axis and z-axis components, forming three component matrices of dimension D × d each, where d = 25 is the number of skeleton nodes. The covariance matrix of each component matrix is calculated respectively, yielding three covariance matrices of size d × d. The covariance matrices of the three axes are then merged into one d × d matrix by a convolution operation (convolution kernel size 1 × 3), denoted x_i, i.e., one sample datum. The plurality of sample data constitutes the sample set. Source domain samples have corresponding action labels, while target domain samples have no labels or only very few labels.
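A minimal sketch of this sample construction (a fixed averaging kernel stands in for the learned 1 × 3 convolution, which is an assumption for illustration):

```python
import numpy as np

def make_sample(window, kernel=(1 / 3, 1 / 3, 1 / 3)):
    # window: (D, 25, 3) array of D frames x 25 skeleton nodes x (x, y, z).
    # Split into three D x 25 axis components, compute each component's
    # 25 x 25 covariance matrix, then merge the three matrices with a
    # 1 x 3 convolution (here a fixed averaging kernel; in the model the
    # kernel weights would be learned).
    assert window.shape[0] >= 25, "D >= 25 keeps the covariance positive definite"
    covs = [np.cov(window[:, :, axis], rowvar=False) for axis in range(3)]
    return sum(w * c for w, c in zip(kernel, covs))
```

For a window of 30 frames the result is one symmetric 25 × 25 sample matrix; collecting such matrices over all windows yields the newly added sample set.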
After the sample set is generated, a domain adaptive network model based on an attention mechanism can be constructed. The constructed network model comprises a local feature extraction network, a first domain discriminator, a multi-head attention feature extraction network, a second domain discriminator and a classifier.
And the local feature extraction network is used for performing local feature extraction on the samples, and the ResNet50 network can be adopted by the local feature extraction network for example.
The first domain discriminator is used for discriminating, from the input features, which domain the data come from. The first domain discriminator is a binary classification network; illustratively, the source domain is labeled 0 and the target domain is labeled 1, and the output of the first domain discriminator is a probability value in [0, 1]: an output close to 0 indicates that the input data come from the source domain, and an output close to 1 indicates that the input data come from the target domain.
The multi-head attention feature extraction network is a network constructed based on a multi-head attention mechanism and is used for extracting representative migratable features.
The second domain discriminator is for discriminating a probability that the data is from the source domain based on the representative migratable feature. The second domain discriminator is also a two-class network and has the same structure as the first domain discriminator.
The classifier is used for classifying and identifying the action type represented by the data according to the representative migratable feature. For example, the classifier may comprise two Blocks, a fully connected layer and a softmax layer, where each Block comprises a fully connected layer, a normalization layer and an activation function layer.
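As an illustrative forward pass of the classifier structure just described (layer-style normalization and ReLU are assumed for the normalization and activation layers; the patent does not fix these choices):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the output logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def block(x, W, b):
    # One Block: fully connected layer -> normalization layer -> activation.
    h = W @ x + b
    h = (h - h.mean()) / (h.std() + 1e-6)
    return np.maximum(h, 0.0)

def classify(x, blocks, W_out, b_out):
    # Two Blocks, then a fully connected layer and a softmax layer.
    for W, b in blocks:
        x = block(x, W, b)
    return softmax(W_out @ x + b_out)
```

The output is a probability distribution over the action classes; the predicted action type is its argmax.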
Specifically, training the attention mechanism-based domain adaptive network model based on the training sample set includes:
s21, extracting local features through a local feature extraction network, performing domain discrimination on the local features by adopting a first domain discriminator, calculating local migration loss and weights of the local features according to domain discrimination results, and obtaining migratable features based on the local features and the corresponding weights; the number of the first domain discriminators is determined according to the size of the local feature matrix.
Illustratively, ResNet50 is used as the local feature extraction network, and the feature matrix output by its last convolutional layer has size 7 × 7 × 2048, where 2048 is the number of channels (convolution kernels). The number of first domain discriminators is therefore 7 × 7 = 49, and each first domain discriminator corresponds to one position in the feature matrix.
The input to each first domain discriminator is the 1 × 2048 vector at the corresponding position of sample x_i in the feature map, and the output is the domain discrimination result for sample x_i. The weight corresponding to each local feature and the local migration loss are calculated according to the output results of the discriminators. Specifically, the local migration loss is calculated by using the following formula:
L_l = \frac{1}{nK} \sum_{k=1}^{K} \sum_{x_i \in D_s \cup D_t} L_{entropy}\left(D_1^k\left(G_{lf}(x_i)\right), d_i\right)

wherein D_1^k denotes the kth first domain discriminator, G_{lf} represents the local feature extraction network, d_i represents the domain label of sample x_i, L_{entropy} represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, D_t represents the target domain, and K is the number of first domain discriminators. The domain label indicates whether the sample belongs to the source domain or the target domain. In order to learn the target domain data using the labeled source domain, the extracted features must apply not only to the source domain but also to the target domain, i.e., the features must be migratable; the extracted features should therefore make the first domain discriminator unable to tell whether a sample comes from the source domain or the target domain, i.e., the output of the first domain discriminator should be as close to 0.5 as possible. The higher L_{entropy} is, the more similarly the feature behaves on the two domains (source domain and target domain), i.e., the better the mobility of the extracted local feature; therefore, the local migration loss L_l should be as large as possible so as to extract more migratable features.
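The local migration loss can be sketched in plain Python as follows; the helper names, the toy discriminator outputs, and the use of binary cross entropy over those outputs are illustrative assumptions, not code from the patent:

```python
import math

def bce(p, d, eps=1e-12):
    # Binary cross entropy between a discriminator output p (predicted
    # probability of the target domain) and the domain label d (0 = source,
    # 1 = target). Clamping avoids log(0).
    p = min(max(p, eps), 1.0 - eps)
    return -(d * math.log(p) + (1 - d) * math.log(1 - p))

def local_migration_loss(outputs, labels):
    # outputs[k][i]: output of the k-th first domain discriminator on
    # sample i; labels[i]: domain label of sample i. The loss averages the
    # cross entropy over K discriminators and n samples.
    K, n = len(outputs), len(labels)
    total = sum(bce(outputs[k][i], labels[i]) for k in range(K) for i in range(n))
    return total / (K * n)

# Outputs near 0.5 (domains indistinguishable) give a high loss, which the
# adversarial objective maximizes; confident outputs give a low loss.
confused = local_migration_loss([[0.5, 0.5]], [0, 1])
confident = local_migration_loss([[0.05, 0.95]], [0, 1])
```

The sketch confirms the text above: a discriminator stuck at 0.5 yields the cross-entropy value log 2 per sample, larger than any confident discrimination.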
To obtain the migratable features, a weight for each local feature is calculated based on the discrimination result of the first domain discriminator. In particular, according to the formula
w_i^k = H(\hat{d}_i^k) = -\hat{d}_i^k \log \hat{d}_i^k - \left(1 - \hat{d}_i^k\right) \log\left(1 - \hat{d}_i^k\right)

the attention weight of each local feature is calculated, wherein \hat{d}_i^k = D_1^k(G_{lf}(x_i)) represents the discrimination result of the kth first domain discriminator on sample x_i, k = 1, 2, …, K, and K is the number of first domain discriminators. In order to obtain features that the discriminator cannot effectively distinguish as belonging to the source domain or the target domain, \hat{d}_i^k should be as close to 0.5 as possible, which corresponds to a value of H(\hat{d}_i^k) that is as large as possible. The larger H(\hat{d}_i^k) is, the better the mobility of the local feature, and the higher the weight it is given.
The local features are multiplied by the corresponding attention weights to obtain migratable features.
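The weighting step can be illustrated with a short Python sketch; the binary entropy function H and the sample discriminator outputs are illustrative assumptions:

```python
import math

def binary_entropy(p, eps=1e-12):
    # H(p) = -p*log(p) - (1-p)*log(1-p); maximal at p = 0.5, i.e. when the
    # discriminator cannot tell source from target.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

# Discriminator outputs near 0.5 (domain-indistinguishable, hence more
# transferable) receive the largest attention weights.
weights = [binary_entropy(p) for p in (0.5, 0.9, 0.99)]

# Multiply each local feature by its weight to obtain migratable features
# (unit features used here so the result equals the weights).
migratable = [w * f for w, f in zip(weights, [1.0, 1.0, 1.0])]
```

An output of exactly 0.5 yields the maximum weight log 2, while near-certain outputs are weighted down toward zero.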
And S22, inputting the migratable features into a multi-head attention feature extraction network, and extracting representative migratable features based on a multi-head attention mechanism.
In order to further obtain representative migratable features with higher distinguishability, the migratable features obtained in step S21 are input into the multi-head attention feature extraction network, and the distinguishability of the migratable features is scored through the multi-head attention mechanism, so as to obtain representative migratable features that express this distinguishability. The multi-head attention feature extraction network is known in the art, and reference may be made to the prior art for its specific structure.
And S23, performing domain discrimination on the representative migratable features by adopting a second domain discriminator, and calculating the overall migration loss according to the domain discrimination result.
After the representative migratable features are obtained, the second domain discriminator performs domain discrimination on them: the representative migratable feature matrix is input into the second domain discriminator to obtain the discrimination result.
Specifically, the overall migration loss is calculated by the following formula:
L_d = \frac{1}{n} \sum_{x_i \in D_s \cup D_t} L_{entropy}\left(D_2\left(G_{df}(g_i)\right), d_i\right)

wherein D_2 denotes the second domain discriminator, G_{df} represents the multi-head attention feature extraction network, g_i represents the migratable feature of sample x_i, d_i represents the domain label of sample x_i, L_{entropy} represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, and D_t represents the target domain.
The higher L_{entropy} is, the more similarly the representative feature behaves on the two domains (source domain and target domain), i.e., the better the mobility of the selected representative feature; therefore, the overall migration loss L_d should be as large as possible so as to extract more migratable features.
S24, classifying and judging the representative migratable features of the source domain samples by adopting a classifier, and calculating the source domain classification and judgment loss; the classifier is a fully connected network.
The extracted representative migratable features are required to be migratable from the source domain to the target domain, and also to be applicable to classification identification on both the source domain and the target domain. Since the samples of the target domain carry no classification labels (or only a very small number of them are labeled), the classification loss is calculated on the representative migratable features of the source domain samples.
Specifically, the following formula is adopted to calculate the source domain classification loss:
L_s = \frac{1}{n_s} \sum_{x_i \in D_s} L_{entropy}\left(G_y\left(G_{df}(g_i)\right), y_i\right)

wherein G_y represents the classifier network, G_{df} represents the multi-head attention feature extraction network, g_i represents the migratable feature of sample x_i, y_i represents the class label of sample x_i, L_{entropy} represents the cross entropy loss function, n_s is the number of source domain training samples, and x_i ∈ D_s denotes the source domain samples. The source domain classification loss should be as small as possible to improve the classification accuracy.
And S25, calculating the total loss of the sample based on the local migration loss, the overall migration loss and the source domain classification discrimination loss, and training the domain adaptive network model based on the attention mechanism by taking the minimum total loss as a training target.
The training objective for training the attention mechanism-based domain adaptive network model is to minimize the total loss, wherein the total loss of the sample is calculated by using the following formula:

L_\theta = L_s - \lambda L_l - \beta L_d

where λ and β are hyperparameters, L_l is the local migration loss, L_d is the overall migration loss, and L_s is the source domain classification discrimination loss.

Since the training objective is to maximize the local migration loss L_l, maximize the overall migration loss L_d, and minimize the source domain classification loss, the total loss of training is calculated as L_\theta = L_s - \lambda L_l - \beta L_d, with the minimum total loss as the training target.
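A minimal Python sketch of this total-loss combination (the hyperparameter values λ = β = 0.1 and the loss values are illustrative, not values disclosed in the patent):

```python
def total_loss(l_s, l_l, l_d, lam=0.1, beta=0.1):
    # L_theta = L_s - lambda * L_l - beta * L_d: minimizing L_theta
    # simultaneously minimizes the source classification loss L_s and
    # maximizes the two migration losses L_l and L_d.
    return l_s - lam * l_l - beta * l_d

# Larger migration losses (more transferable features) lower the total loss,
# so gradient descent on L_theta pushes the feature extractor toward
# domain-confusing features.
baseline = total_loss(1.0, 0.5, 0.5)
transferable = total_loss(1.0, 2.0, 2.0)
```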
The current round of training is stopped when the training reaches a preset number of iterations or the loss reaches a preset precision.
Over time, the same human action may show deviations; for example, the angle of a raised hand may differ, or a sitting posture may drift from upright to slightly skewed. Continuous learning is therefore adopted to train the model, so that the model can accurately recognize actions of the same type even when they differ considerably. After the current round of training is finished, whether another round of learning is needed is judged according to the overall loss of the current training samples.
Specifically, the overall loss of the attention-based domain adaptation model is calculated by using the following formula:
L = \frac{1}{n_{new}} \sum_{x_i \in D_{new}} L_\theta(x_i) + \sigma \cdot \frac{1}{M} \sum_{x_j \in Mem} L_\theta(x_j)

wherein σ is a hyperparameter, the first term \frac{1}{n_{new}} \sum_{x_i \in D_{new}} L_\theta(x_i) is the loss of the samples in the newly added sample set, the second term \frac{1}{M} \sum_{x_j \in Mem} L_\theta(x_j) is the average loss of the samples extracted from the experience learning library in the current task, M is the number of samples extracted from the experience learning library, and L_\theta(x_i) is the total loss of sample x_i.

If the current training is the first round, the experience learning library stores no historical training data and the current training samples all come from the newly added sample set, so the overall loss reduces to

L = \frac{1}{n_{new}} \sum_{x_i \in D_{new}} L_\theta(x_i)

since x_j ∈ Mem has no samples, and therefore \frac{1}{M} \sum_{x_j \in Mem} L_\theta(x_j) = 0.
When the training is not the first round, the overall loss needs to be calculated according to the loss of the newly added samples in the current training sample set and the loss of the historical samples extracted from the experience learning base. Thereby ensuring that a priori knowledge of historical samples is still retained while learning new samples.
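This replay-style overall loss can be sketched in plain Python; the function and argument names are hypothetical:

```python
def overall_loss(new_losses, memory_losses, sigma=1.0):
    # Overall loss = mean loss over the newly added samples, plus sigma
    # times the mean loss over the samples replayed from the experience
    # learning library. In the first round the library is empty and the
    # replay term vanishes, leaving only the new-sample term.
    new_term = sum(new_losses) / len(new_losses)
    if not memory_losses:
        return new_term
    return new_term + sigma * sum(memory_losses) / len(memory_losses)

first_round = overall_loss([1.0, 3.0], [])                    # no replay term
later_round = overall_loss([1.0, 3.0], [2.0, 4.0], sigma=0.5)  # with replay
```

The replay term is what keeps the prior knowledge of historical samples in play while new samples are learned.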
If the overall loss is less than the threshold, the model precision meets the requirement; no further round of training is performed, and the training ends to obtain the skeleton behavior recognition model. The threshold is set according to the accuracy requirement of the implementation.
Otherwise, randomly extracting part of samples from the current training sample set and storing the part of samples into the experience learning base, when a next round of newly added samples comes, extracting part of samples from the experience learning base and samples in the newly added sample set of the next round of training to form a training sample set of the next round of training, continuing the next round of training until the overall loss is less than a threshold value, ending the training, and obtaining the skeleton behavior recognition model. Therefore, the model can continuously learn new knowledge from a new sample and can store most of the learned prior knowledge, and the accuracy of the model identification action is improved.
Specifically, the extracting of a part of samples from the experience learning library and the samples in the newly added sample set of the next round of training to form the training sample set of the next round of training includes:
calculating the loss prior probability of each sample in the empirical learning library:
P_i = \frac{L_\theta(x_i) + \varepsilon}{\sum_{j=1}^{n_h} \left(L_\theta(x_j) + \varepsilon\right)}

wherein P_i denotes the loss prior probability of the ith sample, L_\theta(x_i) is the total loss of sample x_i, ε is a constant, i = 1, 2, …, n_h, and n_h is the number of samples in the experience learning library. The constant ε ensures that every sample in the experience learning library has a chance of being drawn.
M samples with the maximum loss prior probability are extracted from the experience learning library and, together with the samples in the newly added sample set of the next round, form the training sample set of the next round of training. The loss prior probability reflects the loss of a sample: the greater the loss, the poorer the model's learning effect on that sample. Adding such samples to the next round of model training allows the model to learn new samples while being corrected on the old samples it learned poorly, thereby continuously optimizing the model and ensuring more accurate classification and identification.
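The loss-prior computation and the top-M selection can be sketched in plain Python; the helper names are illustrative assumptions:

```python
def loss_priors(losses, eps=0.01):
    # P_i proportional to L_theta(x_i) + eps; the constant eps keeps every
    # stored sample drawable even when its loss is (near) zero.
    weights = [l + eps for l in losses]
    total = sum(weights)
    return [w / total for w in weights]

def top_m(losses, m, eps=0.01):
    # Indices of the M samples with the largest loss prior probability,
    # i.e. the samples the model currently learns worst.
    priors = loss_priors(losses, eps)
    return sorted(range(len(priors)), key=lambda i: priors[i], reverse=True)[:m]
```

For example, `top_m([0.1, 2.0, 1.0], 2)` selects the two highest-loss samples for replay in the next round.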
After the trained skeleton behavior recognition model is obtained, the skeleton behavior to be recognized of the target domain is recognized based on the skeleton behavior recognition model to obtain a recognition result. Specifically, after the to-be-recognized skeleton behavior data of the target domain is obtained, it is processed according to the method of steps S11-S12 to obtain to-be-recognized data in the same format as the sample data; the to-be-recognized data is input into the trained skeleton behavior recognition model, and the classification result output by the classifier is the recognition result of the to-be-recognized data.
Through the domain adaptation network model based on continuous learning, the label knowledge obtained in the source domain is used for training on the target domain, so that action data of the same type can be accurately recognized even when its distribution differs or the actions themselves vary (for example, lifting the left hand versus the right hand, or different hand-lifting angles). This realizes accurate recognition of same-type actions with higher diversity and improves the recognition accuracy of skeleton behaviors.
A specific embodiment of the present invention discloses a domain-adaptive skeleton behavior recognition system based on continuous learning, as shown in fig. 2, including the following modules:
the training set generation module is used for acquiring newly added skeleton behavior data of the source domain and the target domain, extracting a skeleton behavior data sequence by adopting a sliding window method and generating a newly added sample set; taking the newly added sample set as a training sample set;
the network model training module is used for constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is greater than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing to perform the next round of training; otherwise, the training is finished to obtain a skeleton behavior recognition model;
and the skeleton behavior identification module is used for identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain an identification result.
The method embodiment and the system embodiment are based on the same principle, and related parts can be referenced mutually, and the same technical effect can be achieved. For a specific implementation process, reference is made to the foregoing embodiments, which are not described herein again.
Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. A domain adaptation skeleton behavior identification method based on continuous learning is characterized by comprising the following steps:
acquiring newly added skeleton behavior data of a source domain and a target domain, extracting a skeleton behavior data sequence by adopting a sliding window method, and generating a newly added sample set; taking the newly added sample set as a training sample set;
constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, and calculating the overall loss of the domain adaptive network model based on the attention mechanism; if the overall loss is larger than the threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next training to form a training sample set of the next training, and continuing to perform the next training; otherwise, the training is finished to obtain a skeleton behavior recognition model;
and identifying the skeleton behavior to be identified of the target domain based on the skeleton behavior identification model to obtain an identification result.
2. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 1, wherein extracting the skeleton behavior data sequence by using a sliding window method to generate a newly added sample set comprises:
expressing each frame of the skeleton behavior data by adopting an action vector, and extracting D frames of skeleton behavior data by adopting a sliding window method to form an action matrix;
splitting the action matrix of each window into three action matrix components, and respectively calculating a covariance matrix of each action matrix component; and merging the three covariance matrixes by adopting convolution operation to generate one sample data, wherein the plurality of sample data form a newly added sample set.
3. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 1, wherein training the attention mechanism-based domain adaptive network model based on the training sample set comprises:
extracting local features through a local feature extraction network, performing domain discrimination on the local features by adopting a first domain discriminator, calculating local migration loss and weights of the local features according to domain discrimination results, and obtaining migratable features based on the local features and corresponding weights; the number of the first domain discriminators is determined according to the size of the local feature matrix;
inputting the migratable features into a multi-head attention feature extraction network, and extracting representative migratable features based on a multi-head attention mechanism;
adopting a second domain discriminator to carry out domain discrimination on the representative migratable features, and calculating the overall migration loss according to domain discrimination results;
classifying and judging the representative migratable features of the source domain samples by adopting a classifier, and calculating the source domain classification and judgment loss; the classifier is a fully connected network;
and calculating the total loss of the sample based on the local migration loss, the overall migration loss and the source domain classification discrimination loss, and training the domain adaptive network model based on the attention mechanism by taking the minimum total loss as a training target.
4. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 3, wherein the local migration loss is calculated by adopting the following formula:

L_l = \frac{1}{nK} \sum_{k=1}^{K} \sum_{x_i \in D_s \cup D_t} L_{entropy}\left(D_1^k\left(G_{lf}(x_i)\right), d_i\right)

wherein D_1^k denotes the kth first domain discriminator, G_{lf} represents the local feature extraction network, d_i represents the domain label of sample x_i, L_{entropy} represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, D_t represents the target domain, and K is the number of first domain discriminators.
5. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 3, wherein calculating the local migration loss and the weight of each local feature according to the domain discrimination result, and obtaining the migratable features based on the local features and the corresponding weights, comprises:

according to the formula

w_i^k = H(\hat{d}_i^k) = -\hat{d}_i^k \log \hat{d}_i^k - \left(1 - \hat{d}_i^k\right) \log\left(1 - \hat{d}_i^k\right)

calculating the attention weight of each local feature, wherein \hat{d}_i^k represents the discrimination result of the kth first domain discriminator on sample x_i, k = 1, 2, …, K, and K is the number of first domain discriminators;
the local features are multiplied by the corresponding attention weights to obtain migratable features.
6. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 3, wherein the overall migration loss is calculated by adopting the following formula:

L_d = \frac{1}{n} \sum_{x_i \in D_s \cup D_t} L_{entropy}\left(D_2\left(G_{df}(g_i)\right), d_i\right)

wherein D_2 denotes the second domain discriminator, G_{df} represents the multi-head attention feature extraction network, g_i represents the migratable feature of sample x_i, d_i represents the domain label of sample x_i, L_{entropy} represents the cross entropy loss function, n is the number of training samples, D_s represents the source domain, and D_t represents the target domain.
7. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 3, wherein calculating the total loss of the sample based on the local migration loss, the overall migration loss and the source domain classification discrimination loss comprises calculating the total loss of the sample by adopting the following formula: L_\theta = L_s - \lambda L_l - \beta L_d, where λ and β are hyperparameters, L_l is the local migration loss, L_d is the overall migration loss, and L_s is the source domain classification discrimination loss.
8. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 1, wherein the overall loss of the attention mechanism-based domain adaptive model is calculated by adopting the following formula:

L = \frac{1}{n_{new}} \sum_{x_i \in D_{new}} L_\theta(x_i) + \sigma \cdot \frac{1}{M} \sum_{x_j \in Mem} L_\theta(x_j)

wherein σ is a hyperparameter, the first term is the loss of the samples in the newly added sample set, the second term is the average loss of the samples extracted from the experience learning library in the current task, M is the number of samples extracted from the experience learning library, and L_\theta(x_i) is the total loss of sample x_i.
9. The domain-adaptive skeleton behavior recognition method based on continuous learning according to claim 1, wherein extracting part of the samples from the experience learning library and combining them with the samples in the newly added sample set of the next round of training to form the training sample set of the next round of training comprises:

calculating the loss prior probability of each sample in the experience learning library:

P_i = \frac{L_\theta(x_i) + \varepsilon}{\sum_{j=1}^{n_h} \left(L_\theta(x_j) + \varepsilon\right)}

wherein P_i denotes the loss prior probability of the ith sample, L_\theta(x_i) is the total loss of sample x_i, ε is a constant, i = 1, 2, …, n_h, and n_h is the number of samples in the experience learning library;
and extracting M samples with the maximum loss prior probability from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training.
10. A domain-adaptive skeleton behavior recognition system based on continuous learning is characterized by comprising the following modules:
the training set generation module is used for acquiring newly added skeleton behavior data of the source domain and the target domain, extracting a skeleton behavior data sequence by adopting a sliding window method and generating a newly added sample set; taking the newly added sample set as a training sample set;
the network model training module is used for constructing a domain adaptive network model based on an attention mechanism, training the domain adaptive network model based on the attention mechanism based on the training sample set, calculating the overall loss of the domain adaptive network model based on the attention mechanism, if the overall loss is greater than a threshold value, randomly extracting a part of samples from the current training sample set and storing the part of samples into an experience learning library, extracting a part of samples from the experience learning library and samples in a newly added sample set of the next round of training to form a training sample set of the next round of training, and continuing to perform the next round of training; otherwise, the training is finished to obtain a skeleton behavior recognition model;
and the skeleton behavior recognition module is used for recognizing the skeleton behavior to be recognized of the target domain based on the skeleton behavior recognition model to obtain a recognition result.
CN202111341029.2A 2021-11-12 2021-11-12 Domain adaptive skeleton behavior recognition method and system based on continuous learning Active CN114708609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111341029.2A CN114708609B (en) 2021-11-12 2021-11-12 Domain adaptive skeleton behavior recognition method and system based on continuous learning


Publications (2)

Publication Number Publication Date
CN114708609A true CN114708609A (en) 2022-07-05
CN114708609B CN114708609B (en) 2023-08-18

Family

ID=82167268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111341029.2A Active CN114708609B (en) 2021-11-12 2021-11-12 Domain adaptive skeleton behavior recognition method and system based on continuous learning

Country Status (1)

Country Link
CN (1) CN114708609B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115455247A (en) * 2022-09-26 2022-12-09 中国矿业大学 Classroom collaborative learning role determination method
CN115859122A (en) * 2023-02-02 2023-03-28 中国电子科技集团公司第十五研究所 Data identification method, automatic continuous learning model, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160462A (en) * 2019-12-30 2020-05-15 浙江大学 Unsupervised personalized human activity recognition method based on multi-sensor data alignment
CN111191709A (en) * 2019-12-25 2020-05-22 清华大学 Continuous learning framework and continuous learning method of deep neural network
CN112489689A (en) * 2020-11-30 2021-03-12 东南大学 Cross-database voice emotion recognition method and device based on multi-scale difference confrontation
US20210192363A1 (en) * 2019-12-23 2021-06-24 Hrl Laboratories, Llc Systems and methods for unsupervised continual learning
CN113343804A (en) * 2021-05-26 2021-09-03 武汉大学 Integrated migration learning classification method and system for single-view fully-polarized SAR data


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUNTING ZHANG ET AL: "Class-incremental Learning via Deep Model Consolidation", arXiv:1903.07864, pages 1-10
XIMEI WANG ET AL: "Transferable Attention for Domain Adaptation", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, pages 5345-5352
CHEN CHENG ET AL: "Adversarial Domain Adaptation Image Classification Method Combined with Self-attention" (in Chinese), Computer Engineering and Science, vol. 42, no. 2, pages 259-265
GAO YAN ET AL: "Classroom Interaction Group Discovery Based on Skeleton Trajectory Aggregation Model" (in Chinese), Computer Science, vol. 48, no. 8, pages 334-339


Also Published As

Publication number Publication date
CN114708609B (en) 2023-08-18


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant