CN117216545A - Training method of object activity recognition model, object activity recognition method and device

Training method of object activity recognition model, object activity recognition method and device

Info

Publication number: CN117216545A
Application number: CN202310631137.6A
Authority: CN (China)
Prior art keywords: sample data, object activity, covariance matrix, recognition model
Other languages: Chinese (zh)
Inventors: Huang Yawen (黄雅雯), Li Yuexiang (李悦翔), Duan Haoran (段皓然), Wan Fan (万凡), Long Yang (龙洋), Zheng Yefeng (郑冶枫)
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to: CN202310631137.6A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a training method of an object activity recognition model, an object activity recognition method and an object activity recognition device, and relates to the field of machine learning. The method comprises the following steps: acquiring a first data set and a second data set for training an object activity recognition model; generating a pseudo tag corresponding to the second sample data by adopting a tag generation model; generating a candidate data set according to the first sample data and the second sample data; performing interpolation processing on two sample data selected randomly in the candidate data set and label information corresponding to the two sample data respectively to obtain mixed sample data and label information corresponding to the mixed sample data; and iteratively adjusting parameters of the object activity recognition model based on the mixed sample data and label information corresponding to the mixed sample data to obtain the trained object activity recognition model. The application improves the generalization capability of the object recognition model and the accuracy of the output result by improving the complexity and variability of the input data of the model.

Description

Training method of object activity recognition model, object activity recognition method and device
Technical Field
The present application relates to the field of machine learning, and in particular, to a training method for an object activity recognition model, and an object activity recognition method and apparatus.
Background
Wearable sensors are often used in many practical applications, including but not limited to sleep monitoring, elderly care, and health assessment.
Human activity recognition (Human Activity Recognition, HAR) based on wearable devices is one of the core research areas in pervasive computing, and it plays an important role in human behavior understanding, health monitoring, skill assessment, motor training, and the like. In the related art, a deep semi-supervised learning method is generally adopted, in which a small amount of labeled data together with a large amount of unlabeled data is used to train a human activity recognition model, so that human activity data acquired by a wearable device can be recognized based on the trained human activity recognition model and the human activity category corresponding to the human activity data can be determined.
However, when the human activity recognition model trained in this way is used to recognize human activity data, recognition errors are likely to occur, and the recognition accuracy is low.
Disclosure of Invention
The embodiment of the application provides a training method of an object activity recognition model, an object activity recognition method and an object activity recognition device. The technical scheme is as follows:
According to an aspect of an embodiment of the present application, there is provided a training method of an object activity recognition model, the method including:
acquiring a first data set and a second data set for training the object activity recognition model, wherein the first data set comprises at least one first sample data and real labels respectively corresponding to the first sample data, the real labels are used for indicating object activity categories corresponding to the first sample data, the second data set comprises at least one second sample data which is not marked with the object activity categories, and each sample data comprises data acquired by at least one sensor at least one time point;
generating a pseudo tag corresponding to the second sample data by adopting a tag generation model, wherein the tag generation model is an artificial intelligent model for generating the pseudo tag for indicating the object activity category corresponding to the second sample data;
generating a candidate data set according to the first sample data, the real label corresponding to the first sample data, the second sample data and the pseudo label corresponding to the second sample data;
arbitrarily selecting two sample data from the candidate data set, and performing interpolation processing on the two sample data and the label information respectively corresponding to the two sample data, to obtain mixed sample data and label information corresponding to the mixed sample data;
And iteratively adjusting parameters of the object activity recognition model based on the mixed sample data and the label information corresponding to the mixed sample data to obtain the trained object activity recognition model.
According to an aspect of an embodiment of the present application, there is provided an object activity recognition method, the method including:
acquiring activity data of a first object, the activity data comprising data acquired by at least one sensor at least one point in time;
acquiring characteristic information of the activity data through a characteristic extraction module of the object activity recognition model;
performing calibration processing on the characteristic information through a calibration module of the object activity recognition model to obtain calibrated characteristic information, wherein the calibration processing is used for enhancing the correlation between the characteristic information;
and obtaining the recognition result of the object activity category corresponding to the activity data according to the calibrated characteristic information through a classification module of the object activity recognition model.
According to an aspect of an embodiment of the present application, there is provided a training apparatus of an object activity recognition model, the apparatus including:
a data acquisition module, configured to acquire a first data set and a second data set for training the object activity recognition model, where the first data set includes at least one first sample data and a real tag corresponding to each of the first sample data, the real tag is used to indicate an object activity class corresponding to the first sample data, and the second data set includes at least one second sample data not labeled with the object activity class, and each sample data includes data acquired by at least one sensor at least one time point;
The pseudo tag generation module is used for generating a pseudo tag corresponding to the second sample data by adopting a tag generation model, wherein the tag generation model is an artificial intelligent model for generating the pseudo tag for indicating the object activity category corresponding to the second sample data;
the data set generation module is used for generating a candidate data set according to the first sample data, the real label corresponding to the first sample data, the second sample data and the pseudo label corresponding to the second sample data;
the interpolation processing module is used for arbitrarily selecting two sample data from the candidate data set, and carrying out interpolation processing on the two sample data and label information corresponding to the two sample data respectively to obtain mixed sample data and label information corresponding to the mixed sample data;
and the iteration adjustment module is used for carrying out iteration adjustment on the parameters of the object activity recognition model based on the mixed sample data and the label information corresponding to the mixed sample data to obtain the object activity recognition model after training.
According to an aspect of an embodiment of the present application, there is provided an object activity recognition apparatus including:
A data acquisition module for acquiring activity data of a first object, the activity data comprising data acquired by at least one sensor at least one point in time;
the feature extraction module is used for acquiring feature information of the activity data through the feature extraction module of the object activity recognition model;
the calibration module is used for carrying out calibration processing on the characteristic information through the calibration module of the object activity recognition model to obtain calibrated characteristic information, and the calibration processing is used for enhancing the correlation between the characteristic information;
and the classification module is used for obtaining the recognition result of the object activity category corresponding to the activity data according to the calibrated characteristic information through the classification module of the object activity recognition model.
According to an aspect of an embodiment of the present application, there is provided a computer device including a processor and a memory, in which a computer program is stored, the computer program being loaded and executed by the processor to implement the training method of the object activity recognition model described above.
According to an aspect of an embodiment of the present application, there is provided a computer-readable storage medium having stored therein a computer program loaded and executed by a processor to implement the training method of the object activity recognition model described above.
According to an aspect of an embodiment of the present application, there is provided a computer program product comprising a computer program loaded and executed by a processor to implement the training method of the object activity recognition model described above.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
the pseudo tag corresponding to the second sample data which is not marked with the object activity category is generated by adopting the tag generation model, and the mixed sample data and the tag information corresponding to the mixed sample data are obtained according to the first sample data with the real tag and the second sample data with the pseudo tag, so that the input data of the object activity identification model are richer, and the complexity and the variability of the input data are improved. Therefore, the object activity recognition model is trained, the object activity recognition model can be suitable for more complex activity data, the generalization capability of the object recognition model is further improved, and the accuracy of the output result of the object recognition model is improved.
Drawings
FIG. 1 is a schematic illustration of an implementation environment for an embodiment of the present application;
FIG. 2 is a flow chart of a training method for an object activity recognition model provided by an embodiment of the present application;
FIG. 3 is a flow chart of an application of an object activity recognition model provided by one embodiment of the present application;
FIG. 4 is a flow chart of an iterative training method for an object activity recognition model provided by one embodiment of the present application;
FIG. 5 is a flow chart of a training process for an object activity recognition model provided by one embodiment of the present application;
FIG. 6 is a flow chart of a training process and a use process of an object activity recognition model provided by one embodiment of the present application;
FIG. 7 is a flow chart of a method for object activity recognition provided by one embodiment of the present application;
FIG. 8 is a feature embedding diagram based on mHealth data set, provided by one embodiment of the application;
FIG. 9 is a graph comparing the identification results of the Opportunity dataset and mHealth+ dataset provided by one embodiment of the present application;
FIG. 10 is a block diagram of a training apparatus for an object activity recognition model provided by one embodiment of the present application;
FIG. 11 is a block diagram of an object activity recognition device provided by one embodiment of the present application;
FIG. 12 is a block diagram of a computer device according to one embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Before describing embodiments of the present application, in order to facilitate understanding of the present solution, the following explanation is made on terms appearing in the present solution:
1. human activity recognition: human activity recognition (Human Activity Recognition, HAR) mainly includes two study directions: vision-based HAR identification and sensor-based HAR.
For vision-based HAR, data is largely recorded as video frames, skeletal modalities, and the like. The activity information may be mapped into one-dimensional handcrafted features that represent points of interest varying significantly in time and space, or temporal and spatial features may be extracted automatically by deep learning models, such as two-stream CNN (Convolutional Neural Network) models, LSTM (Long Short-Term Memory network) based models, and 3D-CNN (3 Dimensions-Convolutional Neural Network, three-dimensional convolutional neural network) based models.
Compared with vision-based HAR, sensor-based HAR can protect personal privacy and effectively save computation. Early sensor-based HAR methods recognized human activity using a single accelerometer signal. With its great success in feature extraction and variability processing, deep learning has gradually become the mainstay for extracting appropriate features in human activity recognition systems; for example, several low-bias, high-variance LSTM learners can be combined into one robust learner with reduced variance.
2. Deep semi-supervised learning: a deep learning method between supervised learning and unsupervised learning, in which a small amount of labeled data and a large amount of unlabeled data are used for training simultaneously. Deep semi-supervised learning can be used if the data satisfies the following three main assumptions. First, if data points are located in the same cluster in the feature space, the data points are likely to be of the same class. Second, if data points belong to the same class or cluster, the outputs of the depth model for those data points should be close. Third, the high-dimensional features of the data should lie approximately on a low-dimensional manifold, and classification boundaries should not span high-density regions. These three assumptions hold for most standard classification tasks, so they should also hold for sensor-based HAR. Based on these assumptions, if a perturbation is applied to an unlabeled data point, the corresponding output should still be very close; this is formulated as the consistency regularization that most semi-supervised methods rely on.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning.
With the research and advancement of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, and smart customer service. It is believed that with the development of technology, artificial intelligence will be applied in more fields and demonstrate increasingly important value.
The technical scheme of the application mainly relates to a machine learning technology in an artificial intelligence technology, and mainly relates to a training process of an object activity recognition model.
Referring to fig. 1, a schematic diagram of an implementation environment of an embodiment of the present application is shown. The implementation environment can be implemented as a training system for an object activity recognition model. The implementation environment may include: a model training apparatus 10 and a model using apparatus 20.
Model training device 10 may be an electronic device such as a personal computer, tablet, server, intelligent robot, or some other electronic device with relatively high computing power. Model training apparatus 10 is used to train an object activity recognition model.
In the embodiment of the application, the object activity recognition model is a machine learning model, obtained by training based on the training method of the object activity recognition model, for recognizing the activity data of any object to obtain the object activity category corresponding to the object activity data. The model training apparatus 10 may train the object activity recognition model in a machine learning manner, so that the model has the capability of determining the object activity category corresponding to object activity data from the object activity data; a specific model training method may refer to the following embodiments.
The object activity recognition model comprises a feature extraction module, a calibration module and a classification module. The feature extraction module is used for acquiring feature information of the input data; the calibration module is used for enhancing the correlation between the characteristic information of each input data, so that the calibrated characteristic information contains the association relation between the characteristic information of the input data, and the accuracy of model output is improved; the classification module is used for outputting the prediction information of the object activity category corresponding to the input data according to the calibrated characteristic information.
The trained object activity recognition model may be deployed for use in the model using device 20. The model using device 20 may be a terminal device such as a mobile phone, a tablet computer, a PC (Personal Computer), a smart TV, a multimedia playing device, a vehicle-mounted terminal, or a server. When it is necessary to determine the object activity class corresponding to object activity data, the model using apparatus 20 may implement the above-described function through the trained object activity recognition model.
The model training apparatus 10 and the model using apparatus 20 may be two independent apparatuses or the same apparatus. If model training apparatus 10 and model using apparatus 20 are the same apparatus, model training apparatus 10 may be deployed in model using apparatus 20.
In the embodiment of the present application, the execution subject of each step may be a computer device, where the computer device refers to an electronic device with data computing, processing, and storage capabilities. The computer device may be a terminal device such as a PC (Personal Computer ), tablet, smart phone, wearable device, smart robot, or a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing service. The computer device may be the model training device 10 of fig. 1 or the model using device 20.
The specific use procedure of the model using device 20 based on the object activity recognition model may refer to fig. 3, where the front-end device A refers to a sensor device, the front-end device B refers to a terminal device, and the back-end device refers to a computer device. The sensor device is used for collecting the activity data of the detected object and sending the collected sensor data to the front-end device B. The sensor device may be, for example, a wearable sensor. Wearable sensors include, but are not limited to, accelerometers, altimeters, heart rate sensors, optical sensors, gyroscopes, and the like. The sensor data is used to characterize the activity information of the detected object and may include movement data, body data, position data, etc. of the detected object.
In some embodiments, the front-end device B may have installed therein a target application that receives sensor data, and the back-end device may provide a background service for clients of the front-end device B that have installed the running target application. The back-end equipment is deployed with an object activity recognition model which completes training, and then obtains an object activity category corresponding to the sensor data according to the sensor data received by the front-end equipment B based on the object activity recognition model, and feeds back an output result of the object activity recognition model to the front-end equipment B.
Alternatively, the backend device may provide the backend service to a plurality of front-end devices B at the same time.
Referring to fig. 2, a flowchart of a training method of an object activity recognition model according to an embodiment of the present application is shown. The subject of execution of the steps of the method may be a computer device. The method may include at least one of the following steps 310-350:
step 310, obtaining a first data set and a second data set for training an object activity recognition model, wherein the first data set comprises at least one first sample data and real labels respectively corresponding to the first sample data, the real labels are used for indicating object activity categories corresponding to the first sample data, the second data set comprises at least one second sample data which is not marked with the object activity categories, and each sample data comprises data acquired by at least one sensor at least one time point.
The recognition object of the object activity recognition model refers to any object that can perform an activity, has activity data, and includes, but is not limited to, a human body, an animal, a robot, a moving object, and the like. Alternatively, the object activity recognition model may refer to a human activity recognition model.
Each sample data is used for representing the activity data of the object part corresponding to at least one sensor in at least one time point, and the sample data in the first data set and the second data set can be sample data aiming at the same object or sample data aiming at different objects. Wherein the first sample data corresponds to a real tag and the second sample data does not have a real tag. The real tag is used for indicating the activity category of the object corresponding to the first sample data, namely, the activity category corresponding to the activity data acquired by at least one sensor in at least one time point. The subject activity categories include, but are not limited to, standing, sitting, lying, kneeling, walking, jogging, running, jumping, and the like activity categories.
In some embodiments, acquisition data of at least one object is acquired, the acquisition data of the object including data acquired by at least one sensor of the object at a plurality of points in time; according to the length and the overlapping rate of the sliding window, intercepting a plurality of sample data from the acquired data of the object by adopting the sliding window; dividing the plurality of sample data into first sample data and second sample data according to a first ratio; acquiring real labels corresponding to the first sample data respectively to obtain a first data set; and obtaining a second data set according to each second sample data.
Alternatively, the sensors may be wearable sensors, and each sensor may collect activity data of one part of the subject, for example, if the subject is a human body, activity data of a neck, shoulder, arm, elbow, waist, and other body parts of the subject may be collected. The acquired data may be used to characterize the activity characteristics of at least one object, such that an object activity class corresponding to the acquired data may be derived from the acquired data.
The acquired data may be expressed as $\{X, y\}$, where $X \in \mathbb{R}^{H \times T}$, $y \in \{1, 2, \ldots, C\}^T$, $H$ is the number of sensor channels (which can be understood as the number of sensors), $T$ is the number of time samples (which can be understood as the number of time points), $C$ is the total number of categories, and $H$, $T$, $C$ are positive integers.
The overlapping rate is the proportion of the overlapping part of the acquired data with the last window when the sliding window moves each time. The step length of each window movement is calculated according to the length and the overlapping rate of the sliding window, for example, if the length of the sliding window is 10 and the overlapping rate is 50%, the step length of each sliding window sliding is 5 (i.e. 50% of the length of the sliding window). And intercepting the acquired data from the initial position of the acquired data of the time sequence by adopting a sliding window, moving the unit indicated by the step length each time until the end position of the acquired data of the time sequence, wherein the acquired data intercepted by the sliding window each time is sample data.
It is noted that the selection of the appropriate sliding window length and overlap ratio is critical. Too short a window length may not capture enough pattern information, while too long a window may result in too high a computational cost. The choice of overlap rate depends on the actual problem and the data characteristics, a higher overlap rate may increase the amount of data and training time of the model, but may improve the performance of the model. In practice, it is often necessary to try different combinations of parameters to find the best setting. The length and the overlapping rate of the sliding window can be set according to the practical training requirement, and the application is not limited to the practical training requirement.
The first ratio may be any ratio, for example, 1%, 3%, 10%, etc.; the specific value of the first ratio may be set according to the actual training requirement, which is not limited by the present application. According to the first ratio, the plurality of sample data can be divided into first sample data and second sample data; that is, the acquired data $\{X, y\}$ can be divided into two parts, the first sample data $\{X^l, y^l\}$ and the second sample data $\{X^u\}$. The real labels corresponding to the first sample data are then obtained; in general, the real labels corresponding to the first sample data are obtained by manual annotation, giving the first data set. The second data set is obtained directly from the second sample data.
The first data set may be represented as $\{(x_j^l, y_j^l)\}_{j=1}^{N_l}$, where $j$ is a positive integer, $x_j^l$ represents the first sample data, $y_j^l$ represents the real label corresponding to the first sample data, and $N_l$ represents the number of first sample data. The second data set may be represented as $\{x_k^u\}_{k=1}^{N_u}$, where $k$ is a positive integer, $x_k^u$ represents the second sample data, and $N_u$ represents the number of second sample data. In general, in order to improve the accuracy of the model training results, $N_u \gg N_l$; that is, the amount of second sample data is much larger than the amount of first sample data, so the first ratio is typically a small value.
The method is used for intercepting the acquired data to obtain a plurality of sample data, so that the sample data can contain enough characteristic information as far as possible, and the accuracy of model training is improved. And the real label corresponding to the first sample data is obtained by marking the first sample data, and the second sample data is not marked, so that the generalization performance of the model is improved, and the problem of fitting caused by training by adopting all marked sample data is avoided.
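As an illustration of the data preparation described above, the following is a minimal Python sketch, assuming NumPy and hypothetical helper names (`sliding_windows`, `split_labeled_unlabeled`); the majority-vote labeling of each window is an assumption not specified in the application:

```python
import numpy as np

def sliding_windows(X, y, window_len, overlap):
    """Cut (H, T) sensor data into windows of length `window_len`.

    `overlap` is the fraction of each window shared with the previous one,
    so the step length is window_len * (1 - overlap)."""
    stride = max(1, int(window_len * (1.0 - overlap)))
    samples, labels = [], []
    for start in range(0, X.shape[1] - window_len + 1, stride):
        samples.append(X[:, start:start + window_len])
        # Assumption: label a window by majority vote over its time points.
        window_y = y[start:start + window_len]
        labels.append(np.bincount(window_y).argmax())
    return np.stack(samples), np.array(labels)

def split_labeled_unlabeled(samples, labels, first_ratio, seed=0):
    """Keep labels for a `first_ratio` fraction; treat the rest as unlabeled."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_l = max(1, int(first_ratio * len(samples)))
    labeled, unlabeled = idx[:n_l], idx[n_l:]
    return (samples[labeled], labels[labeled]), samples[unlabeled]

# Example: H=6 channels, T=10000 time points, window length 10, 50% overlap -> stride 5.
X = np.random.randn(6, 10000)
y = np.random.randint(0, 5, size=10000)
samples, labels = sliding_windows(X, y, window_len=10, overlap=0.5)
(first_x, first_y), second_x = split_labeled_unlabeled(samples, labels, first_ratio=0.03)
```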
In step 320, a pseudo tag corresponding to the second sample data is generated using a tag generation model, which is an artificial intelligence model that generates a pseudo tag for indicating an activity class of the object corresponding to the second sample data.
The pseudo tag is tag information which is obtained by the tag generation model according to the second sample data and is used for indicating the object activity category corresponding to the second sample data. Unlike the real tag, the real tag is a certain object activity class corresponding to the first sample data, and the pseudo tag is probability information of the object activity class corresponding to the second sample data, i.e. the pseudo tag is a vector formed by the probability information.
The input data of the label generation model is the second sample data in the second data set, and the output data is the pseudo tag corresponding to the second sample data. The second data set with pseudo tags may be represented as $\{(x_k^u, \hat{y}_k^u)\}_{k=1}^{N_u}$, where $\hat{y}_k^u$ represents the pseudo tag corresponding to the second sample data.
Step 330, generating a candidate data set according to the first sample data, the real label corresponding to the first sample data, the second sample data and the pseudo label corresponding to the second sample data.
According to the first sample data $x_j^l$, the real tag $y_j^l$ corresponding to the first sample data, the second sample data $x_k^u$, and the pseudo tag $\hat{y}_k^u$ corresponding to the second sample data, a candidate data set may be generated. In some embodiments, the candidate data set includes the first sample data, the real labels corresponding to the first sample data, the second sample data, and the pseudo labels corresponding to the second sample data.
And 340, arbitrarily selecting two sample data from the candidate data set, and performing interpolation processing on the two sample data and the label information corresponding to the two sample data respectively to obtain mixed sample data and the label information corresponding to the mixed sample data.
The two sample data may be two first sample data, two second sample data, or one first sample data and one second sample data.
Interpolation refers to generating a new, previously unseen data item from two existing data items. Interpolation processing is performed on the two sample data to obtain the mixed sample data, and interpolation processing is performed on the label information respectively corresponding to the two sample data to obtain the label information corresponding to the mixed sample data. Interpolation includes linear interpolation, nearest-neighbor interpolation, higher-order interpolation, etc.
And 350, iteratively adjusting parameters of the object activity recognition model based on the mixed sample data and label information corresponding to the mixed sample data to obtain the trained object activity recognition model.
The input data of the object activity recognition model is mixed sample data, and the output data is prediction information of the object activity category corresponding to the mixed sample data. And adjusting parameters of the object activity recognition model according to output data of the object activity recognition model and label information corresponding to the mixed sample data, and after iterating for a plurality of times, enabling the object activity recognition model to meet the prediction requirement, so as to obtain the trained object activity recognition model.
According to the technical scheme provided by the embodiment of the application, the pseudo tag corresponding to the second sample data which is not marked with the object activity category is generated by adopting the tag generation model, and the mixed sample data and the tag information corresponding to the mixed sample data are obtained according to the first sample data with the real tag and the second sample data with the pseudo tag, so that the input data of the object activity identification model are richer, and the complexity and the variability of the input data are improved. Therefore, the object activity recognition model is trained, the object activity recognition model can be suitable for more complex activity data, the generalization capability of the object recognition model is further improved, and the accuracy of the output result of the object recognition model is improved.
In some embodiments, step 320 includes at least one of the following sub-steps 321-323:
and 321, respectively processing the second sample data by adopting n data enhancement modes to obtain n processed second sample data, wherein n is an integer greater than 1.
Data enhancement refers to a method of increasing the amount of data by adding a small change to or newly creating synthetic data from existing data (i.e., second sample data). The data enhancement mode can be to make the generated data have different semanteme but similar to the original data by generating paraphrasing of the original data as enhancement data; noise can be added to the original data on the premise of ensuring that the result is effective so as to improve the robustness of the model; new data may also be created as enhanced data based on the distribution rules of the original data.
For example, the second sample data may be processed separately using two data enhancement modes, represented by two data enhancement functions $\Phi_1(\cdot)$ and $\Phi_2(\cdot)$; the processed second sample data obtained with the two data enhancement functions can then be represented as $\Phi_1(x_k^u)$ and $\Phi_2(x_k^u)$.
Note that the number of second sample data subjected to the data enhancement processing here may be the same as or different from the number of first sample data. For example, the number of processed second sample data may equal the number of first sample data: for a batch of first sample data of size $B$, $\{(x_b^l, y_b^l)\}_{b=1}^{B}$, a batch of second sample data $\{x_m^u\}_{m=1}^{M}$ is selected, and the processed second sample data are $\Phi_1(x_m^u)$ and $\Phi_2(x_m^u)$. The application is not limited herein with respect to the sizes of $M$ and $B$.
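The application does not name the specific enhancement modes; the following is a minimal Python sketch of two stochastic enhancements commonly used on sensor windows (jittering and scaling), given purely as an assumed example:

```python
import numpy as np

def jitter(x, sigma=0.05):
    # Add small Gaussian noise to every channel/time point (assumed mode 1).
    return x + np.random.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1):
    # Multiply each sensor channel by a random factor close to 1 (assumed mode 2).
    factors = np.random.normal(1.0, sigma, size=(x.shape[0], 1))
    return x * factors

# Two enhanced views of the same unlabeled window, one per enhancement mode.
x_u = np.random.randn(6, 10)           # one second-sample window, shape (c, t)
x_u_1, x_u_2 = jitter(x_u), scale(x_u)
```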
In step 322, a label generating model is used to generate n classification probabilities corresponding to the processed second sample data, where the classification probabilities refer to probabilities that the processed second sample data belongs to the activity categories of the objects.
Illustratively, with the data enhancement function $\Phi_1$, the classification probability corresponding to the processed second sample data may be expressed as $p_1 = \mathrm{softmax}\big(g(\Phi_1(x_k^u))\big)$, where $\Phi_1(x_k^u)$ represents the second sample data processed by $\Phi_1$, $g(\cdot)$ represents the label generation model, and the softmax function normalizes the output of the label generation model, thereby obtaining the classification probability corresponding to the processed second sample data. Similarly, with the data enhancement function $\Phi_2$, the classification probability corresponding to the processed second sample data may be expressed as $p_2 = \mathrm{softmax}\big(g(\Phi_2(x_k^u))\big)$.
Step 323, determining the pseudo tag corresponding to the second sample data according to the classification probabilities corresponding to the n processed second sample data.
First, an average value of classification probabilities corresponding to the n processed second sample data is calculated.
Illustratively, the average classification probability may be expressed as:
$\bar{p} = \frac{1}{2}\left(p_1 + p_2\right)$
where $\bar{p}$ represents the average of the classification probabilities respectively corresponding to the second sample data processed by the data enhancement functions $\Phi_1$ and $\Phi_2$.
And secondly, sharpening the average value, and taking the obtained calculation result as a pseudo tag corresponding to the second sample data.
Illustratively, the process of sharpening the average value may be expressed as:
$\hat{y}^u = \dfrac{\bar{p}^{\,1/v}}{\left\| \bar{p}^{\,1/v} \right\|_1}$
where $v$ represents a temperature parameter, $\bar{p}^{\,1/v}$ is the element-wise power of the predictive probability distribution $\bar{p}$, and $\|\cdot\|_1$ is the L1 norm. The temperature parameter $v$ is used to adjust the degree of confidence of the model in the predictive probability distribution: a higher temperature makes the predictive probability distribution smoother, bringing the probability values of the categories closer together and reducing the model's preference for specific categories, while a lower temperature increases the sharpness of the predictive probability distribution, so that the model assigns probability more confidently to the most likely category. The temperature parameter $v$ may be a preset empirical value.
During the sharpening process, the temperature parameters are adjusted by dividing each probability value in the predicted probability distribution by the temperature parameter, which may result in a decrease or increase in the relative difference between the probability values, depending on the size of the temperature parameters, with higher temperature values making the probability values closer, and lower temperature values making the difference between the probability values more pronounced. By adjusting the temperature parameters, the confidence level of the model and the smoothness of the prediction probability distribution can be balanced in semi-supervised learning, so that the generalization performance of the model is improved.
The sharpened average of the classification probabilities can be used as the pseudo tag corresponding to the processed second sample data. Thus, for the data enhancement function $\Phi_1$, the processed second sample data and the corresponding pseudo tag may be represented as $(\Phi_1(x_k^u), \hat{y}_k^u)$, and for the data enhancement function $\Phi_2$, they may be represented as $(\Phi_2(x_k^u), \hat{y}_k^u)$.
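Putting steps 321-323 together, the following is a minimal Python sketch of pseudo-label generation; `label_model` is a hypothetical callable returning the softmax output of the label generation model, and $v = 0.5$ is a placeholder value, not one given in the application:

```python
import numpy as np

def sharpen(p, v=0.5):
    """Temperature sharpening: raise probabilities to the power 1/v,
    then renormalize by the L1 norm."""
    p = p ** (1.0 / v)
    return p / p.sum(axis=-1, keepdims=True)

def make_pseudo_label(label_model, views):
    """Average the label generation model's classification probabilities
    over all augmented views of one window, then sharpen the average."""
    probs = [label_model(view) for view in views]  # each: (C,) softmax output
    p_bar = np.mean(probs, axis=0)                 # average classification probability
    return sharpen(p_bar)                          # pseudo tag vector
```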
And generating a candidate data set according to the first sample data, the real label corresponding to the first sample data, the processed second sample data and the pseudo label corresponding to the processed second sample data.
In some embodiments, the sample data in the candidate data set comprises: first sample data and processed second sample data; the label information corresponding to the first sample data is a real label corresponding to the first sample data; the label information corresponding to the processed second sample data is a pseudo label corresponding to the second sample data.
For example, the candidate data set may be represented as $\{(x_j^l, y_j^l)\} \cup \{(\Phi_1(x_k^u), \hat{y}_k^u)\} \cup \{(\Phi_2(x_k^u), \hat{y}_k^u)\}$; each element of the candidate data set may be simply expressed as a pair $(x_i, y_i)$ of sample data and its label information.
It should be noted that the real label corresponding to the first sample data in the candidate data set is label information after one-hot encoding.
And carrying out enhancement processing on the second sample data by adopting n data enhancement modes to obtain processed second sample data with more abundant data volume and data meaning, and improving the generalization capability of the object activity recognition model. And a pseudo tag corresponding to the second sample data is obtained by adopting a tag generation model, and the object activity category corresponding to the second sample data is moderately represented by the pseudo tag, so that mixed sample data can be generated based on the second sample data with the pseudo tag.
In some embodiments, step 340 includes at least one of the following sub-steps 341-342:
step 341, adding the product of one sample data of the two sample data and the first coefficient to the product of the other sample data and the second coefficient to obtain mixed sample data.
The first coefficient and the second coefficient each take a value in the range $(0, 1)$, and optionally the sum of the first coefficient and the second coefficient is 1. The two sample data are two data arbitrarily selected from the candidate data set described above. The two sample data may be two first sample data, two processed second sample data, or one first sample data and one processed second sample data.
For example, a linear interpolation method may be used to obtain the mixed sample data, which may be expressed as:
$\tilde{x} = \lambda x_a + (1 - \lambda) x_b$
where $\lambda$ is the first coefficient, $1 - \lambda$ is the second coefficient, $\lambda \in (0, 1)$, and $x_a$ and $x_b$ are any two sample data.
In step 342, the product of the label information corresponding to one sample data and the first coefficient is added to the product of the label information corresponding to the other sample data and the second coefficient, so as to obtain the label information corresponding to the mixed sample data.
For example, the tag information corresponding to the mixed sample data may be expressed as:
$\tilde{y} = \lambda y_a + (1 - \lambda) y_b$
where $y_a$ and $y_b$ are the tag information respectively corresponding to the two sample data.
Thus, a mixed sample data set $\{(\tilde{x}_i, \tilde{y}_i)\}$ can be obtained; the mixed sample data in the mixed sample data set and the label information corresponding to the mixed sample data are used for training the object activity recognition model.
Mixing any two sample data by linear interpolation to obtain mixed sample data gives the resulting mixed sample data more variety, improving the richness and complexity of the mixed sample data, and thus the generalization capability of the object activity recognition model and the accuracy of its output results.
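A minimal sketch of this interpolation follows; drawing λ from a Beta distribution is a common mixup convention and an assumption here, since the application only requires λ ∈ (0, 1):

```python
import numpy as np

def mix(sample_a, label_a, sample_b, label_b, alpha=0.75):
    """Linearly interpolate two candidate samples and their label vectors."""
    lam = np.random.beta(alpha, alpha)        # first coefficient, in (0, 1)
    x_mix = lam * sample_a + (1.0 - lam) * sample_b
    y_mix = lam * label_a + (1.0 - lam) * label_b
    return x_mix, y_mix
```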
Referring to fig. 4, a flowchart of an iterative training method of an object activity recognition model according to an embodiment of the present application is shown. The subject of execution of the steps of the method may be a computer device. The object activity recognition model includes: the device comprises a feature extraction module, a calibration module and a classification module. The method may include at least one of the following steps 410-450:
in step 410, feature information of the mixed sample data is obtained by the feature extraction module.
The feature extraction module is configured to obtain the feature information of the mixed sample data. As shown in fig. 5, the feature extraction module includes a convolution layer (Convolution Layer), an activation layer (Rectified Linear Unit Layer, ReLU Layer), and a dropout layer (Dropout Layer); the convolution layer is used to perform feature extraction on the input data, the activation layer is used to perform nonlinear mapping on the output result of the convolution layer, and the dropout layer is used to reduce overfitting of the model.
The characteristic information of the mixed sample data can be expressed as $F_k \in \mathbb{R}^{c \times t}$, where $c$ represents the sensor dimension (which contains the information of a plurality of sensors), $t$ represents the time dimension (which contains the information of the time points), and $c$ and $t$ are positive integers.
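A minimal PyTorch sketch of such a feature extraction module; the layer count, channel width, and kernel size are illustrative assumptions, not values given in the application:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Conv -> ReLU -> Dropout stack mapping (batch, H, T) sensor windows
    to feature maps F_k of shape (batch, c, t)."""
    def __init__(self, in_channels=6, feat_channels=64, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, feat_channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Conv1d(feat_channels, feat_channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Dropout(p_drop),
        )

    def forward(self, x):
        return self.net(x)

features = FeatureExtractor()(torch.randn(8, 6, 10))  # -> shape (8, 64, 10)
```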
And step 420, performing calibration processing on the characteristic information through a calibration module to obtain calibrated characteristic information, wherein the calibration processing is used for enhancing the correlation between the characteristic information.
Although the mixed sample data obtained in step 340 above improves the generalization ability of the model, mixed sample data obtained by mixing two arbitrary sample data may not yield a correct recognition result. When a conflict occurs between the characteristic information of the mixed sample data and the characteristic information of the original sample data, an activity interference problem arises in model training. For example, if sample data whose object activity category is "walking" is mixed with unlabeled sample data of the "running" category, the resulting mixed sample data is likely to resemble the "jogging" category; however, the characteristic information of the mixed sample data mainly corresponds to "walking" and "running", and the model cannot derive the "jogging" object activity category from these two pieces of characteristic information without their correlation.
Therefore, the calibration module is required to calibrate the feature information to enhance the correlation between the feature information, so that the calibrated feature information contains the correlation information between the feature information.
Step 420 includes at least one sub-step of steps 421-423:
step 421, calculating, by the calibration module, a first covariance matrix and a second covariance matrix according to the feature information, where the first covariance matrix is a feature covariance matrix of the sensor dimension, and the second covariance matrix is a feature covariance matrix of the time dimension.
Illustratively, the first covariance matrix and the second covariance matrix may be expressed as:
$M_c = F_k \bar{I}_c F_k^{\top}, \qquad M_t = F_k^{\top} \bar{I}_t F_k$
where $M_c$ represents the feature covariance matrix of the sensor dimension (i.e., the first covariance matrix) and $M_t$ represents the feature covariance matrix of the time dimension (i.e., the second covariance matrix); $F_k$ is the characteristic information of the mixed sample data in matrix form, and $F_k^{\top}$ is the transpose of $F_k$. Here $\bar{I}_c = \frac{1}{t}\left(I_c - \frac{1}{t}1_c\right)$ with $I_c, 1_c \in \mathbb{R}^{t \times t}$, and $\bar{I}_t = \frac{1}{c}\left(I_t - \frac{1}{c}1_t\right)$ with $I_t, 1_t \in \mathbb{R}^{c \times c}$, where $1_c$ represents an all-ones matrix of dimension $t \times t$ and $1_t$ represents an all-ones matrix of dimension $c \times c$. The covariance matrices obtained above satisfy $M_c \in \mathbb{R}^{c \times c}$ and $M_t \in \mathbb{R}^{t \times t}$; each row and column element in the covariance matrices captures the statistical dependence of the features in the sensor dimension and the time dimension, and also indicates the feature correlation between each pair of sample data in the mixed sample data.
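A minimal PyTorch sketch of this covariance computation for a single feature map; the centering form shown is reconstructed from the dimension annotations above and should be read as an assumption:

```python
import torch

def covariance_matrices(F_k):
    """Compute the sensor-dimension (M_c) and time-dimension (M_t) feature
    covariance matrices for one feature map F_k of shape (c, t)."""
    c, t = F_k.shape
    # Assumed centering matrices built from identity and all-ones matrices.
    I_bar_c = (torch.eye(t) - torch.ones(t, t) / t) / t   # (t, t)
    I_bar_t = (torch.eye(c) - torch.ones(c, c) / c) / c   # (c, c)
    M_c = F_k @ I_bar_c @ F_k.T    # (c, c), first covariance matrix
    M_t = F_k.T @ I_bar_t @ F_k    # (t, t), second covariance matrix
    return M_c, M_t
```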
Step 422, calculating a hybrid calibration attention matrix based on the first covariance matrix and the second covariance matrix.
As shown in fig. 5, from the feature covariance matrix of the sensor dimension and the feature covariance matrix of the time dimension, a hybrid calibration attention matrix can be derived, which may be denoted as $M_c \times M_t$; the hybrid calibration attention matrix contains the feature information of the sensor dimension and the feature information of the time dimension.
Step 422 includes at least one sub-step of steps 4221-4224:
step 4221, performing group convolution on each row element or each column element in the first covariance matrix and the second covariance matrix to obtain a group convolution result corresponding to the first covariance matrix and a group convolution result corresponding to the second covariance matrix.
Separate convolution filters are used for each row element or each column element in the first covariance matrix and the second covariance matrix, and grouped convolution is performed on each row element or each column element, so that $M_c \in \mathbb{R}^{c \times c}$ is convolved into a matrix of the form $M_c \in \mathbb{R}^{c \times 1}$, and $M_t \in \mathbb{R}^{t \times t}$ is convolved into a matrix of the form $M_t \in \mathbb{R}^{t \times 1}$. For the group convolution of the first covariance matrix, the filter size, the number of filters, and the number of groups are all set to $c$; for the group convolution of the second covariance matrix, the filter size, the number of filters, and the number of groups are all set to $t$.
Step 4222, performing standard convolution on the group convolution result corresponding to the first covariance matrix and the group convolution result corresponding to the second covariance matrix, so as to obtain a first convolution result and a second convolution result.
A convolution filter with a convolution kernel size of $1 \times 1$ is used to perform standard convolution on the group convolution result corresponding to the first covariance matrix and on the group convolution result corresponding to the second covariance matrix, respectively, so as to strengthen the feature dimension of the data. The resulting first convolution result can be expressed as $M_c \in \mathbb{R}^{c \times 1}$, and the second convolution result as $M_t \in \mathbb{R}^{t \times 1}$.
Step 4223, processing the first convolution result and the second convolution result by adopting a sigmoid activation function to obtain a first attention weight vector and a second attention weight vector.
And respectively processing the first convolution result and the second convolution result by adopting a sigmoid activation function, and respectively mapping the first convolution result and the second convolution result to between 0 and 1, so as to obtain a first attention weight vector corresponding to the first convolution result and a second attention weight vector corresponding to the second convolution result.
Step 4224, calculating a hybrid calibration attention matrix according to the first attention weight vector and the second attention weight vector.
$M_c$ or $M_t$ is reshaped; for example, $M_t \in \mathbb{R}^{t \times 1}$ can be reshaped to $M_t \in \mathbb{R}^{1 \times t}$, and from $M_t \in \mathbb{R}^{1 \times t}$ and $M_c \in \mathbb{R}^{c \times 1}$ a hybrid calibration attention matrix $M_c \times M_t \in \mathbb{R}^{c \times t}$ can be obtained. Alternatively, $M_c \in \mathbb{R}^{c \times 1}$ may be reshaped to $M_c \in \mathbb{R}^{1 \times c}$, and from $M_t \in \mathbb{R}^{t \times 1}$ and $M_c \in \mathbb{R}^{1 \times c}$ a hybrid calibration attention matrix $M_c \times M_t \in \mathbb{R}^{t \times c}$ can be obtained.
Step 423, multiplying the characteristic information by the hybrid calibration attention moment array to obtain the calibrated characteristic information.
Multiplying the characteristic information $F_k$ by the hybrid calibration attention matrix $M_c \times M_t$ yields the calibrated characteristic information, which can be expressed as $\tilde{F}_k = F_k \otimes (M_c \times M_t)$, where $\otimes$ represents the multiplication operation.
By processing the first covariance matrix and the second covariance matrix, the correlation between the characteristic information is further enhanced, so that the finally obtained calibrated characteristic information can contain strong correlation information of the characteristic information, and the accuracy of an output result of the object activity recognition model can be improved.
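Steps 4221-4224 can be collected into one module; the following is a minimal PyTorch sketch of the calibration attention path (group convolution, 1×1 convolution, sigmoid, outer product, rescale), with module and parameter names chosen here purely for illustration:

```python
import torch
import torch.nn as nn

class Calibration(nn.Module):
    """Hybrid calibration attention: reduce each covariance matrix to an
    attention vector (group conv + 1x1 conv + sigmoid), take their outer
    product, and rescale the feature map with the result."""
    def __init__(self, c, t):
        super().__init__()
        # Grouped conv: one filter per row of each covariance matrix.
        self.group_c = nn.Conv1d(c, c, kernel_size=c, groups=c)  # (B,c,c)->(B,c,1)
        self.group_t = nn.Conv1d(t, t, kernel_size=t, groups=t)  # (B,t,t)->(B,t,1)
        # Standard 1x1 convolutions on the group-conv results.
        self.point_c = nn.Conv1d(c, c, kernel_size=1)
        self.point_t = nn.Conv1d(t, t, kernel_size=1)

    def forward(self, F_k, M_c, M_t):
        # F_k: (B, c, t); M_c: (B, c, c); M_t: (B, t, t)
        w_c = torch.sigmoid(self.point_c(self.group_c(M_c)))  # (B, c, 1)
        w_t = torch.sigmoid(self.point_t(self.group_t(M_t)))  # (B, t, 1)
        attn = w_c @ w_t.transpose(1, 2)                      # (B, c, t)
        return F_k * attn                                     # element-wise rescale
```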
And 430, obtaining the prediction information of the object activity category corresponding to the mixed sample data according to the calibrated characteristic information through the classification module.
The classification module is used for outputting the prediction information of the object activity category corresponding to the mixed sample data based on the calibrated characteristic information. The prediction information is used to indicate probability information that the mixed sample data belongs to each object activity class, and the prediction information can be presented in a vector form.
The specific recognition process of the object activity recognition model may be shown in fig. 5, where the feature information of the mixed sample data is obtained based on the feature extraction module, then the feature information is calibrated based on the calibration module, the relevance between the feature information is added to the mixed sample data, so as to obtain calibrated feature information, and finally the classification module obtains the prediction information of the object activity category corresponding to the mixed sample data.
Step 440, calculating to obtain the loss function value of the object activity recognition model according to the prediction information of the object activity class corresponding to the mixed sample data and the label information corresponding to the mixed sample data.
In some embodiments, for mixed sample data satisfying a first condition, a cross entropy loss function is adopted to calculate a first loss function value according to prediction information and label information of an object activity category corresponding to the mixed sample data; wherein the first condition includes that at least one sample data is a first sample data among two sample data that generate mixed sample data.
Illustratively, the mixed sample data $\tilde{x}$ is generated from two sample data $x_a$ and $x_b$. If $x_a$ or $x_b$, i.e., at least one of the two sample data constituting the mixed sample data $\tilde{x}$, is first sample data, the cross entropy loss function is used to calculate the first loss function value corresponding to the mixed sample data satisfying the first condition, which may be expressed as:
$L(y, p) = -\sum_m \left( y_m \times \log p_m \right)$
where $y_m \times \log p_m$ is the loss term for the $m$-th object activity category, $y_m$ represents the true probability information of the $m$-th object activity category ($y_m$ can be obtained from the label information corresponding to the mixed sample data), and $p_m$ represents the predictive probability information of the $m$-th object activity category ($p_m$ can be obtained from the prediction information of the object activity category corresponding to the mixed sample data).
By adopting the cross entropy loss function, the approach degree of the model prediction probability distribution and the real probability distribution can be measured, and when the prediction result of the model is closer to the real result, the cross entropy loss value is smaller. Thus, cross entropy loss will be minimized during training to obtain more accurate model predictions.
In some embodiments, for the mixed sample data satisfying the second condition, calculating to obtain a second loss function value according to the prediction information and the label information of the object activity class corresponding to the mixed sample data by using a mean square error loss function; wherein the second condition includes that neither of the two sample data that generated the mixed sample data is the first sample data.
Illustratively, if neither x_i nor x_j is first sample data, i.e., both sample data constituting the mixed sample data x̃ are processed second sample data, a mean square error loss function is used to calculate the second loss function value corresponding to the mixed sample data satisfying the second condition, which may be expressed as:

L(y, ŷ) = ∑_n (y_n − ŷ_n)²

where y_n denotes the true probability that the mixed sample data belongs to the n-th object activity class and can be obtained from the label information corresponding to the mixed sample data, and ŷ_n denotes the predicted probability that the mixed sample data belongs to the n-th object activity class and can be obtained from the prediction information of the object activity class corresponding to the mixed sample data.
In some embodiments, the loss function value of the object activity recognition model is calculated from the first loss function value and the second loss function value.
Alternatively, the first loss function value and the second loss function value may be added to obtain a loss function value of the object activity recognition model.
Alternatively, the first loss function value and the second loss function value may be weighted and summed to obtain a loss function value of the object activity recognition model.
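The combination of the two loss terms described above can be sketched as follows. This is a minimal illustration assuming soft label vectors and a boolean mask marking mixes that involve first (labeled) sample data; the function and variable names are illustrative, not taken from the embodiments:

```python
import torch
import torch.nn.functional as F

def mixed_loss(logits, targets, from_labeled, weight=1.0):
    """logits: (B, C) model outputs; targets: (B, C) label vectors of the
    mixed samples; from_labeled: (B,) bool, True when at least one of the
    two source samples of the mix was first (labeled) sample data."""
    log_prob = F.log_softmax(logits, dim=1)
    prob = log_prob.exp()
    ce = -(targets * log_prob).sum(dim=1)      # first loss: cross entropy
    mse = ((prob - targets) ** 2).sum(dim=1)   # second loss: mean square error
    l1 = ce[from_labeled].sum()
    l2 = mse[~from_labeled].sum()
    # weight=1.0 reproduces plain addition; other values give the weighted sum
    return l1 + weight * l2
```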
Step 450, iteratively adjusting parameters of the object activity recognition model according to the loss function value to obtain the trained object activity recognition model.
The parameters of the object activity recognition model are iteratively adjusted according to the loss function value so as to reduce it, so that the loss function value of the finally obtained object activity recognition model reaches or approaches a minimum, which improves the accuracy of the output result of the trained object activity recognition model.
According to the technical solution provided by this embodiment of the application, a calibration module is added to the object recognition model to perform calibration processing on the mixed sample data, enhancing the correlation between items of feature information of the mixed sample data so that the calibrated feature information contains the strongly correlated information among the features. This avoids the problem that activity interference generated during model training affects the accuracy of the model output, improves the generalization capability of the object recognition model, and improves the accuracy of the model's output results.
Referring to fig. 6, a flowchart of the training process and the use process of the object activity recognition model is shown. The input data of the object activity recognition model includes pseudo-labeled data (processed second sample data) and labeled data (first sample data), where the pseudo-labeled data is unlabeled data whose pseudo labels were generated by passing it through the label generation model. When the mixed data is input to the object activity recognition model for training, the calibration module enhances the correlation between items of feature information of the input data, so that a more accurate recognition result can be output. During use, the activity data of any object is input to the object activity recognition model: the feature extraction module obtains the feature information of the activity data, the calibration module enhances the correlation between the feature information, and finally the classification module obtains the recognition result of the object activity category.
Referring to fig. 7, a flowchart of an object activity recognition method according to an embodiment of the application is shown. The subject of execution of the steps of the method may be a computer device. The method may include at least one of the following steps 710-740:
Step 710, acquiring activity data of a first object, the activity data including data acquired by at least one sensor at at least one point in time.
The first object refers to any object, and the activity data of the first object is used to characterize the activity of the first object at at least one point in time.
Step 720, obtaining feature information of the activity data through the feature extraction module of the object activity recognition model.
Step 730, performing calibration processing on the feature information through the calibration module of the object activity recognition model to obtain calibrated feature information, where the calibration processing is used to enhance the correlation between the feature information.
In some embodiments, a first covariance matrix and a second covariance matrix are calculated by the calibration module according to the feature information, where the first covariance matrix is a feature covariance matrix of the sensor dimension and the second covariance matrix is a feature covariance matrix of the time dimension; a mixed calibration attention matrix is calculated according to the first covariance matrix and the second covariance matrix; and the feature information is multiplied by the mixed calibration attention matrix to obtain the calibrated feature information.
In some embodiments, performing group convolution on each row of elements or each column of elements in the first covariance matrix and the second covariance matrix to obtain a group convolution result corresponding to the first covariance matrix and a group convolution result corresponding to the second covariance matrix; respectively carrying out standard convolution on a grouping convolution result corresponding to the first covariance matrix and a grouping convolution result corresponding to the second covariance matrix to obtain a first convolution result and a second convolution result; adopting a sigmoid activation function to respectively process the first convolution result and the second convolution result to obtain a first attention weight vector and a second attention weight vector; and calculating a mixed calibration attention matrix according to the first attention weight vector and the second attention weight vector.
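A minimal sketch of this calibration step, assuming features shaped (batch, S, T) with S sensor-wise channels and T time steps; the layer shapes, kernel sizes, and the row-wise mean pooling used to form the weight vectors are assumptions for illustration, not fixed by the embodiments above:

```python
import torch
import torch.nn as nn

class HybridCalibration(nn.Module):
    """Covariance-based calibration: the sensor- and time-dimension covariance
    matrices are turned into attention weight vectors via grouped + standard
    convolution and a sigmoid, and their outer product rescales the features."""
    def __init__(self, s_dim, t_dim):
        super().__init__()
        # Group convolution processes each row of a covariance matrix separately.
        self.gconv_s = nn.Conv1d(s_dim, s_dim, 3, padding=1, groups=s_dim)
        self.gconv_t = nn.Conv1d(t_dim, t_dim, 3, padding=1, groups=t_dim)
        # Standard convolutions mix the grouped results.
        self.conv_s = nn.Conv1d(s_dim, s_dim, 1)
        self.conv_t = nn.Conv1d(t_dim, t_dim, 1)

    def forward(self, x):                                  # x: (B, S, T)
        cov_s = x @ x.transpose(1, 2) / x.size(2)          # (B, S, S) sensor-dimension covariance
        cov_t = x.transpose(1, 2) @ x / x.size(1)          # (B, T, T) time-dimension covariance
        a_s = torch.sigmoid(self.conv_s(self.gconv_s(cov_s)).mean(dim=2))  # (B, S) first attention weights
        a_t = torch.sigmoid(self.conv_t(self.gconv_t(cov_t)).mean(dim=2))  # (B, T) second attention weights
        attn = a_s.unsqueeze(2) * a_t.unsqueeze(1)         # (B, S, T) mixed calibration attention
        return x * attn                                    # calibrated feature information
```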
Step 740, obtaining, through the classification module of the object activity recognition model, the recognition result of the object activity category corresponding to the activity data according to the calibrated feature information.
The descriptions of the respective modules of the object activity recognition model in the above steps 720 to 740 may refer to the above embodiments, and are not repeated here.
According to the technical solution provided by this embodiment of the application, the calibration module in the object recognition model performs calibration processing on the activity data of the object, enhancing the correlation between items of feature information of the activity data so that the calibrated feature information contains the strongly correlated information among the features, thereby avoiding the problem that activity interference during recognition affects the accuracy of the model's output results.
The training method and the object activity recognition method for the object activity recognition model provided by the embodiment of the application are a model training process and a model using process which correspond to each other. For details not described in detail on one side, reference is made to the description on the other side.
The trained object activity recognition model obtained as described above may be evaluated on five (4+1) reference datasets corresponding to diverse and typical applications in the field of wearable HAR, namely the Opportunity dataset, the PAMAP2 dataset, the mHealth (Mobile Health) dataset, and the DSADS (Daily and Sports Activities Data Set) dataset. In addition, on top of the mHealth dataset, the mHealth+ dataset further contains an unbalanced NULL class (null-class activity), which accounts for more than 70% of the entire dataset; the other settings of the mHealth+ dataset are the same as those of the mHealth dataset. The following evaluation shows the effect of null-class activity by comparing the mHealth and mHealth+ datasets. Five traditional deep semi-supervised techniques are applied and evaluated on object activity recognition, with experiments performed according to their original works. Most importantly, some of these methods may be sensitive to training hyper-parameters; these were carefully tuned, and the values/settings are listed below in order to reproduce the results.
Illustratively, a sliding window of 1 second with an overlap rate of 50% is used for the Opportunity dataset, and a sliding window of 0.8 seconds with an overlap rate of 50% for the DSADS dataset, while a larger window size of 168 time points with an overlap rate of 78% is applied to the other datasets.
The evaluation procedure used Leave-One-Subject-Out Cross Validation (LOSO-CV) as the evaluation strategy, where the test data is the activity data of one held-out subject and the end result is the average (with standard deviation) after iterating over all subjects. The mean F1 score (F_m) is used as the evaluation index to measure the performance of the different methods; its calculation formula is:

F_m = (1/C) × ∑_c 2TP_c / (2TP_c + FP_c + FN_c)

where C is the total number of categories, TP_c is the number of true positives (True Positive) of category c, FP_c is the number of false positives (False Positive) of category c, and FN_c is the number of false negatives (False Negative) of category c.
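A minimal sketch of this metric, assuming integer class labels; written with numpy, with illustrative names:

```python
import numpy as np

def mean_f1(y_true, y_pred, num_classes):
    """Macro-averaged F1 over all classes, matching the formula above."""
    f1_per_class = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # true positives of class c
        fp = np.sum((y_pred == c) & (y_true != c))   # false positives of class c
        fn = np.sum((y_pred != c) & (y_true == c))   # false negatives of class c
        denom = 2 * tp + fp + fn
        f1_per_class.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1_per_class))
```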
pi-Model is a training framework with two loss terms: one is a standard cross entropy loss on the limited labeled activities, and the other is a consistency regularization between the model's prediction on an unlabeled activity and its prediction on an augmented version of that activity. The consistency regularization in this approach can help the model better cope with variability within the same activity. The data augmentation technique used here is time-series scaling, and the consistency regularization loss is based on mean square error with a weight of γ. Different augmentation techniques may affect model performance, and HAR-specific augmentation techniques may be explored in future work. Virtual Adversarial Training (VAT) differs from the consistency regularization with randomly selected augmentations in pi-Model: it automatically approximates, for each unlabeled activity, the perturbation direction to which the label probability is most sensitive in the input space. Intuitively, this adversarial approximation helps the model learn invariant features from activities with different variability. In the original work, there are three hyper-parameters that may affect model performance.
In the VAT experiments, the number of iterations for finding the adversarial direction is set to 1, the perturbation size of the adversarial direction is set to 10, the regularization coefficient controlling the output adversarial value is set to 0.8, and the weight of the consistency regularization term on unlabeled data is set to γ. Entropy minimization adds a loss term to the standard deep semi-supervised method that encourages the deep model to make low-entropy predictions. Following previous study results, entropy minimization was combined with VAT; the combination is denoted VAT+ below. The entropy minimization loss is added directly to the VAT loss group without weight control.
Instead of directly using the same model on both labeled and unlabeled data, Mean-Teacher (MT) derives an additional model from the existing model trained on labeled data by applying an exponential moving average over its weights, and then regularizes the outputs of the two models on the same unlabeled activity to be consistent. The exponential moving average decay is set to 0.9, and the weight of the consistency regularization is γ.
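A minimal sketch of the exponential-moving-average teacher update with the decay of 0.9 mentioned above, assuming two PyTorch models of identical architecture; names are illustrative:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.9):
    # teacher = decay * teacher + (1 - decay) * student, parameter-wise
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1 - decay)
```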
Furthermore, pseudo-Labeling (PL) aims at generating Pseudo labels for unlabeled activities in training iterations using the model itself, so that generalization of the model can be improved by more diversified training data. This is the most similar work to the proposed method compared to the most advanced MixMatch work. The proposed MixHAR (w/o Cal) is a carefully adapted version from MixMatch for object activity recognition, so that no additional MixMatch is listed for comparison. At each epoch, the model is first trained using tagged activities, and then predicts pseudo tags that are not tagged with activities. Tagged activities are used with tagged and untagged activities with pseudo tags under standard supervised training schemes to optimize the model (using cross entropy loss). Model for use herein The architecture is the same as the model used by the supervised baseline and MixHAR. The weight gamma rises from zero as done by the deepest semi-supervised operation. They all used a learning rate of 10 -3 And Adam optimizer performs training for 150 periods.
The HAR model may use 3 activity-representation encoding blocks and two fully connected layers as the activity classification module. Each encoding block consists of a 1D convolutional layer with ReLU activation and dropout (drop rate 0.3). The numbers of convolution filters are 32, 64, and 96, and the convolution filter sizes are 5, 5, and 3, respectively. The output of the first fully connected layer consists of 64 units and is activated by the ReLU function; the output of the second fully connected layer, whose number of units equals the number of categories, is input to the softmax function to obtain the classification probabilities. With the standard mini-batch training approach, the batch size is set to 32, and in each iteration two batches of activity frames (of the same batch size) are drawn from the labeled and unlabeled data pools, respectively. Since the total amount of labeled data is much smaller than the total amount of unlabeled data, the labeled batches are cycled until all unlabeled data has been used once. The deep network was trained with the Adam gradient descent optimizer under leave-one-subject-out cross validation (LOSO-CV) for 150 epochs. Pseudo labels may be generated using the Re-scaling and TimeWarping augmentation techniques, λ may be set to 0.8 for data mixing, and v may be set to 0.4 for the sharpening operation. The loss weight γ described above may be set for MixHAR and the Deep-Semi-HAR baselines by a ramp-up training strategy.
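A minimal sketch of this backbone under stated assumptions: the input is laid out as (batch, channels, time), and the global average pooling before the first fully connected layer is an assumption, since the text does not specify how the convolutional output is flattened:

```python
import torch.nn as nn

def build_har_model(in_channels: int, num_classes: int) -> nn.Sequential:
    """Three 1D-conv encoding blocks (32/64/96 filters, kernels 5/5/3,
    ReLU, dropout 0.3) followed by two fully connected layers."""
    return nn.Sequential(
        nn.Conv1d(in_channels, 32, kernel_size=5), nn.ReLU(), nn.Dropout(0.3),
        nn.Conv1d(32, 64, kernel_size=5), nn.ReLU(), nn.Dropout(0.3),
        nn.Conv1d(64, 96, kernel_size=3), nn.ReLU(), nn.Dropout(0.3),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),   # assumed pooling/flatten step
        nn.Linear(96, 64), nn.ReLU(),
        nn.Linear(64, num_classes),              # softmax applied with the loss / at inference
    )
```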
The model in the supervised baseline is trained with only the labeled data. Compared with the supervised baseline, the 5 conventional deep semi-supervised techniques facilitate model training in some labeled/unlabeled settings, especially on datasets with repetitive activities and without null-class activity, such as the DSADS and mHealth datasets. However, the performance of these five methods is not always satisfactory; a possible reason for this observation may be related to their theoretical basis, as these methods mostly focus on mitigating variability within an activity.
When labeled data is scarce, the primary goal may be to know what the current activity is, and inter-activity variability may be more critical to model training. For example, in the DSADS dataset, users are asked to freely perform their different activities; because of this collection style, these methods have obvious effects on the DSADS dataset. In the Opportunity dataset, collecting data from only 4 users in a constrained environment may not cause many intra-activity problems, but the main challenge may be distinguishing different non-repetitive activities. Although the performance of these conventional HAR methods is not always excellent, the results still indicate that they are viable for exploiting unlabeled activities, which may further improve future work.
Compared with the supervised baseline, the proposed MixHAR achieves a significant increase in mean F1 score and a decrease in standard deviation across all the different datasets with different labeled/unlabeled settings, indicating that the above approach makes progress among deep semi-supervised approaches using unlabeled data. F_m improves by 0.6% to 3.3% on the Opportunity dataset, by 0.8% to 11.1% on the PAMAP2 dataset, by 5.3% to 12.8% on the mHealth dataset, by 3.8% to 10.6% on the mHealth+ dataset, and by 3.9% to 9.7% on the DSADS dataset. The improvement on the Opportunity dataset is not as large as on the other datasets; a possible reason may be that the activities in Opportunity are non-repetitive and extremely unbalanced.
Comparing with all the deep semi-supervised HAR baseline methods, MixHAR is still better, with a clear gap in F_m. The standard deviations indicate that the models exhibit large fluctuations across subjects. Since these five methods focus mainly on various consistency regularizations of intra-activity variability, in some cases they achieve slightly better standard deviations than MixHAR.
In MixHAR, the main performance improvement comes from labeled/unlabeled data mixing, which is reasonable because the goal of MixHAR is to correct activity-intrusion problems in the feature space with unlabeled data. MixHAR achieves a 0.1%-5.7% performance improvement across different datasets and settings, illustrating the importance of calibration on labeled and unlabeled data.
Regarding the effects of null classes in HAR scenarios: null classes are ubiquitous in the real world, where most of the collected activities are irrelevant to the HAR system and only a few are activities of interest. Null classes may affect recognition performance in two ways (e.g., on the Opportunity and mHealth+ datasets). On the one hand, the null class is extremely unbalanced (more than 70% of the entire dataset), which biases learning of the deep HAR model. Furthermore, when the amount of labeled data is small, it is likely that most of the labeled data consists of null activities, while the activities of interest, the real targets, are likely to be inadequately represented or ignored during training, resulting in a lower F_m value. On the other hand, null classes have patterns similar to the activities of interest, which may introduce ambiguity into the model training process.
FIG. 8 shows the visualization of the feature embeddings learned by the different Deep-Semi-HAR methods (including the proposed MixHAR) on the mHealth dataset. A robust deep HAR model should be able to handle inter-/intra-activity variability; as reflected in the feature maps of fig. 8, activities of the same category should be clustered as closely as possible, while activities of different categories should be clearly separated. The activities in the mHealth dataset are repetitive (e.g., walking, running); as shown in fig. 8, more than half of the categories are easily distinguished by the supervised baseline. The pi-Model, PL, and MT methods each have advantages and disadvantages: by significantly reducing intra-activity distance they can perform better in recognizing cycling, but the variability injected from unlabeled data may be too noisy to handle and may mislead representation learning. VAT and VAT+ achieve better performance than the supervised baseline by better handling intra-activity variability in walking, cycling, and climbing stairs. Most importantly, the robustness of the MixHAR-trained model on all the different activities is significantly improved, which indicates that MixHAR effectively utilizes unlabeled data and improves the generalization capability of the model.
Regarding the effect of the hybrid calibration: mixing labeled/unlabeled data injects a great deal of variability into robust model training while retaining discriminative classification information. Because labeled data is limited, mixing large amounts of unlabeled data can challenge the model's ability to process diverse information. Specifically, when activity intrusion occurs, conflicts between the mixed activity and the original activity can inhibit the model's ability to distinguish different activities, which remains weak for different activities with similar characteristics, such as knee bending and jumping back and forth, or running and jogging (see MixHAR w/o Cal in fig. 8). The hybrid calibration proposed above aims to learn the correlations of the mixed samples in the feature space and to enhance the model's ability to discriminate under the injected variability. With the further hybrid calibration (MixHAR in fig. 8), it is clear that MixHAR can clearly distinguish all the different activities, with intra-activity distances decreasing and inter-activity distances increasing.
Fig. 9 shows the benefit of using MixHAR for each class under realistic data imbalance (typically over 70% null class). For the Opportunity and mHealth+ datasets, using unlabeled data with MixHAR yields better generalization on most minority classes while maintaining the performance of the majority classes. For the Opportunity dataset, although MixHAR improves the generalization ability of the model for most activities, the model's performance still comes mainly from the unbalanced null activities. The reason may be that in the Opportunity dataset most of the null activities are walking or other simple transitional activities, while the other activities of interest are relatively complex (i.e., non-repetitive). For the mHealth+ dataset, relatively simple activities such as resting may still not be effectively recognized with additional unlabeled data when labeled data is limited: such activities have features that may be widely present in many other activities and may therefore be misidentified.
Comparing MixHAR in fig. 9 with standard random over-sampling and under-sampling methods, over-sampling and under-sampling are feasible for improving minority classes; however, MixHAR is still superior to them on almost all activity classes. In addition, the over-/under-sampling methods greatly reduce the performance of the model on the head class (null class), while MixHAR improves the performance on the head class to some extent. The experimental results show that the deep semi-supervised HAR method MixHAR can effectively handle unbalanced activities and is a valuable direction for real-world pervasive HAR research.
In recent years, the utilization of unlabeled data has received increasing attention in deep HAR research; in addition to semi-supervised HAR, self-supervised learning has also attracted rapidly growing interest. To assess the progress of MixHAR, it was also compared with the latest methods using unlabeled data, including but not limited to semi-supervised HAR (i.e., HAR based on unlabeled data) and self-supervised HAR. Five recent works were evaluated on the five wearable-based HAR datasets. The Adversarial Auto-Encoder HAR (AAE-HAR) learns activity representations with an encoder-decoder network, where two discriminators process style information. First, a multi-task self-supervised learning framework was proposed to pre-train an activity representation learner by discriminating various transformations applied to the input signals and to use the learner for downstream activity classification tasks (MTL-HARs). Second, a CNN encoder-decoder network with noise injection and denoising was proposed to exploit unlabeled data. In addition, a self-supervised method pre-trains an activity representation learner with masking and uses the trained learner for the downstream activity classification task (Mask-Self-HAR). A SelfHAR framework was also presented, which builds not only on existing self-supervision techniques but also on semi-supervised training to exploit unlabeled human activities. MixHAR is superior to all these methods in terms of mean F1 score and standard deviation.
In CL-HAR (Continual Learning HAR) networks, encoder-decoders based on noisy signals may lose information of the original signal, and the reconstruction process tends to be biased. Similarly, MTL-HAR builds a self-supervision framework on generic data augmentations, which may suffer from pre-training tasks designed independently of HAR. For Mask-Self-HAR, reconstructing the original signal helps the model learn relatively accurate patterns of different activities, leading to more satisfactory performance. In addition, SelfHAR also achieves significant improvement. However, the weak HAR-specific information during pre-training may still leave these self-supervision-based methods insufficiently robust and underperforming on relatively complex data. Furthermore, with null-class activity, the activities of interest may be largely underrepresented during the pre-training phase. Thus, when non-repetitive activity (the Opportunity dataset) or null-class activity is involved, the performance of these self-supervision-based methods is still lower than that of MixHAR, which indicates that the MixHAR method is more robust and reliable.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Referring to fig. 10, a block diagram of a training apparatus for an object activity recognition model according to an embodiment of the present application is shown. The device has the function of realizing the training method of the object activity recognition model, and the function can be realized by hardware or by executing corresponding software by the hardware. The apparatus may be the computer device described above or may be provided in a computer device. As shown in fig. 10, the apparatus 1000 may include: a data acquisition module 1010, a pseudo tag generation module 1020, a data set generation module 1030, an interpolation processing module 1040, and an iteration adjustment module 1050.
A data acquisition module 1010, configured to acquire a first data set and a second data set for training the object activity recognition model, where the first data set includes at least one first sample data and a real tag corresponding to each of the first sample data, the real tag being used to indicate an object activity class corresponding to the first sample data, and the second data set includes at least one second sample data not labeled with an object activity class, and each sample data includes data acquired by at least one sensor at least one point in time;
A pseudo tag generating module 1020, configured to generate a pseudo tag corresponding to the second sample data using a tag generating model, where the tag generating model is an artificial intelligence model that generates the pseudo tag for indicating an object activity class corresponding to the second sample data;
a data set generating module 1030, configured to generate a candidate data set according to the first sample data, the real tag corresponding to the first sample data, the second sample data, and the pseudo tag corresponding to the second sample data;
the interpolation processing module 1040 is configured to arbitrarily select two sample data from the candidate data set, perform interpolation processing on the two sample data and tag information corresponding to the two sample data respectively, and obtain mixed sample data and tag information corresponding to the mixed sample data;
the iteration adjustment module 1050 is configured to iteratively adjust parameters of the object activity recognition model based on the mixed sample data and tag information corresponding to the mixed sample data, so as to obtain a trained object activity recognition model.
In some embodiments, the pseudo tag generation module 1020 includes:
The data enhancement unit is used for respectively processing the second sample data by adopting n data enhancement modes to obtain n processed second sample data, wherein n is an integer greater than 1.
The probability generation unit is used for generating classification probabilities corresponding to the n processed second sample data respectively by adopting the label generation model, wherein the classification probabilities refer to probabilities that the processed second sample data belong to each object activity category.
And the pseudo tag determining unit is used for determining the pseudo tag corresponding to the second sample data according to the classification probabilities respectively corresponding to the n processed second sample data.
In some embodiments, the pseudo tag determining unit is configured to:
calculating the average value of the classification probabilities respectively corresponding to the n processed second sample data;
and carrying out sharpening processing on the average value, and taking the obtained calculation result as a pseudo tag corresponding to the second sample data.
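A minimal sketch of this pseudo-label computation, assuming a MixMatch-style temperature sharpening with the value v = 0.4 mentioned above used as the temperature; the names and the softmax placement are illustrative:

```python
import torch

@torch.no_grad()
def make_pseudo_label(label_model, augmented_views, temperature=0.4):
    """augmented_views: list of n augmented versions of one unlabeled sample,
    each shaped (1, channels, time). Returns a (1, C) soft pseudo label."""
    probs = torch.stack([label_model(v).softmax(dim=1) for v in augmented_views])
    mean_prob = probs.mean(dim=0)                 # average of the n classification probabilities
    sharpened = mean_prob ** (1.0 / temperature)  # sharpening of the average
    return sharpened / sharpened.sum(dim=1, keepdim=True)
```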
In some embodiments, the sample data in the candidate dataset comprises:
the first sample data is characterized in that tag information corresponding to the first sample data is a real tag corresponding to the first sample data;
The processed second sample data, and the label information corresponding to the processed second sample data is a pseudo label corresponding to the second sample data.
In some embodiments, the interpolation processing module 1040 is configured to:
adding the product of one sample data and the first coefficient to the product of the other sample data and the second coefficient to obtain the mixed sample data;
and adding the product of the label information corresponding to one sample data in the two sample data and the first coefficient to the product of the label information corresponding to the other sample data and the second coefficient to obtain the label information corresponding to the mixed sample data.
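A minimal sketch of this interpolation, assuming the two coefficients sum to one as in standard mixup (with λ = 0.8 as in the experiments above); names are illustrative:

```python
def mix_pair(x1, y1, x2, y2, lam=0.8):
    """x1/x2: two sample data tensors; y1/y2: their label vectors."""
    x_mix = lam * x1 + (1 - lam) * x2   # one sample × first coefficient + the other × second
    y_mix = lam * y1 + (1 - lam) * y2   # same interpolation for the label information
    return x_mix, y_mix
```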
In some embodiments, the object activity recognition model comprises: the device comprises a feature extraction module, a calibration module and a classification module; the iteration adjustment module 1050 includes:
and the feature extraction unit is used for acquiring the feature information of the mixed sample data through the feature extraction module.
The calibration unit is used for carrying out calibration processing on the characteristic information through the calibration module to obtain calibrated characteristic information, and the calibration processing is used for enhancing the correlation between the characteristic information;
The classification unit is used for obtaining the prediction information of the object activity category corresponding to the mixed sample data according to the calibrated characteristic information through the classification module;
the loss calculation unit is used for calculating a loss function value of the object activity recognition model according to the prediction information of the object activity category corresponding to the mixed sample data and the label information corresponding to the mixed sample data;
and the iteration adjustment unit is used for carrying out iteration adjustment on the parameters of the object activity recognition model according to the loss function value to obtain the trained object activity recognition model.
In some embodiments, the calibration unit comprises:
the first matrix calculating subunit is configured to calculate, according to the characteristic information by using the calibration module, a first covariance matrix and a second covariance matrix, where the first covariance matrix is a characteristic covariance matrix of a sensor dimension, and the second covariance matrix is a characteristic covariance matrix of a time dimension.
And the second matrix calculating subunit is used for calculating and obtaining a mixed calibration attention matrix according to the first covariance matrix and the second covariance matrix.
And the calibration subunit is used for multiplying the characteristic information by the mixed calibration attention matrix to obtain the calibrated characteristic information.
In some embodiments, the calibration subunit is configured to:
carrying out grouping convolution on each row of elements or each column of elements in the first covariance matrix and the second covariance matrix to obtain a grouping convolution result corresponding to the first covariance matrix and a grouping convolution result corresponding to the second covariance matrix;
respectively carrying out standard convolution on the grouping convolution result corresponding to the first covariance matrix and the grouping convolution result corresponding to the second covariance matrix to obtain a first convolution result and a second convolution result;
processing the first convolution result and the second convolution result respectively by adopting a sigmoid activation function to obtain a first attention weight vector and a second attention weight vector;
and calculating the mixed calibration attention matrix according to the first attention weight vector and the second attention weight vector.
In some embodiments, the loss calculation unit is configured to:
for mixed sample data meeting a first condition, calculating to obtain a first loss function value by adopting a cross entropy loss function according to the prediction information and the label information of the object activity category corresponding to the mixed sample data; wherein the first condition includes that at least one sample data is the first sample data among two sample data generating the mixed sample data;
For mixed sample data meeting a second condition, calculating to obtain a second loss function value by adopting a mean square error loss function according to the prediction information and the label information of the object activity category corresponding to the mixed sample data; wherein the second condition includes that neither of the two sample data generating the mixed sample data is the first sample data;
and calculating a loss function value of the object activity recognition model according to the first loss function value and the second loss function value.
In some embodiments, the data acquisition module 1010 is configured to:
acquiring acquisition data of at least one object, wherein the acquisition data of the object comprises data acquired by the at least one sensor on the object at a plurality of time points;
according to the length and the overlapping rate of the sliding window, intercepting a plurality of sample data from the acquired data of the object by adopting the sliding window;
dividing the plurality of sample data into the first sample data and the second sample data according to a first scale;
obtaining real labels corresponding to the first sample data respectively to obtain a first data set; and obtaining the second data set according to each second sample data.
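A minimal sketch of this data preparation, assuming a recording shaped (time, sensors) and a split by a given labeled ratio; the names and the random split are illustrative:

```python
import numpy as np

def slide_windows(signal, win_len, overlap):
    """signal: (T, sensors) array; returns (num_windows, win_len, sensors).
    Assumes len(signal) >= win_len."""
    step = max(1, int(win_len * (1 - overlap)))
    return np.stack([signal[i:i + win_len]
                     for i in range(0, len(signal) - win_len + 1, step)])

def split_first_second(windows, labeled_ratio, seed=0):
    """Divide windows into first (to be labeled) and second (unlabeled) sample data."""
    idx = np.random.default_rng(seed).permutation(len(windows))
    k = int(len(windows) * labeled_ratio)
    return windows[idx[:k]], windows[idx[k:]]
```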
According to the technical scheme provided by the embodiment of the application, the pseudo tag corresponding to the second sample data which is not marked with the object activity category is generated by adopting the tag generation model, and the mixed sample data and the tag information corresponding to the mixed sample data are obtained according to the first sample data with the real tag and the second sample data with the pseudo tag, so that the input data of the object activity identification model are richer, and the complexity and the variability of the input data are improved. Therefore, the object activity recognition model is trained, the object activity recognition model can be suitable for more complex activity data, the generalization capability of the object recognition model is further improved, and the accuracy of the output result of the object recognition model is improved.
Referring to fig. 11, a block diagram of an object activity recognition apparatus according to an embodiment of the present application is shown. The device has the function of realizing the object activity recognition method, and the function can be realized by hardware or by executing corresponding software by hardware. The apparatus may be the computer device described above or may be provided in a computer device. As shown in fig. 11, the apparatus 1100 may include: a data acquisition module 1110, a feature extraction module 1120, a calibration module 1130, and a classification module 1140.
A data acquisition module 1110 for acquiring activity data of the first object, the activity data comprising data acquired by the at least one sensor at the at least one point in time.
The feature extraction module 1120 is configured to obtain feature information of the activity data through a feature extraction module of the object activity recognition model.
And the calibration module 1130 is configured to perform a calibration process on the feature information through the calibration module of the object activity recognition model to obtain calibrated feature information, where the calibration process is used to enhance correlation between the feature information.
The classification module 1140 is configured to obtain, according to the calibrated feature information, a recognition result of the object activity category corresponding to the activity data by using the classification module of the object activity recognition model.
In some embodiments, the calibration module 1130 includes:
the first matrix calculation unit is used for calculating to obtain a first covariance matrix and a second covariance matrix according to the characteristic information through the calibration module, wherein the first covariance matrix is a characteristic covariance matrix of a sensor dimension, and the second covariance matrix is a characteristic covariance matrix of a time dimension.
And the second matrix calculation unit is used for calculating and obtaining a mixed calibration attention matrix according to the first covariance matrix and the second covariance matrix.
And the calibration unit is used for multiplying the characteristic information by the mixed calibration attention matrix to obtain the calibrated characteristic information.
In some embodiments, the calibration unit is configured to:
carrying out grouping convolution on each row of elements or each column of elements in the first covariance matrix and the second covariance matrix to obtain a grouping convolution result corresponding to the first covariance matrix and a grouping convolution result corresponding to the second covariance matrix;
respectively carrying out standard convolution on the grouping convolution result corresponding to the first covariance matrix and the grouping convolution result corresponding to the second covariance matrix to obtain a first convolution result and a second convolution result;
processing the first convolution result and the second convolution result respectively by adopting a sigmoid activation function to obtain a first attention weight vector and a second attention weight vector;
and calculating the mixed calibration attention matrix according to the first attention weight vector and the second attention weight vector.
According to the technical scheme provided by the embodiment of the application, the calibration module in the object recognition model is used for carrying out calibration processing on the activity data of the object, so that the correlation between the characteristic information of the activity data is enhanced, the calibrated characteristic information can contain strong correlation information of the characteristic information, the problem of activity interference in the recognition process of the model is avoided, and the accuracy of the output result of the model is influenced.
It should be noted that, when the apparatus provided in the foregoing embodiments implements its functions, the division into the foregoing functional modules is used merely as an example; in practical applications, the functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided in the foregoing belong to the same concept; the specific implementation of the apparatus is detailed in the method embodiments and is not repeated here.
Referring to FIG. 12, a block diagram of a computer device 1200 according to one embodiment of the application is shown. The computer device 1200 may be any electronic device having data computing, processing, and storage functions. The computer apparatus 1200 may be used to implement the training method or the object activity recognition method of the object activity recognition model provided in the above-described embodiments.
In general, the computer device 1200 includes: a processor 1201 and a memory 1202.
Processor 1201 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1201 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field Programmable Gate Array), or PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1201 may integrate a GPU (Graphics Processing Unit) for rendering the content required to be displayed by the display screen. In some embodiments, the processor 1201 may also include an AI processor for processing computing operations related to machine learning.
Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 1202 is used to store a computer program configured to be executed by one or more processors to implement the training method or object activity recognition method of the object activity recognition model described above.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is not limiting as to the computer device 1200, and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a computer readable storage medium is also provided, in which a computer program is stored which, when being executed by a processor of a computer device, implements the training method or object activity recognition method of the object activity recognition model described above. Alternatively, the above-mentioned computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory ), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program so that the computer device performs the training method of the object activity recognition model or the object activity recognition method described above.
It should be noted that, before and during the collection of user-related data, the present application may display a prompt interface or popup window, or output voice prompt information, to inform the user that relevant data is currently being collected. The application only starts to perform the steps of obtaining user-related data after obtaining the user's confirmation on the prompt interface or popup window; otherwise (i.e., when the user's confirmation is not obtained), the steps of obtaining user-related data end and the data is not obtained. In other words, all object activity data collected by the method is collected with the user's consent and authorization, under the informed or independent consent of the personal information subject; subsequent data use and processing are carried out within the scope authorized by laws and regulations and by the personal information subject; and the collection, use, and processing of relevant user data comply with the relevant laws, regulations, and standards of the relevant countries and regions.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. In addition, the step numbers described herein merely show, by way of example, one possible execution order of the steps; in some other embodiments, the steps may be executed out of the numbered order, e.g., two differently numbered steps may be executed simultaneously or in an order opposite to that shown, which is not limited here.
The foregoing description of the exemplary embodiments of the application is not intended to limit the application to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the application.

Claims (18)

1. A method of training an object activity recognition model, the method comprising:
acquiring a first data set and a second data set for training the object activity recognition model, wherein the first data set comprises at least one first sample data and real labels respectively corresponding to the first sample data, the real labels are used for indicating object activity categories corresponding to the first sample data, the second data set comprises at least one second sample data which is not marked with the object activity categories, and each sample data comprises data acquired by at least one sensor at least one time point;
Generating a pseudo tag corresponding to the second sample data by adopting a tag generation model, wherein the tag generation model is an artificial intelligent model for generating the pseudo tag for indicating the object activity category corresponding to the second sample data;
generating a candidate data set according to the first sample data, the real label corresponding to the first sample data, the second sample data and the pseudo label corresponding to the second sample data;
arbitrarily selecting two sample data from the candidate data set, and performing interpolation processing on the two sample data and label information corresponding to the two sample data respectively, to obtain mixed sample data and label information corresponding to the mixed sample data;
and iteratively adjusting parameters of the object activity recognition model based on the mixed sample data and the label information corresponding to the mixed sample data to obtain the trained object activity recognition model.
2. The method of claim 1, wherein generating the pseudo tag corresponding to the second sample data using a tag generation model comprises:
respectively processing the second sample data by adopting n data enhancement modes to obtain n processed second sample data, wherein n is an integer greater than 1;
Generating classification probabilities respectively corresponding to the n processed second sample data by adopting the label generation model, wherein the classification probabilities refer to probabilities that the processed second sample data belong to each object activity category;
and determining the pseudo tag corresponding to the second sample data according to the classification probabilities respectively corresponding to the n processed second sample data.
3. The method according to claim 2, wherein determining the pseudo tag corresponding to the second sample data according to the classification probabilities respectively corresponding to the n processed second sample data includes:
calculating the average value of the classification probabilities respectively corresponding to the n processed second sample data;
and carrying out sharpening processing on the average value, and taking the obtained calculation result as a pseudo tag corresponding to the second sample data.
4. The method of claim 2, wherein the sample data in the candidate dataset comprises:
the first sample data is characterized in that tag information corresponding to the first sample data is a real tag corresponding to the first sample data;
the processed second sample data, and the label information corresponding to the processed second sample data is a pseudo label corresponding to the second sample data.
5. The method according to claim 1, wherein the arbitrarily selecting two sample data from the candidate data set, performing interpolation processing on the two sample data and tag information corresponding to the two sample data, to obtain mixed sample data and tag information corresponding to the mixed sample data, includes:
adding the product of one sample data and the first coefficient to the product of the other sample data and the second coefficient to obtain the mixed sample data;
and adding the product of the label information corresponding to one sample data in the two sample data and the first coefficient to the product of the label information corresponding to the other sample data and the second coefficient to obtain the label information corresponding to the mixed sample data.
6. The method of claim 1, wherein the object activity recognition model comprises: the device comprises a feature extraction module, a calibration module and a classification module;
iteratively adjusting parameters of the object activity recognition model based on the mixed sample data and label information corresponding to the mixed sample data to obtain a trained object activity recognition model, wherein the method comprises the following steps:
Acquiring characteristic information of the mixed sample data through the characteristic extraction module;
the characteristic information is calibrated through the calibration module, so that calibrated characteristic information is obtained, and the calibration process is used for enhancing the correlation between the characteristic information;
obtaining prediction information of the object activity category corresponding to the mixed sample data according to the calibrated characteristic information through the classification module;
calculating to obtain a loss function value of the object activity recognition model according to the prediction information of the object activity category corresponding to the mixed sample data and the label information corresponding to the mixed sample data;
and iteratively adjusting parameters of the object activity recognition model according to the loss function value to obtain the trained object activity recognition model.
7. The method of claim 6, wherein the calibrating the feature information by the calibration module to obtain calibrated feature information comprises:
calculating by the calibration module according to the characteristic information to obtain a first covariance matrix and a second covariance matrix, wherein the first covariance matrix is a characteristic covariance matrix of a sensor dimension, and the second covariance matrix is a characteristic covariance matrix of a time dimension;
Calculating a mixed calibration attention matrix according to the first covariance matrix and the second covariance matrix;
and multiplying the characteristic information by the mixed calibration attention matrix to obtain the calibrated characteristic information.
8. The method of claim 7, wherein said calculating a hybrid calibration attention matrix from said first covariance matrix and said second covariance matrix comprises:
carrying out grouping convolution on each row of elements or each column of elements in the first covariance matrix and the second covariance matrix to obtain a grouping convolution result corresponding to the first covariance matrix and a grouping convolution result corresponding to the second covariance matrix;
respectively carrying out standard convolution on the grouping convolution result corresponding to the first covariance matrix and the grouping convolution result corresponding to the second covariance matrix to obtain a first convolution result and a second convolution result;
processing the first convolution result and the second convolution result respectively by adopting a sigmoid activation function to obtain a first attention weight vector and a second attention weight vector;
and calculating the mixed calibration attention matrix according to the first attention weight vector and the second attention weight vector.
9. The method according to claim 5, wherein the calculating the loss function value of the object activity recognition model according to the prediction information of the object activity class corresponding to the mixed sample data and the label information corresponding to the mixed sample data includes:
for mixed sample data meeting a first condition, calculating to obtain a first loss function value by adopting a cross entropy loss function according to the prediction information and the label information of the object activity category corresponding to the mixed sample data; wherein the first condition includes that at least one sample data is the first sample data among two sample data generating the mixed sample data;
for mixed sample data meeting a second condition, calculating to obtain a second loss function value by adopting a mean square error loss function according to the prediction information and the label information of the object activity category corresponding to the mixed sample data; wherein the second condition includes that neither of the two sample data generating the mixed sample data is the first sample data;
and calculating a loss function value of the object activity recognition model according to the first loss function value and the second loss function value.
10. The method according to any one of claims 1 to 9, wherein the acquiring a first data set and a second data set for training the object activity recognition model comprises:
acquiring acquisition data of at least one object, wherein the acquisition data of the object comprises data acquired by the at least one sensor on the object at a plurality of time points;
according to the length and the overlapping rate of the sliding window, intercepting a plurality of sample data from the acquired data of the object by adopting the sliding window;
dividing the plurality of sample data into the first sample data and the second sample data according to a first scale;
obtaining real labels corresponding to the first sample data respectively to obtain a first data set; and obtaining the second data set according to each second sample data.
11. A method of object activity recognition, the method comprising:
acquiring activity data of a first object, the activity data comprising data collected by at least one sensor at one or more time points;
acquiring feature information of the activity data through a feature extraction module of an object activity recognition model;
performing calibration processing on the feature information through a calibration module of the object activity recognition model to obtain calibrated feature information, wherein the calibration processing is used to enhance the correlation between the feature information;
and obtaining a recognition result of the object activity class corresponding to the activity data according to the calibrated feature information, through a classification module of the object activity recognition model.
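The three-module inference pipeline of claim 11 might be wired together as below; the module internals are placeholders, and only the data flow (extract, calibrate, classify) follows the claim.

```python
import torch
import torch.nn as nn

class ObjectActivityRecognizer(nn.Module):
    """Assumed wrapper mirroring claim 11's three modules."""

    def __init__(self, feature_extractor: nn.Module,
                 calibration: nn.Module, classifier: nn.Module):
        super().__init__()
        self.feature_extractor = feature_extractor  # e.g. a 1D-CNN backbone
        self.calibration = calibration              # covariance-based attention
        self.classifier = classifier                # head over activity classes

    def forward(self, activity_data: torch.Tensor) -> torch.Tensor:
        features = self.feature_extractor(activity_data)
        calibrated = self.calibration(features)   # enhances feature correlation
        return self.classifier(calibrated)        # activity-class logits
```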
12. The method according to claim 11, wherein the calibrating the feature information by the calibration module of the object activity recognition model to obtain calibrated feature information comprises:
calculating, by the calibration module, a first covariance matrix and a second covariance matrix from the feature information, wherein the first covariance matrix is a feature covariance matrix along the sensor dimension, and the second covariance matrix is a feature covariance matrix along the time dimension;
calculating a hybrid calibration attention matrix from the first covariance matrix and the second covariance matrix;
and multiplying the feature information by the hybrid calibration attention matrix to obtain the calibrated feature information.
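For claim 12's two covariance matrices, the following sketch assumes a feature map shaped (batch, sensors, time) and standard mean-centered covariance; the exact normalization is an assumption.

```python
import torch

def feature_covariances(feat: torch.Tensor):
    # feat: (B, S, T) feature map -- S sensor channels, T time steps
    f = feat - feat.mean(dim=-1, keepdim=True)            # center over time
    cov_sensor = f @ f.transpose(1, 2) / feat.shape[-1]   # (B, S, S)
    g = feat - feat.mean(dim=1, keepdim=True)             # center over sensors
    cov_time = g.transpose(1, 2) @ g / feat.shape[1]      # (B, T, T)
    return cov_sensor, cov_time
```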
13. The method of claim 12, wherein the computing a hybrid calibration attention matrix from the first covariance matrix and the second covariance matrix comprises:
performing grouped convolution on each row of elements or each column of elements in the first covariance matrix and the second covariance matrix, to obtain a grouped convolution result corresponding to the first covariance matrix and a grouped convolution result corresponding to the second covariance matrix;
performing standard convolution on the grouped convolution result corresponding to the first covariance matrix and on the grouped convolution result corresponding to the second covariance matrix, respectively, to obtain a first convolution result and a second convolution result;
processing the first convolution result and the second convolution result respectively with a sigmoid activation function to obtain a first attention weight vector and a second attention weight vector;
and calculating the hybrid calibration attention matrix from the first attention weight vector and the second attention weight vector.
14. A training apparatus for an object activity recognition model, the apparatus comprising:
a data acquisition module, configured to acquire a first data set and a second data set for training the object activity recognition model, wherein the first data set includes at least one first sample data and a real tag corresponding to each of the first sample data, the real tag is used to indicate an object activity class corresponding to the first sample data, the second data set includes at least one second sample data not labeled with an object activity class, and each sample data includes data collected by at least one sensor at one or more time points;
a pseudo tag generation module, configured to generate a pseudo tag corresponding to the second sample data using a tag generation model, wherein the tag generation model is an artificial intelligence model for generating the pseudo tag indicating the object activity class corresponding to the second sample data;
a data set generation module, configured to generate a candidate data set according to the first sample data, the real tags corresponding to the first sample data, the second sample data, and the pseudo tags corresponding to the second sample data;
an interpolation processing module, configured to arbitrarily select two sample data from the candidate data set and perform interpolation processing on the two sample data and the label information respectively corresponding to them, to obtain mixed sample data and label information corresponding to the mixed sample data;
and an iterative adjustment module, configured to iteratively adjust parameters of the object activity recognition model based on the mixed sample data and the label information corresponding to the mixed sample data, to obtain the trained object activity recognition model.
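The interpolation processing module of claim 14 matches standard mixup-style interpolation; the sketch below assumes a Beta-distributed mixing coefficient with an illustrative alpha of 0.75.

```python
import torch

def mixup(x1: torch.Tensor, y1: torch.Tensor,
          x2: torch.Tensor, y2: torch.Tensor, alpha: float = 0.75):
    # x1/x2: two sample data tensors; y1/y2: their label distributions
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)            # keep the mix biased toward x1
    x_mix = lam * x1 + (1.0 - lam) * x2  # interpolated mixed sample data
    y_mix = lam * y1 + (1.0 - lam) * y2  # interpolated label information
    return x_mix, y_mix
```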
15. An object activity recognition apparatus, the apparatus comprising:
a data acquisition module, configured to acquire activity data of a first object, the activity data comprising data collected by at least one sensor at one or more time points;
a feature extraction module, configured to acquire feature information of the activity data through the feature extraction module of the object activity recognition model;
a calibration module, configured to perform calibration processing on the feature information through the calibration module of the object activity recognition model to obtain calibrated feature information, the calibration processing being used to enhance the correlation between the feature information;
and a classification module, configured to obtain a recognition result of the object activity class corresponding to the activity data according to the calibrated feature information, through the classification module of the object activity recognition model.
16. A computer device comprising a processor and a memory, the memory having stored therein a computer program that is loaded and executed by the processor to implement the method of training the object activity recognition model of any one of claims 1 to 10 or to implement the method of object activity recognition of any one of claims 11 to 13.
17. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program, which is loaded and executed by a processor to implement the training method of the object activity recognition model according to any one of claims 1 to 10 or to implement the object activity recognition method according to any one of claims 11 to 13.
18. A computer program product, characterized in that the computer program product comprises a computer program that is loaded and executed by a processor to implement the training method of the object activity recognition model according to any one of claims 1 to 10 or to implement the object activity recognition method according to any one of claims 11 to 13.
CN202310631137.6A 2023-05-31 2023-05-31 Training method of object activity recognition model, object activity recognition method and device Pending CN117216545A (en)

Priority Application (1)

Application Number: CN202310631137.6A — Priority/Filing Date: 2023-05-31 — Title: Training method of object activity recognition model, object activity recognition method and device

Publication (1)

Publication Number: CN117216545A — Publication Date: 2023-12-12

Family ID: 89034066

Country Status (1): CN — CN117216545A (en)


Legal Events

PB01 — Publication
REG — Reference to a national code (country code: HK; legal event code: DE; document number: 40099409)