CN117058716A - Cross-domain behavior recognition method and device based on image pre-fusion - Google Patents
Cross-domain behavior recognition method and device based on image pre-fusion
- Publication number
- CN117058716A (application number CN202311044162.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Neural network learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06V10/764 — Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
- G06V10/82 — Image or video recognition using pattern recognition or machine learning, using neural networks
Abstract
The application provides a cross-domain behavior recognition method and device based on image pre-fusion, wherein the method comprises the following steps: constructing a behavior recognition dataset comprising a source domain dataset and a target domain dataset; training a neural network model using the processed source domain data to obtain a pre-training model; inputting the processed target domain data into the pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as a pseudo label of the target domain data; constructing virtual samples according to the pseudo labels; acquiring fusion labels for the data in the virtual samples and constructing a fusion dataset; training the pre-training model using the processed fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The method enables the model to migrate knowledge learned in one domain to other domains, improving its generalization capability and robustness and addressing the low accuracy of cross-domain recognition.
Description
Technical Field
The application relates to the technical field of computer vision and pattern recognition, in particular to a cross-domain behavior recognition method and device based on image pre-fusion.
Background
The field of behavior recognition includes methods built on a variety of deep architectures, for example: the two-stream network architecture, which uses two 2D convolution blocks to jointly train on RGB and optical-flow inputs and model temporal information; the temporal relation network, a deep model that adopts a special pooling layer to model temporal relations between video frames; and deep networks that inflate 2D convolution filters to exploit large-scale pre-trained 2D models. However, these methods are trained with training and test sets drawn from the same distribution, i.e. all samples come from the same dataset, and they cannot be applied directly to cross-domain behavior recognition.
The key challenge in cross-domain recognition is the domain gap between the source domain and the target domain. Data distributions differ between domains, including differences in appearance, illumination, background, and so on; in a cross-domain recognition task, the training and test samples typically come from different datasets, i.e. their distributions differ. As a result, many behavior recognition methods cannot adequately eliminate this distribution difference under cross-domain conditions, so the classification performance of the model drops sharply, which in turn degrades cross-domain recognition accuracy.
Disclosure of Invention
The application provides a cross-domain behavior recognition method and device based on image pre-fusion, which are used for solving the problem of low accuracy of cross-domain recognition.
In a first aspect, the present application provides a cross-domain behavior recognition method based on image pre-fusion, including:
constructing a behavior recognition data set, wherein the behavior recognition data set comprises a source domain data set and a target domain data set;
normalizing the data in the source domain data set to obtain source domain data, wherein the source domain data have the same image shape and the pixel values of the source domain data have the same value range;
training a neural network model using the source domain data to obtain a pre-training model;
normalizing the data in the target domain data set to obtain target domain data;
inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label;
taking the prediction output label and the prediction confidence as pseudo labels of the target domain data;
constructing a virtual sample according to the pseudo tag;
acquiring a fusion tag of the data in the virtual sample, and constructing a fusion data set by utilizing the data in the virtual sample and the fusion tag;
normalizing the data in the fusion data set to obtain the processed fusion data;
training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model;
and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result.
Optionally, the source domain data in the source domain data set is the same as the behavior class of the target domain data in the target domain data set.
Optionally, the neural network model includes a multi-layer convolution layer, a full connection layer, and a residual structure.
Optionally, the training the neural network model using the source domain data to obtain the pre-training model includes:
acquiring a first loss function value based on the cross entropy loss function;
reducing the first loss function value using a back propagation algorithm and a random gradient descent method;
and outputting model parameters of the neural network model when the first loss function value is smaller than or equal to a first loss threshold value so as to obtain the pre-training model.
Optionally, the constructing a virtual sample according to the pseudo tag includes:
taking out the target domain data with the prediction confidence coefficient higher than a confidence coefficient threshold and the prediction output label of the target domain data to obtain taken-out data;
and fusing the extracted data with random data in the source domain data set to construct a virtual sample, wherein the random data is the data with the same label.
Optionally, the fetched data and random data in the source domain data set are fused according to the following formula:

x̃ = λ·x_i + (1 − λ)·x_j

where λ ∈ [0, 1] is the fusion proportion coefficient, x̃ is the fused data, x_i is the fetched data, and x_j is the random data.
Optionally, the image shape of the target domain data is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data; the image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data.
Optionally, the training the pre-training model using the fusion data to obtain a cross-domain behavior recognition model includes:
acquiring a second loss function value based on the cross entropy loss function;
reducing the second loss function value using a back propagation algorithm and a random gradient descent method;
and outputting model parameters of the pre-training model when the second loss function value is smaller than or equal to a second loss threshold value so as to obtain the cross-domain behavior recognition model.
Optionally, the method further comprises:
acquiring a real label of the target domain data set;
comparing the cross-domain behavior recognition result with the real tag of the target domain data set to evaluate the cross-domain recognition performance of the cross-domain behavior recognition model.
In a second aspect, the present application provides a cross-domain behavior recognition device based on image pre-fusion, which is applied to the recognition method provided in the first aspect, and the device includes:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain recognition model.
As can be seen from the above technical solution, the present application provides a cross-domain behavior recognition method and apparatus based on image pre-fusion, where the method includes: constructing a behavior recognition dataset comprising a source domain dataset and a target domain dataset; training a neural network model by using the normalized source domain data to obtain a pre-training model; inputting the target domain data subjected to normalization processing into a pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as pseudo labels of the target domain data; constructing a virtual sample according to the pseudo tag; acquiring a fusion tag of data in a virtual sample, and constructing a fusion data set by utilizing the data in the virtual sample; training a pre-training model by using the normalized fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into a cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The identification method can reduce the difference between the source domain and the target domain, so that the model can transfer the knowledge learned from one domain to other domains, thereby improving the generalization capability and the robustness of the model and solving the problem of low accuracy in cross-domain identification.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a cross-domain behavior recognition method based on image pre-fusion provided by the application;
FIG. 2 is a schematic diagram of a behavior recognition convolutional neural network model provided by the present application;
fig. 3 is a schematic diagram of image fusion and virtual sample generation provided by the present application.
Detailed Description
Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the application; they are merely examples of systems and methods consistent with aspects of the application as set forth in the claims.
In the field of behavior recognition, the key challenge in cross-domain recognition is the domain gap between the source domain and the target domain. Data distributions differ between domains, including differences in appearance, illumination, background, and so on; in a cross-domain recognition task, the training and test samples typically come from different datasets, i.e. their distributions differ. As a result, many behavior recognition methods cannot adequately eliminate this distribution difference under cross-domain conditions, so the classification performance of the model drops sharply, which in turn degrades cross-domain recognition accuracy.
In order to solve the problem of low accuracy in cross-domain identification, some embodiments of the present application provide a cross-domain behavior identification method based on image pre-fusion, referring to fig. 1, fig. 1 is a flowchart of the cross-domain behavior identification method based on image pre-fusion provided by the present application, where the cross-domain behavior identification method based on image pre-fusion provided by the embodiment of the present application includes:
s10: a behavior recognition dataset is constructed, the behavior recognition dataset comprising a source domain dataset and a target domain dataset.
The method comprises the steps of constructing a behavior recognition data set for cross-domain recognition, selecting two different behavior recognition data sets as A and B respectively, selecting data with the same behavior category from A, B, and constructing a source domain data set and a target domain data set respectively. It will be appreciated that the source domain data in the source domain data set is of the same behavior class as the target domain data in the target domain data set.
In some embodiments, taking the large behavior recognition dataset UCF101 and dataset HMDB51 as an example, dataset UCF101 provides 13320 videos of 101 action behavior categories, dataset HMDB51 contains 51 action behavior categories for a total of 6849 videos. The data set UCF101 is used as a source domain data set, and the data set HMDB51 is used as a target domain data set. Data with the same behavior category is selected from the data sets UCF101 and HMDB51, respectively, for example, the two selected data sets contain the same seven behavior categories, and the data sets are respectively constructed as a source domain data set UCF7 and a target domain data set HMDB7. It should be noted that, the data in the source domain data set UCF7 and the target domain data set HMDB7 respectively have a real tag, and the real tag is a behavior class tag to which the corresponding data provided in the data set belongs.
S20: and carrying out normalization processing on the data in the source domain data set to obtain source domain data.
The data in the source domain data set and the data in the target domain data set are both image data. The image data in the source domain dataset UCF7 is normalized, which includes processing of image shapes and pixel value ranges for the image data in the source domain dataset UCF7 to adjust the image data in the source domain dataset UCF7 to the same image shapes and the same pixel value ranges. The pixel value range after normalization processing is 0-1. For example, the image shapes in the set of the source domain data UCF7 can be uniformly adjusted to 224×224×3, and the pixel value range can be adjusted to 0-1, so as to obtain the source domain data.
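As an illustrative sketch of this normalization step (the function name and the nearest-neighbour resampling are assumptions for brevity; the patent does not specify an interpolation method), a single frame could be processed as follows:

```python
import numpy as np

def normalize_frame(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an H x W x 3 uint8 frame to size x size x 3 and scale
    pixel values to the range [0, 1], as in step S20."""
    h, w, _ = frame.shape
    rows = np.arange(size) * h // size      # nearest-neighbour row indices
    cols = np.arange(size) * w // size      # nearest-neighbour column indices
    resized = frame[rows][:, cols]          # (size, size, 3)
    return resized.astype(np.float32) / 255.0
```

Applying this to every frame yields data with a uniform 224×224×3 shape and a 0–1 pixel range, matching the description above.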
S30: training a neural network model using the source domain data to obtain a pre-trained model.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a behavior recognition convolutional neural network model provided by the present application, for a single frame image, image features are extracted through a convolutional layer conv, and in order to obtain information of a previous frame, part of extracted features obtained from the previous frame replace features at a corresponding position of a current frame, feature fusion is performed through the convolutional layer, and finally all features are sent to a classification layer to obtain a final behavior recognition classification prediction result. The neural network model includes a multi-layer convolution layer, a full connection layer, and a residual structure. The source domain data having an image shape 224 x 3 is input to a neural network model, which in some embodiments may be a deep full convolution neural network model that can accommodate source domain data inputs of any size.
After the source domain data is input into the neural network model, a first loss function value of the source domain data is obtained based on the cross entropy loss function, whose formula is:

L = −(1/N) · Σ_{i=1..N} Σ_{c=1..M} y_ic · log(p_ic)

where i indexes samples, c indexes classes, N is the number of samples in the source domain dataset, M is the number of classes in the source domain dataset, y_ic is a sign (indicator) function, and p_ic is the probability the model predicts for class c on sample i. If the true class of sample i is c, then y_ic is 1; otherwise y_ic is 0.
And calculating a first loss function value between a predicted value and a true value of the neural network model according to the cross entropy loss function, updating model parameters through a back propagation algorithm and a random gradient descent method, and reducing the loss between the predicted value and the true value of the model, so that the predicted value of the model can be more similar to the true value to train the neural network model, and outputting model parameters of the neural network model when the first loss function value is smaller than or equal to a first loss threshold value to finish model training so as to obtain a pre-training model.
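The first loss value can be sketched numerically as follows — a minimal NumPy illustration of the cross entropy loss function above (the helper name is an assumption, and the epsilon guard is an implementation detail, not part of the formula):

```python
import numpy as np

def cross_entropy(p: np.ndarray, y: np.ndarray) -> float:
    """Mean cross-entropy over N samples.
    p: N x M array of predicted class probabilities (rows sum to 1).
    y: N x M one-hot array, y[i, c] = 1 iff sample i's true class is c."""
    eps = 1e-12  # numerical guard against log(0)
    return float(-np.sum(y * np.log(p + eps)) / p.shape[0])
```

Training then repeatedly backpropagates this value and updates the weights by stochastic gradient descent until it falls to or below the first loss threshold.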
S40: and carrying out normalization processing on the data in the target domain data set to obtain target domain data.
The image shape of the target domain data after normalization processing is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data. In some embodiments, the data size of the target domain data may be made the same as the image shape of the source domain data, also 224×224×3. The image shapes of the source domain data and the target domain data are unified to be the same size, and subsequent data fusion is facilitated.
S50: and inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label.
Inputting the target domain data in the target domain data set HMDB7 into a pre-training model to obtain a prediction output label of the pre-training model and a prediction confidence conf of the prediction output label.
S60: and taking the prediction output label and the prediction confidence as pseudo labels of the target domain data.
According to the identification method provided by the embodiment of the application, because cross-domain identification needs to be tested on the target domain data set HMDB7, the real label of the target domain data set HMDB7 is not used in the test process. And the pseudo tag can reduce the category overlapping of the data, and the pseudo tag can make the category boundary of the data clearer and the learned category more compact.
S70: and constructing a virtual sample according to the pseudo tag.
The target domain data in the target domain data set HMDB7 is screened based on the pseudo tag of the target domain data set HMDB7 obtained in the above step S60. In some embodiments, the confidence threshold may be t=0.7, and the target domain data with the predicted confidence higher than the confidence threshold and the predicted output tag of the target domain data are fetched from the target domain data set HMDB7 to obtain fetched data.
The fetched data is fused with random data in the source domain dataset UCF7 to construct virtual samples. It should be noted that the random data in the source domain dataset UCF7 is data with the same real label, and the source domain dataset UCF7 does not need pseudo labels: cross-domain recognition trains on the source domain and tests on the target domain, so the real labels of the source domain dataset UCF7 can be used during training. In some embodiments, the fetched data is fused with random data in the source domain dataset UCF7 according to:

x̃ = λ·x_i + (1 − λ)·x_j

where λ ∈ [0, 1] is the fusion proportion coefficient, x̃ is the fused data, x_i is the fetched data, and x_j is the random data. In some embodiments λ may be 0.5; substituting the value of λ into the formula yields the fused data x̃, which is used to construct the virtual sample. Referring to fig. 3, fig. 3 is a schematic diagram of image fusion and virtual sample generation provided by the present application. In the data used for fusion, the real label of the source domain data must be consistent with the pseudo label of the target domain data; data with inconsistent labels is not fused.
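A compact sketch of the filtering-and-fusion step (function and variable names are illustrative assumptions; the confidence threshold t = 0.7 and λ = 0.5 follow the values given above):

```python
import numpy as np

def fuse(x_i: np.ndarray, x_j: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Mix a confident target-domain image x_i with a same-label
    source-domain image x_j: x_tilde = lam*x_i + (1-lam)*x_j."""
    return lam * x_i + (1.0 - lam) * x_j

def build_virtual_samples(tgt_x, tgt_pseudo, tgt_conf, src_x, src_y,
                          t=0.7, lam=0.5, rng=None):
    """Pair each target sample whose pseudo-label confidence exceeds t
    with a random same-label source sample and fuse the pair."""
    if rng is None:
        rng = np.random.default_rng(0)
    samples, labels = [], []
    for x, y, c in zip(tgt_x, tgt_pseudo, tgt_conf):
        if c <= t:                       # discard low-confidence pseudo-labels
            continue
        pool = [s for s, sy in zip(src_x, src_y) if sy == y]
        if not pool:                     # no same-label source data available
            continue
        x_j = pool[rng.integers(len(pool))]
        samples.append(fuse(x, x_j, lam))
        labels.append(y)                 # fusion label = the shared class
    return samples, labels
```

The returned pairs (fused image, shared class label) correspond to the virtual samples and fusion labels used in the next step.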
S80: acquiring fusion data in the virtual sampleIs to use the fusion data +.>And constructing a fusion data set by the fusion tag. It can be understood that the behavior category corresponding to the fusion tag is the same as the behavior category corresponding to the real tag of the source domain data or the pseudo tag of the target domain data.
S90: and carrying out normalization processing on the data in the fusion data set to obtain processed fusion data.
The image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data. In some embodiments, the image shape of the fusion data may be processed to be the same as the image shape of the source domain data, also 224×224×3. The target domain data and the processed fusion data are respectively identical with the image shape of the source domain data, and the pixel value ranges of the target domain data and the processed fusion data are respectively identical with the pixel value ranges of the source domain data, so that the effect of cross-domain identification is achieved.
S100: and training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model.
And inputting the processed fusion data into the pre-training model, and acquiring a second loss function value based on the cross entropy loss function. It will be appreciated that the cross entropy loss function in this step has the same formula as in step S30, except that N is the number of samples in the target domain dataset and M is the number of categories in the target domain dataset.
And calculating a second loss function value between a predicted value and a true value of the pre-training model according to the cross entropy loss function, updating model parameters through a back propagation algorithm and a random gradient descent method, and reducing the loss between the predicted value and the true value of the model, so that the predicted value of the model can be more similar to the true value to train the pre-training model, and outputting model parameters of the pre-training model when the second loss function value is smaller than or equal to a second loss threshold value to finish model training so as to obtain a cross-domain behavior recognition model.
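The stop-at-threshold training loop described here (and in step S30) can be illustrated on a toy linear softmax classifier — a deliberately simplified stand-in for the convolutional network, with assumed names and hyper-parameters:

```python
import numpy as np

def train_until(X, Y, lr=0.5, loss_threshold=0.1, max_steps=2000):
    """Gradient descent on a linear softmax classifier until the
    cross-entropy loss drops to or below the threshold.
    X: N x D inputs; Y: N x M one-hot labels."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(X.shape[1], Y.shape[1]))
    loss = np.inf
    for _ in range(max_steps):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        loss = -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))
        if loss <= loss_threshold:                    # stop criterion
            break
        W -= lr * X.T @ (p - Y) / len(X)              # cross-entropy gradient
    return W, loss
```

The same pattern — compute loss, backpropagate, update, stop at the threshold — applies to both the pre-training stage and this fine-tuning stage, only with different data and thresholds.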
S110: and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. And identifying the cross-domain behaviors by using the obtained cross-domain behavior identification result.
In some embodiments, the cross-domain behavior recognition method based on image pre-fusion provided by the application further comprises the following steps:
the real label of the target domain data set HMDB7 is obtained, and the cross-domain behavior recognition result obtained in the step S110 is compared with the real label of the target domain data set HMDB7, namely, the prediction accuracy of the cross-domain behavior recognition model on the target domain data set is evaluated. For example: and testing the pre-training model on the target domain data set to obtain a first accuracy acc1, and testing the cross-domain recognition behavior model on the target domain data set to obtain a second accuracy acc2. If acc2> acc1, the validity of the cross-domain behavior recognition method can be described, and the cross-domain recognition performance of the cross-domain behavior recognition model can be evaluated.
According to the cross-domain behavior recognition method, the virtual sample expansion data are constructed by using the pseudo tag fusion data, so that model degradation conditions when a pre-training model is migrated to a new scene are reduced, and the purposes of increasing model robustness and cross-domain recognition capability are achieved. And then, on the basis of data fusion, guiding a convolutional neural network model to learn and fuse the characteristics of the source domain data set through a cross entropy loss function, so that the characteristics related to human actions can be deeply mined in the learning process, and analysis and test are carried out on the model performance. The method can reduce the domain difference to the greatest extent and improve the performance of the cross-domain model.
Some embodiments of the present application further provide a cross-domain behavior recognition device based on image pre-fusion, which is applied to the recognition method provided in the foregoing embodiments, where the device includes:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain recognition model.
As can be seen from the above technical solutions, the embodiments of the present application provide a cross-domain behavior recognition method and device based on image pre-fusion. The method includes: constructing a behavior recognition data set comprising a source domain data set and a target domain data set; training a neural network model with the normalized source domain data to obtain a pre-training model; inputting the normalized target domain data into the pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as pseudo labels of the target domain data; constructing virtual samples according to the pseudo labels; acquiring fusion labels for the data in the virtual samples and constructing a fusion data set from that data; training the pre-training model with the normalized fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The recognition method reduces the difference between the source domain and the target domain, so that the model can transfer knowledge learned in one domain to other domains, thereby improving the generalization capability and robustness of the model and addressing the problem of low accuracy in cross-domain recognition.
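The pseudo-labelling and fusion steps recapped above can be sketched as follows; the stand-in "model", the image sizes, the confidence threshold, and the fusion coefficient are all illustrative assumptions, not values fixed by this application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "images" are 4x4 arrays with labels in {0, 1}.
target_images = rng.random((5, 4, 4))
source_images = rng.random((5, 4, 4))
source_labels = np.array([0, 1, 0, 1, 0])

def predict_with_confidence(image):
    """Stand-in for the pre-training model: returns (pseudo label, confidence)."""
    score = image.mean()
    return (1 if score > 0.5 else 0, abs(score - 0.5) * 2)

# Step 1: pseudo-label the target data; keep only confident predictions.
confidence_threshold = 0.02
extracted = []
for img in target_images:
    label, conf = predict_with_confidence(img)
    if conf > confidence_threshold:
        extracted.append((img, label))

# Step 2: fuse each extracted image with a random same-label source image
# to build the virtual samples of the fusion data set.
lam = 0.7  # fusion proportionality coefficient in [0, 1]
virtual_samples = []
for img, label in extracted:
    candidates = source_images[source_labels == label]
    partner = candidates[rng.integers(len(candidates))]
    virtual_samples.append((lam * img + (1 - lam) * partner, label))
```

The cross-domain behavior recognition model is then obtained by continuing to train the pre-training model on these virtual samples.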
The detailed description provided above is merely a set of examples under the general inventive concept and does not limit the scope of protection of the present application. Any other embodiment that a person skilled in the art derives from the solution of the present application without inventive effort falls within the scope of protection of the present application.
Claims (10)
1. A cross-domain behavior recognition method based on image pre-fusion, characterized by comprising the following steps:
constructing a behavior recognition data set, wherein the behavior recognition data set comprises a source domain data set and a target domain data set;
normalizing the data in the source domain data set to obtain source domain data, wherein the source domain data have the same image shape and the pixel values of the source domain data have the same value range;
training a neural network model using the source domain data to obtain a pre-training model;
normalizing the data in the target domain data set to obtain target domain data;
inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label;
taking the prediction output label and the prediction confidence as pseudo labels of the target domain data;
constructing a virtual sample according to the pseudo tag;
acquiring a fusion tag of the data in the virtual sample, and constructing a fusion data set by utilizing the data in the virtual sample and the fusion tag;
normalizing the data in the fusion data set to obtain the processed fusion data;
training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model;
and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result.
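The normalization required in claim 1 above, namely giving every sample the same image shape and the same pixel value range, might look like the following; nearest-neighbour resizing, a 224x224 target shape, and 8-bit input pixels are assumptions, as the claim fixes none of them:

```python
import numpy as np

def normalize(image, shape=(224, 224)):
    """Resize an image (nearest-neighbour) to a common shape and scale
    its 8-bit pixel values into the common [0, 1] range."""
    image = np.asarray(image, dtype=np.float64)
    rows = (np.arange(shape[0]) * image.shape[0] / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * image.shape[1] / shape[1]).astype(int)
    resized = image[np.ix_(rows, cols)]   # nearest-neighbour sampling
    return resized / 255.0                # pixel values now in [0, 1]
```

Applying the same function to the source domain data, the target domain data, and the fusion data gives all three the identical image shape and pixel value range that the claims require.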
2. The image pre-fusion based cross-domain behavior recognition method according to claim 1, wherein the behavior classes of the source domain data in the source domain data set are the same as the behavior classes of the target domain data in the target domain data set.
3. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, wherein the neural network model comprises a multi-layer convolution layer, a full connection layer and a residual structure.
4. The image pre-fusion based cross-domain behavior recognition method of claim 1, wherein training a neural network model using the source domain data to obtain a pre-training model comprises:
acquiring a first loss function value based on the cross entropy loss function;
reducing the first loss function value using a back propagation algorithm and a stochastic gradient descent method;
and outputting model parameters of the neural network model when the first loss function value is smaller than or equal to a first loss threshold value so as to obtain the pre-training model.
5. The image pre-fusion based cross-domain behavior recognition method according to claim 1, wherein the constructing a virtual sample according to the pseudo tag comprises:
extracting the target domain data whose prediction confidence is higher than a confidence threshold, together with the prediction output labels of that target domain data, to obtain extracted data;
and fusing the extracted data with random data in the source domain data set to construct the virtual sample, wherein the random data is source domain data having the same label as the extracted data.
6. The cross-domain behavior recognition method based on image pre-fusion according to claim 5, wherein the extracted data and the random data in the source domain data set are fused according to the following formula:

x̃ = λ·x_i + (1 − λ)·x_j

wherein λ is the fusion proportionality coefficient, λ ∈ [0, 1], x̃ is the fusion data, x_i is the extracted data, and x_j is the random data.
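The fusion in this claim is a convex (mixup-style) combination of the two samples; a minimal sketch consistent with the symbols defined here, using hypothetical example arrays:

```python
import numpy as np

def fuse(x_i, x_j, lam):
    """Fuse extracted data x_i with same-label random source data x_j:
    x_fused = lam * x_i + (1 - lam) * x_j."""
    assert 0.0 <= lam <= 1.0, "fusion proportionality coefficient must lie in [0, 1]"
    return lam * np.asarray(x_i, dtype=float) + (1.0 - lam) * np.asarray(x_j, dtype=float)

x_i = np.array([[0.2, 0.4], [0.6, 0.8]])  # extracted target-domain data
x_j = np.array([[1.0, 1.0], [1.0, 1.0]])  # random same-label source data
fused = fuse(x_i, x_j, 0.5)
# fused == [[0.6, 0.7], [0.8, 0.9]]
```

At λ = 1 the fused sample equals the extracted data and at λ = 0 it equals the random source data, so λ controls how far the virtual sample leans toward the target domain.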
7. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, wherein the image shape of the target domain data is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data; the image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data.
8. The image pre-fusion based cross-domain behavior recognition method according to claim 1, wherein training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model comprises:
acquiring a second loss function value based on the cross entropy loss function;
reducing the second loss function value using a back propagation algorithm and a stochastic gradient descent method;
and outputting model parameters of the pre-training model when the second loss function value is smaller than or equal to a second loss threshold value so as to obtain the cross-domain behavior recognition model.
9. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, further comprising:
acquiring a real label of the target domain data set;
comparing the cross-domain behavior recognition result with the real tag of the target domain data set to evaluate the cross-domain recognition performance of the cross-domain behavior recognition model.
10. A cross-domain behavior recognition device based on image pre-fusion, characterized in that the device is applied to the recognition method according to any one of claims 1 to 9, and the device comprises:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain behavior recognition model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311044162.0A CN117058716A (en) | 2023-08-18 | 2023-08-18 | Cross-domain behavior recognition method and device based on image pre-fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117058716A true CN117058716A (en) | 2023-11-14 |
Family
ID=88654968
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113792751A (en) * | 2021-07-28 | 2021-12-14 | 中国科学院自动化研究所 | Cross-domain behavior identification method, device, equipment and readable storage medium |
CN113792751B (en) * | 2021-07-28 | 2024-06-04 | 中国科学院自动化研究所 | Cross-domain behavior recognition method, device, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||