CN117058716A - Cross-domain behavior recognition method and device based on image pre-fusion - Google Patents
Cross-domain behavior recognition method and device based on image pre-fusion
- Publication number
- CN117058716A (application number CN202311044162.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/08 — Neural network learning methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06V10/764 — Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/806 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
- G06V10/82 — Image or video recognition using pattern recognition or machine learning, using neural networks
Abstract
The application provides a cross-domain behavior recognition method and device based on image pre-fusion, wherein the method comprises the following steps: constructing a behavior recognition dataset comprising a source domain dataset and a target domain dataset; training a neural network model using the processed source domain data to obtain a pre-training model; inputting the processed target domain data into the pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as a pseudo label of the target domain data; constructing virtual samples according to the pseudo labels; acquiring fusion labels for the data in the virtual samples and constructing a fusion dataset; training the pre-training model using the processed fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The method enables the model to migrate knowledge learned in one domain to other domains, improving its generalization capability and robustness and addressing the low accuracy of cross-domain recognition.
Description
Technical Field
The application relates to the technical field of computer vision and pattern recognition, in particular to a cross-domain behavior recognition method and device based on image pre-fusion.
Background
The field of behavior recognition includes methods built on a variety of deep architectures, for example: the two-stream network architecture, which uses two 2D convolution blocks to jointly train on RGB and optical-flow inputs and model temporal information; the temporal relation network, a deep model that adopts a special pooling layer to model temporal relations between video frames; and deep networks that inflate 2D convolution filters to exploit large-scale pre-trained 2D models. However, these methods are trained with training and test sets drawn from the same distribution, i.e. all samples come from the same dataset, and they cannot be applied directly to cross-domain behavior recognition.
The key challenge in cross-domain recognition is the domain gap between the source domain and the target domain. Data distributions differ between domains, including differences in appearance, illumination, background, and so on; in a cross-domain recognition task, the training and test samples typically come from different datasets, i.e. their distributions differ. As a result, many behavior recognition methods cannot adequately eliminate this distribution difference under cross-domain conditions, so the classification performance of the model drops sharply, which in turn degrades cross-domain recognition accuracy.
Disclosure of Invention
The application provides a cross-domain behavior recognition method and device based on image pre-fusion, which are used for solving the problem of low accuracy of cross-domain recognition.
In a first aspect, the present application provides a cross-domain behavior recognition method based on image pre-fusion, including:
constructing a behavior recognition data set, wherein the behavior recognition data set comprises a source domain data set and a target domain data set;
normalizing the data in the source domain data set to obtain source domain data, wherein the source domain data have the same image shape and the pixel values of the source domain data have the same value range;
training a neural network model using the source domain data to obtain a pre-training model;
normalizing the data in the target domain data set to obtain target domain data;
inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label;
taking the prediction output label and the prediction confidence as pseudo labels of the target domain data;
constructing a virtual sample according to the pseudo tag;
acquiring a fusion tag of the data in the virtual sample, and constructing a fusion data set by utilizing the data in the virtual sample and the fusion tag;
normalizing the data in the fusion data set to obtain the processed fusion data;
training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model;
and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result.
Optionally, the source domain data in the source domain data set is the same as the behavior class of the target domain data in the target domain data set.
Optionally, the neural network model includes a multi-layer convolution layer, a full connection layer, and a residual structure.
Optionally, the training the neural network model using the source domain data to obtain the pre-training model includes:
acquiring a first loss function value based on the cross entropy loss function;
reducing the first loss function value using a back propagation algorithm and a random gradient descent method;
and outputting model parameters of the neural network model when the first loss function value is smaller than or equal to a first loss threshold value so as to obtain the pre-training model.
Optionally, the constructing a virtual sample according to the pseudo tag includes:
taking out the target domain data with the prediction confidence coefficient higher than a confidence coefficient threshold and the prediction output label of the target domain data to obtain taken-out data;
and fusing the extracted data with random data in the source domain data set to construct a virtual sample, wherein the random data is the data with the same label.
Optionally, the fetched data and random data in the source domain data set are fused according to the following formula:

x̃ = λ·x_i + (1 − λ)·x_j

where λ ∈ [0, 1] is the fusion proportion coefficient, x̃ is the fused data, x_i is the fetched data, and x_j is the random data.
Optionally, the image shape of the target domain data is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data; the image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data.
Optionally, the training the pre-training model using the fusion data to obtain a cross-domain behavior recognition model includes:
acquiring a second loss function value based on the cross entropy loss function;
reducing the second loss function value using a back propagation algorithm and a random gradient descent method;
and outputting model parameters of the pre-training model when the second loss function value is smaller than or equal to a second loss threshold value so as to obtain the cross-domain behavior recognition model.
Optionally, the method further comprises:
acquiring a real label of the target domain data set;
comparing the cross-domain behavior recognition result with the real tag of the target domain data set to evaluate the cross-domain recognition performance of the cross-domain behavior recognition model.
In a second aspect, the present application provides a cross-domain behavior recognition device based on image pre-fusion, which is applied to the recognition method provided in the first aspect, and the device includes:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain recognition model.
As can be seen from the above technical solution, the present application provides a cross-domain behavior recognition method and apparatus based on image pre-fusion, where the method includes: constructing a behavior recognition dataset comprising a source domain dataset and a target domain dataset; training a neural network model by using the normalized source domain data to obtain a pre-training model; inputting the target domain data subjected to normalization processing into a pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as pseudo labels of the target domain data; constructing a virtual sample according to the pseudo tag; acquiring a fusion tag of data in a virtual sample, and constructing a fusion data set by utilizing the data in the virtual sample; training a pre-training model by using the normalized fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into a cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The identification method can reduce the difference between the source domain and the target domain, so that the model can transfer the knowledge learned from one domain to other domains, thereby improving the generalization capability and the robustness of the model and solving the problem of low accuracy in cross-domain identification.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a cross-domain behavior recognition method based on image pre-fusion provided by the application;
FIG. 2 is a schematic diagram of a behavior recognition convolutional neural network model provided by the present application;
fig. 3 is a schematic diagram of image fusion and virtual sample generation provided by the present application.
Detailed Description
Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the application; they are merely examples of systems and methods consistent with aspects of the application as set forth in the claims.
In the field of behavior recognition, the key challenge in cross-domain recognition is the domain gap between the source domain and the target domain. Data distributions differ between domains, including differences in appearance, illumination, background, and so on; in a cross-domain recognition task, the training and test samples typically come from different datasets, i.e. their distributions differ. As a result, many behavior recognition methods cannot adequately eliminate this distribution difference under cross-domain conditions, so the classification performance of the model drops sharply, which in turn degrades cross-domain recognition accuracy.
In order to solve the problem of low accuracy in cross-domain identification, some embodiments of the present application provide a cross-domain behavior identification method based on image pre-fusion, referring to fig. 1, fig. 1 is a flowchart of the cross-domain behavior identification method based on image pre-fusion provided by the present application, where the cross-domain behavior identification method based on image pre-fusion provided by the embodiment of the present application includes:
s10: a behavior recognition dataset is constructed, the behavior recognition dataset comprising a source domain dataset and a target domain dataset.
The method comprises the steps of constructing a behavior recognition data set for cross-domain recognition, selecting two different behavior recognition data sets as A and B respectively, selecting data with the same behavior category from A, B, and constructing a source domain data set and a target domain data set respectively. It will be appreciated that the source domain data in the source domain data set is of the same behavior class as the target domain data in the target domain data set.
In some embodiments, taking the large behavior recognition dataset UCF101 and dataset HMDB51 as an example, dataset UCF101 provides 13320 videos of 101 action behavior categories, dataset HMDB51 contains 51 action behavior categories for a total of 6849 videos. The data set UCF101 is used as a source domain data set, and the data set HMDB51 is used as a target domain data set. Data with the same behavior category is selected from the data sets UCF101 and HMDB51, respectively, for example, the two selected data sets contain the same seven behavior categories, and the data sets are respectively constructed as a source domain data set UCF7 and a target domain data set HMDB7. It should be noted that, the data in the source domain data set UCF7 and the target domain data set HMDB7 respectively have a real tag, and the real tag is a behavior class tag to which the corresponding data provided in the data set belongs.
S20: and carrying out normalization processing on the data in the source domain data set to obtain source domain data.
The data in the source domain data set and the data in the target domain data set are both image data. The image data in the source domain dataset UCF7 is normalized, which includes processing of image shapes and pixel value ranges for the image data in the source domain dataset UCF7 to adjust the image data in the source domain dataset UCF7 to the same image shapes and the same pixel value ranges. The pixel value range after normalization processing is 0-1. For example, the image shapes in the set of the source domain data UCF7 can be uniformly adjusted to 224×224×3, and the pixel value range can be adjusted to 0-1, so as to obtain the source domain data.
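As an illustrative sketch of this normalization step (the function name and the nearest-neighbour resampling are assumptions for brevity; the patent does not specify an interpolation method), a single frame could be processed as follows:

```python
import numpy as np

def normalize_frame(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize an H x W x 3 uint8 frame to size x size x 3 and scale
    pixel values to the range [0, 1], as in step S20."""
    h, w, _ = frame.shape
    rows = np.arange(size) * h // size      # nearest-neighbour row indices
    cols = np.arange(size) * w // size      # nearest-neighbour column indices
    resized = frame[rows][:, cols]          # (size, size, 3)
    return resized.astype(np.float32) / 255.0
```

Applying this to every frame yields data with a uniform 224×224×3 shape and a 0–1 pixel range, matching the description above.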
S30: training a neural network model using the source domain data to obtain a pre-trained model.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a behavior recognition convolutional neural network model provided by the present application, for a single frame image, image features are extracted through a convolutional layer conv, and in order to obtain information of a previous frame, part of extracted features obtained from the previous frame replace features at a corresponding position of a current frame, feature fusion is performed through the convolutional layer, and finally all features are sent to a classification layer to obtain a final behavior recognition classification prediction result. The neural network model includes a multi-layer convolution layer, a full connection layer, and a residual structure. The source domain data having an image shape 224 x 3 is input to a neural network model, which in some embodiments may be a deep full convolution neural network model that can accommodate source domain data inputs of any size.
After the source domain data is input into the neural network model, a first loss function value of the source domain data is obtained based on the cross entropy loss function, whose formula is:

L = −(1/N) · Σ_{i=1..N} Σ_{c=1..M} y_ic · log(p_ic)

where i indexes samples, c indexes classes, N is the number of samples in the source domain dataset, M is the number of classes in the source domain dataset, y_ic is a sign (indicator) function, and p_ic is the probability the model predicts for class c on sample i. If the true class of sample i is c, then y_ic is 1; otherwise y_ic is 0.
And calculating a first loss function value between a predicted value and a true value of the neural network model according to the cross entropy loss function, updating model parameters through a back propagation algorithm and a random gradient descent method, and reducing the loss between the predicted value and the true value of the model, so that the predicted value of the model can be more similar to the true value to train the neural network model, and outputting model parameters of the neural network model when the first loss function value is smaller than or equal to a first loss threshold value to finish model training so as to obtain a pre-training model.
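The first loss value can be sketched numerically as follows — a minimal NumPy illustration of the cross entropy loss function above (the helper name is an assumption, and the epsilon guard is an implementation detail, not part of the formula):

```python
import numpy as np

def cross_entropy(p: np.ndarray, y: np.ndarray) -> float:
    """Mean cross-entropy over N samples.
    p: N x M array of predicted class probabilities (rows sum to 1).
    y: N x M one-hot array, y[i, c] = 1 iff sample i's true class is c."""
    eps = 1e-12  # numerical guard against log(0)
    return float(-np.sum(y * np.log(p + eps)) / p.shape[0])
```

Training then repeatedly backpropagates this value and updates the weights by stochastic gradient descent until it falls to or below the first loss threshold.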
S40: and carrying out normalization processing on the data in the target domain data set to obtain target domain data.
The image shape of the target domain data after normalization processing is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data. In some embodiments, the data size of the target domain data may be made the same as the image shape of the source domain data, also 224×224×3. The image shapes of the source domain data and the target domain data are unified to be the same size, and subsequent data fusion is facilitated.
S50: and inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label.
Inputting the target domain data in the target domain data set HMDB7 into a pre-training model to obtain a prediction output label of the pre-training model and a prediction confidence conf of the prediction output label.
S60: and taking the prediction output label and the prediction confidence as pseudo labels of the target domain data.
According to the identification method provided by the embodiment of the application, because cross-domain identification needs to be tested on the target domain data set HMDB7, the real label of the target domain data set HMDB7 is not used in the test process. And the pseudo tag can reduce the category overlapping of the data, and the pseudo tag can make the category boundary of the data clearer and the learned category more compact.
S70: and constructing a virtual sample according to the pseudo tag.
The target domain data in the target domain data set HMDB7 is screened based on the pseudo tag of the target domain data set HMDB7 obtained in the above step S60. In some embodiments, the confidence threshold may be t=0.7, and the target domain data with the predicted confidence higher than the confidence threshold and the predicted output tag of the target domain data are fetched from the target domain data set HMDB7 to obtain fetched data.
The fetched data is fused with random data in the source domain dataset UCF7 to construct virtual samples. It should be noted that the random data in the source domain dataset UCF7 is data with the same real label, and the source domain dataset UCF7 does not need pseudo labels: cross-domain recognition trains on the source domain and tests on the target domain, so the real labels of the source domain dataset UCF7 can be used during training. In some embodiments, the fetched data is fused with random data in the source domain dataset UCF7 according to:

x̃ = λ·x_i + (1 − λ)·x_j

where λ ∈ [0, 1] is the fusion proportion coefficient, x̃ is the fused data, x_i is the fetched data, and x_j is the random data. In some embodiments λ may be 0.5; substituting the value of λ into the formula yields the fused data x̃, which is used to construct the virtual sample. Referring to fig. 3, fig. 3 is a schematic diagram of image fusion and virtual sample generation provided by the present application. In the data used for fusion, the real label of the source domain data must be consistent with the pseudo label of the target domain data; data with inconsistent labels is not fused.
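A compact sketch of the filtering-and-fusion step (function and variable names are illustrative assumptions; the confidence threshold t = 0.7 and λ = 0.5 follow the values given above):

```python
import numpy as np

def fuse(x_i: np.ndarray, x_j: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Mix a confident target-domain image x_i with a same-label
    source-domain image x_j: x_tilde = lam*x_i + (1-lam)*x_j."""
    return lam * x_i + (1.0 - lam) * x_j

def build_virtual_samples(tgt_x, tgt_pseudo, tgt_conf, src_x, src_y,
                          t=0.7, lam=0.5, rng=None):
    """Pair each target sample whose pseudo-label confidence exceeds t
    with a random same-label source sample and fuse the pair."""
    if rng is None:
        rng = np.random.default_rng(0)
    samples, labels = [], []
    for x, y, c in zip(tgt_x, tgt_pseudo, tgt_conf):
        if c <= t:                       # discard low-confidence pseudo-labels
            continue
        pool = [s for s, sy in zip(src_x, src_y) if sy == y]
        if not pool:                     # no same-label source data available
            continue
        x_j = pool[rng.integers(len(pool))]
        samples.append(fuse(x, x_j, lam))
        labels.append(y)                 # fusion label = the shared class
    return samples, labels
```

The returned pairs (fused image, shared class label) correspond to the virtual samples and fusion labels used in the next step.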
S80: acquiring fusion data in the virtual sampleIs to use the fusion data +.>And constructing a fusion data set by the fusion tag. It can be understood that the behavior category corresponding to the fusion tag is the same as the behavior category corresponding to the real tag of the source domain data or the pseudo tag of the target domain data.
S90: and carrying out normalization processing on the data in the fusion data set to obtain processed fusion data.
The image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data. In some embodiments, the image shape of the fusion data may be processed to be the same as the image shape of the source domain data, also 224×224×3. The target domain data and the processed fusion data are respectively identical with the image shape of the source domain data, and the pixel value ranges of the target domain data and the processed fusion data are respectively identical with the pixel value ranges of the source domain data, so that the effect of cross-domain identification is achieved.
S100: and training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model.
And inputting the processed fusion data into the pre-training model, and acquiring a second loss function value based on the cross entropy loss function. It will be appreciated that the cross entropy loss function in this step has the same formula as in step S30, except that N is the number of samples in the target domain dataset and M is the number of categories in the target domain dataset.
And calculating a second loss function value between a predicted value and a true value of the pre-training model according to the cross entropy loss function, updating model parameters through a back propagation algorithm and a random gradient descent method, and reducing the loss between the predicted value and the true value of the model, so that the predicted value of the model can be more similar to the true value to train the pre-training model, and outputting model parameters of the pre-training model when the second loss function value is smaller than or equal to a second loss threshold value to finish model training so as to obtain a cross-domain behavior recognition model.
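The stop-at-threshold training loop described here (and in step S30) can be illustrated on a toy linear softmax classifier — a deliberately simplified stand-in for the convolutional network, with assumed names and hyper-parameters:

```python
import numpy as np

def train_until(X, Y, lr=0.5, loss_threshold=0.1, max_steps=2000):
    """Gradient descent on a linear softmax classifier until the
    cross-entropy loss drops to or below the threshold.
    X: N x D inputs; Y: N x M one-hot labels."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(X.shape[1], Y.shape[1]))
    loss = np.inf
    for _ in range(max_steps):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        loss = -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))
        if loss <= loss_threshold:                    # stop criterion
            break
        W -= lr * X.T @ (p - Y) / len(X)              # cross-entropy gradient
    return W, loss
```

The same pattern — compute loss, backpropagate, update, stop at the threshold — applies to both the pre-training stage and this fine-tuning stage, only with different data and thresholds.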
S110: and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. And identifying the cross-domain behaviors by using the obtained cross-domain behavior identification result.
In some embodiments, the cross-domain behavior recognition method based on image pre-fusion provided by the application further comprises the following steps:
the real label of the target domain data set HMDB7 is obtained, and the cross-domain behavior recognition result obtained in the step S110 is compared with the real label of the target domain data set HMDB7, namely, the prediction accuracy of the cross-domain behavior recognition model on the target domain data set is evaluated. For example: and testing the pre-training model on the target domain data set to obtain a first accuracy acc1, and testing the cross-domain recognition behavior model on the target domain data set to obtain a second accuracy acc2. If acc2> acc1, the validity of the cross-domain behavior recognition method can be described, and the cross-domain recognition performance of the cross-domain behavior recognition model can be evaluated.
According to the cross-domain behavior recognition method, the virtual sample expansion data are constructed by using the pseudo tag fusion data, so that model degradation conditions when a pre-training model is migrated to a new scene are reduced, and the purposes of increasing model robustness and cross-domain recognition capability are achieved. And then, on the basis of data fusion, guiding a convolutional neural network model to learn and fuse the characteristics of the source domain data set through a cross entropy loss function, so that the characteristics related to human actions can be deeply mined in the learning process, and analysis and test are carried out on the model performance. The method can reduce the domain difference to the greatest extent and improve the performance of the cross-domain model.
Some embodiments of the present application further provide a cross-domain behavior recognition device based on image pre-fusion, which is applied to the recognition method provided in the foregoing embodiments, where the device includes:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain recognition model.
As can be seen from the above technical solutions, the embodiments of the present application provide a cross-domain behavior recognition method and device based on image pre-fusion. The method includes: constructing a behavior recognition data set comprising a source domain data set and a target domain data set; training a neural network model with the normalized source domain data to obtain a pre-training model; inputting the normalized target domain data into the pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as pseudo labels of the target domain data; constructing virtual samples according to the pseudo labels; acquiring fusion labels for the data in the virtual samples and constructing a fusion data set from that data; training the pre-training model with the normalized fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The recognition method reduces the difference between the source domain and the target domain, so that the model can transfer knowledge learned in one domain to other domains, thereby improving the generalization capability and robustness of the model and addressing the problem of low accuracy in cross-domain recognition.
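The pseudo-labelling and fusion steps recapped above can be sketched as follows; the stand-in "model", the image sizes, the confidence threshold, and the fusion coefficient are all illustrative assumptions, not values fixed by this application:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: "images" are 4x4 arrays with labels in {0, 1}.
target_images = rng.random((5, 4, 4))
source_images = rng.random((5, 4, 4))
source_labels = np.array([0, 1, 0, 1, 0])

def predict_with_confidence(image):
    """Stand-in for the pre-training model: returns (pseudo label, confidence)."""
    score = image.mean()
    return (1 if score > 0.5 else 0, abs(score - 0.5) * 2)

# Step 1: pseudo-label the target data; keep only confident predictions.
confidence_threshold = 0.02
extracted = []
for img in target_images:
    label, conf = predict_with_confidence(img)
    if conf > confidence_threshold:
        extracted.append((img, label))

# Step 2: fuse each extracted image with a random same-label source image
# to build the virtual samples of the fusion data set.
lam = 0.7  # fusion proportionality coefficient in [0, 1]
virtual_samples = []
for img, label in extracted:
    candidates = source_images[source_labels == label]
    partner = candidates[rng.integers(len(candidates))]
    virtual_samples.append((lam * img + (1 - lam) * partner, label))
```

The cross-domain behavior recognition model is then obtained by continuing to train the pre-training model on these virtual samples.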
The detailed description provided above is merely a set of examples under the general inventive concept and does not limit the scope of protection of the present application. Any other embodiment that a person skilled in the art derives from the solution of the present application without inventive effort falls within the scope of protection of the present application.
Claims (10)
1. A cross-domain behavior recognition method based on image pre-fusion, characterized by comprising the following steps:
constructing a behavior recognition data set, wherein the behavior recognition data set comprises a source domain data set and a target domain data set;
normalizing the data in the source domain data set to obtain source domain data, wherein the source domain data have the same image shape and the pixel values of the source domain data have the same value range;
training a neural network model using the source domain data to obtain a pre-training model;
normalizing the data in the target domain data set to obtain target domain data;
inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label;
taking the prediction output label and the prediction confidence as pseudo labels of the target domain data;
constructing a virtual sample according to the pseudo tag;
acquiring a fusion tag of the data in the virtual sample, and constructing a fusion data set by utilizing the data in the virtual sample and the fusion tag;
normalizing the data in the fusion data set to obtain the processed fusion data;
training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model;
and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result.
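The normalization required in claim 1 above, namely giving every sample the same image shape and the same pixel value range, might look like the following; nearest-neighbour resizing, a 224x224 target shape, and 8-bit input pixels are assumptions, as the claim fixes none of them:

```python
import numpy as np

def normalize(image, shape=(224, 224)):
    """Resize an image (nearest-neighbour) to a common shape and scale
    its 8-bit pixel values into the common [0, 1] range."""
    image = np.asarray(image, dtype=np.float64)
    rows = (np.arange(shape[0]) * image.shape[0] / shape[0]).astype(int)
    cols = (np.arange(shape[1]) * image.shape[1] / shape[1]).astype(int)
    resized = image[np.ix_(rows, cols)]   # nearest-neighbour sampling
    return resized / 255.0                # pixel values now in [0, 1]
```

Applying the same function to the source domain data, the target domain data, and the fusion data gives all three the identical image shape and pixel value range that the claims require.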
2. The image pre-fusion based cross-domain behavior recognition method according to claim 1, wherein the behavior classes of the source domain data in the source domain data set are the same as the behavior classes of the target domain data in the target domain data set.
3. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, wherein the neural network model comprises a multi-layer convolution layer, a full connection layer and a residual structure.
4. The image pre-fusion based cross-domain behavior recognition method of claim 1, wherein training a neural network model using the source domain data to obtain a pre-training model comprises:
acquiring a first loss function value based on the cross entropy loss function;
reducing the first loss function value using a back propagation algorithm and a stochastic gradient descent method;
and outputting model parameters of the neural network model when the first loss function value is smaller than or equal to a first loss threshold value so as to obtain the pre-training model.
5. The image pre-fusion based cross-domain behavior recognition method according to claim 1, wherein the constructing a virtual sample according to the pseudo tag comprises:
extracting the target domain data whose prediction confidence is higher than a confidence threshold, together with the prediction output labels of that target domain data, to obtain extracted data;
and fusing the extracted data with random data in the source domain data set to construct the virtual sample, wherein the random data is source domain data having the same label as the extracted data.
6. The cross-domain behavior recognition method based on image pre-fusion according to claim 5, wherein the extracted data and the random data in the source domain data set are fused according to the following formula:

x̃ = λ·x_i + (1 − λ)·x_j

wherein λ is the fusion proportionality coefficient, λ ∈ [0, 1], x̃ is the fusion data, x_i is the extracted data, and x_j is the random data.
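The fusion in this claim is a convex (mixup-style) combination of the two samples; a minimal sketch consistent with the symbols defined here, using hypothetical example arrays:

```python
import numpy as np

def fuse(x_i, x_j, lam):
    """Fuse extracted data x_i with same-label random source data x_j:
    x_fused = lam * x_i + (1 - lam) * x_j."""
    assert 0.0 <= lam <= 1.0, "fusion proportionality coefficient must lie in [0, 1]"
    return lam * np.asarray(x_i, dtype=float) + (1.0 - lam) * np.asarray(x_j, dtype=float)

x_i = np.array([[0.2, 0.4], [0.6, 0.8]])  # extracted target-domain data
x_j = np.array([[1.0, 1.0], [1.0, 1.0]])  # random same-label source data
fused = fuse(x_i, x_j, 0.5)
# fused == [[0.6, 0.7], [0.8, 0.9]]
```

At λ = 1 the fused sample equals the extracted data and at λ = 0 it equals the random source data, so λ controls how far the virtual sample leans toward the target domain.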
7. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, wherein the image shape of the target domain data is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data; the image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data.
8. The image pre-fusion based cross-domain behavior recognition method according to claim 1, wherein training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model comprises:
acquiring a second loss function value based on the cross entropy loss function;
reducing the second loss function value using a back propagation algorithm and a stochastic gradient descent method;
and outputting model parameters of the pre-training model when the second loss function value is smaller than or equal to a second loss threshold value so as to obtain the cross-domain behavior recognition model.
9. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, further comprising:
acquiring a real label of the target domain data set;
comparing the cross-domain behavior recognition result with the real tag of the target domain data set to evaluate the cross-domain recognition performance of the cross-domain behavior recognition model.
10. A cross-domain behavior recognition device based on image pre-fusion, characterized in that the device is applied to the recognition method according to any one of claims 1 to 9, and the device comprises:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain behavior recognition model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311044162.0A CN117058716A (en) | 2023-08-18 | 2023-08-18 | Cross-domain behavior recognition method and device based on image pre-fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117058716A true CN117058716A (en) | 2023-11-14 |
Family
ID=88654968
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113792751A (en) * | 2021-07-28 | 2021-12-14 | 中国科学院自动化研究所 | Cross-domain behavior identification method, device, equipment and readable storage medium |
CN113792751B (en) * | 2021-07-28 | 2024-06-04 | 中国科学院自动化研究所 | Cross-domain behavior recognition method, device, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||