CN117058716A - Cross-domain behavior recognition method and device based on image pre-fusion - Google Patents

Cross-domain behavior recognition method and device based on image pre-fusion

Info

Publication number
CN117058716A
Authority
CN
China
Prior art keywords
data
domain
cross
fusion
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311044162.0A
Other languages
Chinese (zh)
Inventor
赵正平
宋登峰
杨永森
普碧才
田永军
霍智锋
赵丹
佘有明
李振弘
刘镇
苏慧
代礼琴
陈乐
陆建锋
金春仙
党璐璐
马文亮
张颖
李均宏
杨征鸿
张玉梅
杨向娟
蒋孝敬
宋明明
和正美
杨杰
徐正国
杨永平
和春元
杜海燕
耿怀旭
唐智能
秦建明
王啸虎
冯建辉
陈铸亮
孔碧光
英自才
刘绍正
耿座学
李圣
刘凡波
张铁斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nujiang Power Supply Bureau of Yunnan Power Grid Co Ltd
Original Assignee
Nujiang Power Supply Bureau of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nujiang Power Supply Bureau of Yunnan Power Grid Co Ltd filed Critical Nujiang Power Supply Bureau of Yunnan Power Grid Co Ltd
Priority to CN202311044162.0A priority Critical patent/CN117058716A/en
Publication of CN117058716A publication Critical patent/CN117058716A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a cross-domain behavior recognition method and device based on image pre-fusion, wherein the method comprises the following steps: constructing a behavior recognition dataset comprising a source domain dataset and a target domain dataset; training a neural network model using the processed source domain data to obtain a pre-training model; inputting the processed target domain data into the pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as pseudo labels of the target domain data; constructing a virtual sample according to the pseudo labels; acquiring a fusion label of the data in the virtual sample, and constructing a fusion data set; training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The method enables the model to migrate knowledge learned from one domain to other domains, thereby improving the generalization capability and robustness of the model and solving the problem of low accuracy in cross-domain recognition.

Description

Cross-domain behavior recognition method and device based on image pre-fusion
Technical Field
The application relates to the technical field of computer vision and pattern recognition, in particular to a cross-domain behavior recognition method and device based on image pre-fusion.
Background
The field of behavior recognition includes methods built on a variety of deep architectures, for example: the two-stream network architecture, which uses two 2D convolution blocks to train jointly on RGB and optical flow information and to model temporal information; the temporal relation network, a deep model that adopts a special pooling layer to model the temporal relations between video frames; and deep networks that inflate two-dimensional convolution filters to exploit large-scale pre-trained two-dimensional models, among others. However, the above methods are trained with training and test data sets drawn from the same distribution, i.e., all samples come from the same data set, and therefore cannot be directly applied to cross-domain behavior recognition.
The key challenge in cross-domain recognition is the domain gap between the source domain and the target domain: data distributions differ between domains, including differences in appearance, illumination, background, and so on. In cross-domain recognition tasks, the training and test samples often come from different data sets, i.e., the sample distributions differ. As a result, some behavior recognition methods cannot adequately eliminate this distribution difference under cross-domain conditions, so the classification performance of the model degrades sharply, which in turn reduces the accuracy of cross-domain recognition.
Disclosure of Invention
The application provides a cross-domain behavior recognition method and device based on image pre-fusion, which are used for solving the problem of low accuracy of cross-domain recognition.
In a first aspect, the present application provides a cross-domain behavior recognition method based on image pre-fusion, including:
constructing a behavior recognition data set, wherein the behavior recognition data set comprises a source domain data set and a target domain data set;
normalizing the data in the source domain data set to obtain source domain data, wherein the source domain data have the same image shape and the pixel values of the source domain data have the same value range;
training a neural network model using the source domain data to obtain a pre-training model;
normalizing the data in the target domain data set to obtain target domain data;
inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label;
taking the prediction output label and the prediction confidence as pseudo labels of the target domain data;
constructing a virtual sample according to the pseudo labels;
acquiring a fusion label of the data in the virtual sample, and constructing a fusion data set using the data in the virtual sample and the fusion label;
normalizing the data in the fusion data set to obtain the processed fusion data;
training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model;
and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result.
Optionally, the source domain data in the source domain data set has the same behavior categories as the target domain data in the target domain data set.
Optionally, the neural network model includes multiple convolution layers, a fully connected layer, and a residual structure.
Optionally, the training the neural network model using the source domain data to obtain the pre-training model includes:
acquiring a first loss function value based on the cross entropy loss function;
reducing the first loss function value using a back propagation algorithm and stochastic gradient descent;
and outputting model parameters of the neural network model when the first loss function value is smaller than or equal to a first loss threshold value so as to obtain the pre-training model.
Optionally, the constructing a virtual sample according to the pseudo labels includes:
extracting the target domain data whose prediction confidence is higher than a confidence threshold, together with the prediction output label of the target domain data, to obtain extracted data;
and fusing the extracted data with random data in the source domain data set to construct a virtual sample, wherein the random data is data carrying the same label.
Optionally, the extracted data and the random data in the source domain data set are fused according to the following formula:

$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j$$

where $\lambda$ is the fusion proportion coefficient, $\lambda \in [0,1]$, $\tilde{x}$ is the fused data, $x_i$ is the extracted data, and $x_j$ is the random data.
Optionally, the image shape of the target domain data is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data; the image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data.
Optionally, the training the pre-training model using the fusion data to obtain a cross-domain behavior recognition model includes:
acquiring a second loss function value based on the cross entropy loss function;
reducing the second loss function value using a back propagation algorithm and stochastic gradient descent;
and outputting model parameters of the pre-training model when the second loss function value is smaller than or equal to a second loss threshold value so as to obtain the cross-domain behavior recognition model.
Optionally, the method further comprises:
acquiring a real label of the target domain data set;
comparing the cross-domain behavior recognition result with the real label of the target domain data set to evaluate the cross-domain recognition performance of the cross-domain behavior recognition model.
In a second aspect, the present application provides a cross-domain behavior recognition device based on image pre-fusion, which is applied to the recognition method provided in the first aspect, and the device includes:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain recognition model.
As can be seen from the above technical solution, the present application provides a cross-domain behavior recognition method and apparatus based on image pre-fusion, where the method includes: constructing a behavior recognition dataset comprising a source domain dataset and a target domain dataset; training a neural network model by using the normalized source domain data to obtain a pre-training model; inputting the normalized target domain data into the pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as pseudo labels of the target domain data; constructing a virtual sample according to the pseudo labels; acquiring a fusion label of the data in the virtual sample, and constructing a fusion data set by utilizing the data in the virtual sample; training the pre-training model by using the normalized fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The recognition method can reduce the difference between the source domain and the target domain, so that the model can transfer knowledge learned from one domain to other domains, thereby improving the generalization capability and the robustness of the model and solving the problem of low accuracy in cross-domain recognition.
Drawings
In order to more clearly illustrate the technical solution of the present application, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a cross-domain behavior recognition method based on image pre-fusion provided by the application;
FIG. 2 is a schematic diagram of a behavior recognition convolutional neural network model provided by the present application;
fig. 3 is a schematic diagram of image fusion and virtual sample generation provided by the present application.
Detailed Description
Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the application; they are merely examples of systems and methods consistent with aspects of the application as set forth in the claims.
In the field of behavior recognition, the key challenge in cross-domain recognition is the domain gap between the source domain and the target domain: data distributions differ between domains, including differences in appearance, illumination, background, and so on. In cross-domain recognition tasks, the training and test samples often come from different data sets, i.e., the sample distributions differ. As a result, some behavior recognition methods cannot adequately eliminate this distribution difference under cross-domain conditions, so the classification performance of the model degrades sharply, which in turn reduces the accuracy of cross-domain recognition.
In order to solve the problem of low accuracy in cross-domain recognition, some embodiments of the present application provide a cross-domain behavior recognition method based on image pre-fusion. Referring to fig. 1, fig. 1 is a flowchart of the cross-domain behavior recognition method based on image pre-fusion provided by the present application. The cross-domain behavior recognition method based on image pre-fusion provided by the embodiments of the application includes:
s10: a behavior recognition dataset is constructed, the behavior recognition dataset comprising a source domain dataset and a target domain dataset.
A behavior recognition data set for cross-domain recognition is constructed as follows: two different behavior recognition data sets, denoted A and B, are selected; data belonging to the same behavior categories are selected from A and B to construct the source domain data set and the target domain data set, respectively. It will be appreciated that the source domain data in the source domain data set has the same behavior categories as the target domain data in the target domain data set.
In some embodiments, taking the large behavior recognition dataset UCF101 and dataset HMDB51 as an example, dataset UCF101 provides 13320 videos of 101 action behavior categories, dataset HMDB51 contains 51 action behavior categories for a total of 6849 videos. The data set UCF101 is used as a source domain data set, and the data set HMDB51 is used as a target domain data set. Data with the same behavior category is selected from the data sets UCF101 and HMDB51, respectively, for example, the two selected data sets contain the same seven behavior categories, and the data sets are respectively constructed as a source domain data set UCF7 and a target domain data set HMDB7. It should be noted that, the data in the source domain data set UCF7 and the target domain data set HMDB7 respectively have a real tag, and the real tag is a behavior class tag to which the corresponding data provided in the data set belongs.
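As a concrete illustration, constructing the two subsets can be sketched as follows (a minimal sketch, not from the patent; the class names, directory layout, and helper name are assumptions for illustration):

import os
import shutil

# Hypothetical list of the seven behavior categories shared by UCF101 and HMDB51.
SHARED_CLASSES = ["climb", "golf", "pullup", "punch", "pushup", "ride_bike", "shoot_ball"]

def build_subset(src_root, dst_root, classes=SHARED_CLASSES):
    # Copy only the videos whose class folder is in the shared class list.
    for cls in classes:
        src_dir = os.path.join(src_root, cls)
        dst_dir = os.path.join(dst_root, cls)
        os.makedirs(dst_dir, exist_ok=True)
        for name in os.listdir(src_dir):
            shutil.copy(os.path.join(src_dir, name), os.path.join(dst_dir, name))

build_subset("UCF101", "UCF7")   # source domain data set
build_subset("HMDB51", "HMDB7")  # target domain data set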
S20: and carrying out normalization processing on the data in the source domain data set to obtain source domain data.
The data in the source domain data set and the data in the target domain data set are both image data. The image data in the source domain data set UCF7 is normalized; this includes processing the image shapes and pixel value ranges of the image data so that all image data in the source domain data set UCF7 are adjusted to the same image shape and the same pixel value range. The pixel value range after normalization is 0-1. For example, the image shapes in the source domain data set UCF7 can be uniformly adjusted to 224×224×3 and the pixel value range adjusted to 0-1 to obtain the source domain data.
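The normalization step can be sketched as follows (a minimal sketch assuming OpenCV-style image arrays; the function name and the use of cv2 are assumptions, and any equivalent resizing routine works):

import numpy as np
import cv2  # assumed here for resizing; not specified by the patent

def normalize_frame(frame, size=(224, 224)):
    # Unify the image shape to 224x224x3, then scale pixel values into [0, 1].
    resized = cv2.resize(frame, size)
    return resized.astype(np.float32) / 255.0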
S30: training a neural network model using the source domain data to obtain a pre-trained model.
Referring to fig. 2, fig. 2 is a schematic structural diagram of the behavior recognition convolutional neural network model provided by the present application. For a single frame image, image features are extracted through convolution layers (conv). To obtain information from the previous frame, part of the features extracted from the previous frame replaces the features at the corresponding positions of the current frame; feature fusion is then performed through a convolution layer, and finally all features are sent to the classification layer to obtain the final behavior recognition classification prediction result. The neural network model includes multiple convolution layers, a fully connected layer, and a residual structure. The source domain data, with image shape 224×224×3, is input into the neural network model; in some embodiments, the neural network model may be a deep fully convolutional neural network model that can accommodate source domain inputs of any size.
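The frame-feature replacement described above can be sketched as follows (a minimal sketch under our own assumptions; the patent does not specify the tensor layout or the fraction of features replaced):

import torch

def replace_with_previous(feat, fraction=0.125):
    # feat: per-frame features of shape (T, C, H, W).
    # Replace the first `fraction` of channels of each frame with the
    # corresponding channels of the previous frame; a subsequent convolution
    # layer then fuses the mixed features.
    out = feat.clone()
    k = int(feat.shape[1] * fraction)  # number of channels taken from frame t-1
    out[1:, :k] = feat[:-1, :k]
    return out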
After the source domain data is input into the neural network model, a first loss function value of the source domain data is obtained based on the cross entropy loss function, whose formula is:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\,\log\left(p_{ic}\right)$$

where $i$ indexes the samples, $c$ indexes the classes, $N$ is the number of samples in the source domain data set, $M$ is the number of classes in the source domain data set, $y_{ic}$ is a sign function, and $p_{ic}$ is the probability with which the model predicts class $c$ for sample $i$. If the true class of sample $i$ is the same as $c$, $y_{ic}$ is 1; if the true class of sample $i$ differs from $c$, $y_{ic}$ is 0.
A first loss function value between the predicted value and the true value of the neural network model is calculated according to the cross entropy loss function, and the model parameters are updated through a back propagation algorithm and stochastic gradient descent to reduce the loss between the predicted value and the true value, so that the prediction of the model approaches the true value as the neural network model is trained. When the first loss function value is smaller than or equal to the first loss threshold, the model parameters of the neural network model are output and model training is completed, yielding the pre-training model.
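A minimal pre-training loop consistent with this step might look like the following (a sketch; the data loader, learning rate, epoch budget, and loss threshold are assumptions, not values given by the patent):

import torch
import torch.nn as nn

def pretrain(model, source_loader, loss_threshold=0.05, lr=0.01, max_epochs=100):
    criterion = nn.CrossEntropyLoss()                       # cross entropy loss, as in the formula above
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # stochastic gradient descent
    for epoch in range(max_epochs):
        total, batches = 0.0, 0
        for images, labels in source_loader:                # normalized source domain data
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()                                 # back propagation
            optimizer.step()
            total, batches = total + loss.item(), batches + 1
        if total / batches <= loss_threshold:               # first loss threshold reached
            break
    return model                                            # the pre-training model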
S40: and carrying out normalization processing on the data in the target domain data set to obtain target domain data.
The image shape of the target domain data after normalization is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data. In some embodiments, the image shape of the target domain data may be made the same as that of the source domain data, also 224×224×3. Unifying the image shapes of the source domain data and the target domain data to the same size facilitates subsequent data fusion.
S50: and inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label.
Inputting the target domain data in the target domain data set HMDB7 into a pre-training model to obtain a prediction output label of the pre-training model and a prediction confidence conf of the prediction output label.
S60: and taking the prediction output label and the prediction confidence as pseudo labels of the target domain data.
In the recognition method provided by the embodiments of the application, cross-domain recognition is tested on the target domain data set HMDB7, so the real labels of the target domain data set HMDB7 are not used during testing. Pseudo labels reduce the class overlap of the data, make the class boundaries of the data clearer, and make the learned classes more compact.
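Generating the pseudo labels can be sketched as follows (an illustrative sketch; reading the prediction confidence conf as the softmax probability of the predicted class is our assumption):

import torch

@torch.no_grad()
def pseudo_label(model, images):
    # Return the prediction output labels and their confidences
    # for a batch of target domain data.
    probs = torch.softmax(model(images), dim=1)  # class probabilities
    conf, labels = probs.max(dim=1)              # prediction confidence and output label
    return labels, conf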
S70: and constructing a virtual sample according to the pseudo tag.
The target domain data in the target domain data set HMDB7 is screened based on the pseudo labels of the target domain data set HMDB7 obtained in step S60. In some embodiments, the confidence threshold may be t = 0.7; the target domain data whose prediction confidence is higher than the confidence threshold, together with the prediction output labels of that target domain data, are extracted from the target domain data set HMDB7 to obtain the extracted data.
The extracted data is fused with random data in the source domain data set UCF7 to construct virtual samples. It should be noted that the random data in the source domain data set UCF7 is data carrying the same real label, and the source domain data set UCF7 does not need pseudo labels: cross-domain recognition trains on the source domain and tests on the target domain, so the real labels of the source domain data set UCF7 can be used during training. In some embodiments, the extracted data is fused with the random data in the source domain data set UCF7 according to the following formula:

$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j$$

where $\lambda$ is the fusion proportion coefficient, $\lambda \in [0,1]$, $\tilde{x}$ is the fused data, $x_i$ is the extracted data, and $x_j$ is the random data. In some embodiments, $\lambda$ may be 0.5; substituting this value of $\lambda$ into the formula yields the fused data $\tilde{x}$, which is used to construct the virtual samples, as in the sketch below. Referring to fig. 3, fig. 3 is a schematic diagram of image fusion and virtual sample generation provided by the present application. Among the data used for fusion, the real label of the source domain data must be consistent with the pseudo label of the target domain data; data with inconsistent labels is not fused.
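This fusion is essentially a mixup-style interpolation restricted to label-consistent pairs; a minimal sketch follows (the pairing logic and function name are our assumptions):

import torch

def fuse_pairs(target_imgs, target_pseudo, source_imgs, source_labels, lam=0.5):
    # Fuse each high-confidence target image with a random source image of the
    # same class: x_tilde = lam * x_i + (1 - lam) * x_j.
    fused, fused_labels = [], []
    for x_i, y in zip(target_imgs, target_pseudo):
        same = (source_labels == y).nonzero(as_tuple=True)[0]
        if len(same) == 0:
            continue  # labels must be consistent; otherwise no fusion
        x_j = source_imgs[same[torch.randint(len(same), (1,))].item()]
        fused.append(lam * x_i + (1 - lam) * x_j)  # virtual sample
        fused_labels.append(y)                     # fusion label = the shared class
    return torch.stack(fused), torch.stack(fused_labels)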
S80: acquiring fusion data in the virtual sampleIs to use the fusion data +.>And constructing a fusion data set by the fusion tag. It can be understood that the behavior category corresponding to the fusion tag is the same as the behavior category corresponding to the real tag of the source domain data or the pseudo tag of the target domain data.
S90: and carrying out normalization processing on the data in the fusion data set to obtain processed fusion data.
The image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data. In some embodiments, the image shape of the fusion data may be processed to be the same as the image shape of the source domain data, also 224×224×3. With the target domain data and the processed fusion data each matching the source domain data in image shape and pixel value range, cross-domain recognition can be carried out.
S100: and training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model.
The processed fusion data is input into the pre-training model, and a second loss function value is acquired based on the cross entropy loss function. It will be appreciated that the cross entropy loss function in this step has the same formula as in step S30, except that N in the formula is the number of samples in the target domain data set and M is the number of categories in the target domain data set.
A second loss function value between the predicted value and the true value of the pre-training model is calculated according to the cross entropy loss function, and the model parameters are updated through a back propagation algorithm and stochastic gradient descent to reduce the loss between the predicted value and the true value, so that the prediction of the model approaches the true value as the pre-training model is trained. When the second loss function value is smaller than or equal to the second loss threshold, the model parameters of the pre-training model are output and model training is completed, yielding the cross-domain behavior recognition model.
S110: and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. And identifying the cross-domain behaviors by using the obtained cross-domain behavior identification result.
In some embodiments, the cross-domain behavior recognition method based on image pre-fusion provided by the application further comprises the following steps:
the real label of the target domain data set HMDB7 is obtained, and the cross-domain behavior recognition result obtained in the step S110 is compared with the real label of the target domain data set HMDB7, namely, the prediction accuracy of the cross-domain behavior recognition model on the target domain data set is evaluated. For example: and testing the pre-training model on the target domain data set to obtain a first accuracy acc1, and testing the cross-domain recognition behavior model on the target domain data set to obtain a second accuracy acc2. If acc2> acc1, the validity of the cross-domain behavior recognition method can be described, and the cross-domain recognition performance of the cross-domain behavior recognition model can be evaluated.
In the cross-domain behavior recognition method described above, virtual samples constructed by pseudo-label-guided data fusion expand the data, which reduces model degradation when the pre-training model is migrated to a new scene and thereby increases model robustness and cross-domain recognition capability. On the basis of this data fusion, the cross entropy loss function guides the convolutional neural network model to learn the fused features of the source domain data set, so that features related to human actions can be mined deeply during learning, and the model performance is then analyzed and tested. This method minimizes the domain gap and improves the performance of the cross-domain model.
Some embodiments of the present application further provide a cross-domain behavior recognition device based on image pre-fusion, which is applied to the recognition method provided in the foregoing embodiments, where the device includes:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain recognition model.
As can be seen from the above technical solutions, the embodiments of the present application provide a cross-domain behavior recognition method and apparatus based on image pre-fusion, where the method includes: constructing a behavior recognition dataset comprising a source domain dataset and a target domain dataset; training a neural network model by using the normalized source domain data to obtain a pre-training model; inputting the normalized target domain data into the pre-training model to obtain a prediction output label and a corresponding prediction confidence; taking the prediction output label and the prediction confidence as pseudo labels of the target domain data; constructing a virtual sample according to the pseudo labels; acquiring a fusion label of the data in the virtual sample, and constructing a fusion data set by utilizing the data in the virtual sample; training the pre-training model by using the normalized fusion data to obtain a cross-domain behavior recognition model; and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result. The recognition method can reduce the difference between the source domain and the target domain, so that the model can transfer knowledge learned from one domain to other domains, thereby improving the generalization capability and the robustness of the model and solving the problem of low accuracy in cross-domain recognition.
The above-provided detailed description is merely a few examples under the general inventive concept and does not limit the scope of the present application. Any other embodiments which are extended according to the solution of the application without inventive effort fall within the scope of protection of the application for a person skilled in the art.

Claims (10)

1. The cross-domain behavior recognition method based on image pre-fusion is characterized by comprising the following steps:
constructing a behavior recognition data set, wherein the behavior recognition data set comprises a source domain data set and a target domain data set;
normalizing the data in the source domain data set to obtain source domain data, wherein the source domain data have the same image shape and the pixel values of the source domain data have the same value range;
training a neural network model using the source domain data to obtain a pre-training model;
normalizing the data in the target domain data set to obtain target domain data;
inputting the target domain data into the pre-training model to obtain a prediction output label and a prediction confidence of the prediction output label;
taking the prediction output label and the prediction confidence as pseudo labels of the target domain data;
constructing a virtual sample according to the pseudo labels;
acquiring a fusion label of the data in the virtual sample, and constructing a fusion data set using the data in the virtual sample and the fusion label;
normalizing the data in the fusion data set to obtain the processed fusion data;
training the pre-training model by using the processed fusion data to obtain a cross-domain behavior recognition model;
and inputting the target domain data into the cross-domain behavior recognition model to obtain a cross-domain behavior recognition result.
2. The image pre-fusion based cross-domain behavior recognition method according to claim 1, wherein the source domain data in the source domain data set has the same behavior categories as the target domain data in the target domain data set.
3. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, wherein the neural network model comprises multiple convolution layers, a fully connected layer, and a residual structure.
4. The image pre-fusion based cross-domain behavioral recognition method of claim 1, wherein training a neural network model using the source domain data to obtain a pre-trained model comprises:
acquiring a first loss function value based on the cross entropy loss function;
reducing the first loss function value using a back propagation algorithm and stochastic gradient descent;
and outputting model parameters of the neural network model when the first loss function value is smaller than or equal to a first loss threshold value so as to obtain the pre-training model.
5. The image pre-fusion based cross-domain behavior recognition method according to claim 1, wherein the constructing a virtual sample according to the pseudo labels comprises:
extracting the target domain data whose prediction confidence is higher than a confidence threshold, together with the prediction output label of the target domain data, to obtain extracted data;
and fusing the extracted data with random data in the source domain data set to construct a virtual sample, wherein the random data is data carrying the same label.
6. The cross-domain behavior recognition method based on image pre-fusion according to claim 5, wherein the extracted data and the random data in the source domain data set are fused according to the following formula:

$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j$$

where $\lambda$ is the fusion proportion coefficient, $\lambda \in [0,1]$, $\tilde{x}$ is the fused data, $x_i$ is the extracted data, and $x_j$ is the random data.
7. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, wherein the image shape of the target domain data is the same as the image shape of the source domain data, and the pixel value range of the target domain data is the same as the pixel value range of the source domain data; the image shape of the processed fusion data is the same as the image shape of the source domain data, and the pixel value range of the processed fusion data is the same as the pixel value range of the source domain data.
8. The method of image pre-fusion based cross-domain behavior recognition according to claim 1, wherein training the pre-training model using the fusion data to obtain a cross-domain behavior recognition model comprises:
acquiring a second loss function value based on the cross entropy loss function;
reducing the second loss function value using a back propagation algorithm and stochastic gradient descent;
and outputting model parameters of the pre-training model when the second loss function value is smaller than or equal to a second loss threshold value so as to obtain the cross-domain behavior recognition model.
9. The image pre-fusion-based cross-domain behavior recognition method according to claim 1, further comprising:
acquiring a real label of the target domain data set;
comparing the cross-domain behavior recognition result with the real label of the target domain data set to evaluate the cross-domain recognition performance of the cross-domain behavior recognition model.
10. A cross-domain behavior recognition device based on image pre-fusion, characterized in that it is applied to the recognition method of any one of claims 1 to 9, said device comprising:
the acquisition module is used for acquiring data from the data set;
the processing module is used for processing the data in the source domain data set, the target domain data set and the fusion data set;
the fusion module is used for fusing the extracted data in the target domain data set with the random data in the source domain data set;
a construction module for constructing a dataset and a virtual sample;
the training module is used for training the neural network model and the pre-training model;
and the comparison module is used for evaluating the cross-domain recognition performance of the cross-domain recognition model.
CN202311044162.0A 2023-08-18 2023-08-18 Cross-domain behavior recognition method and device based on image pre-fusion Pending CN117058716A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311044162.0A CN117058716A (en) 2023-08-18 2023-08-18 Cross-domain behavior recognition method and device based on image pre-fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311044162.0A CN117058716A (en) 2023-08-18 2023-08-18 Cross-domain behavior recognition method and device based on image pre-fusion

Publications (1)

Publication Number Publication Date
CN117058716A (en) 2023-11-14

Family

ID=88654968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311044162.0A Pending CN117058716A (en) 2023-08-18 2023-08-18 Cross-domain behavior recognition method and device based on image pre-fusion

Country Status (1)

Country Link
CN (1) CN117058716A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792751A (en) * 2021-07-28 2021-12-14 中国科学院自动化研究所 Cross-domain behavior identification method, device, equipment and readable storage medium
CN113792751B (en) * 2021-07-28 2024-06-04 中国科学院自动化研究所 Cross-domain behavior recognition method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN108537136B (en) Pedestrian re-identification method based on attitude normalization image generation
CN107330453B (en) Pornographic image identification method based on step-by-step identification and fusion key part detection
CN109816032B (en) Unbiased mapping zero sample classification method and device based on generative countermeasure network
CN110059586B (en) Iris positioning and segmenting system based on cavity residual error attention structure
CN111598182B (en) Method, device, equipment and medium for training neural network and image recognition
CN110458084B (en) Face age estimation method based on inverted residual error network
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN107247952B (en) Deep supervision-based visual saliency detection method for cyclic convolution neural network
CN111428511B (en) Event detection method and device
CN112434599B (en) Pedestrian re-identification method based on random occlusion recovery of noise channel
CN117058716A (en) Cross-domain behavior recognition method and device based on image pre-fusion
CN116994044A (en) Construction method of image anomaly detection model based on mask multi-mode generation countermeasure network
CN115393698A (en) Digital image tampering detection method based on improved DPN network
CN114170484A (en) Picture attribute prediction method and device, electronic equipment and storage medium
CN117611838A (en) Multi-label image classification method based on self-adaptive hypergraph convolutional network
CN117011219A (en) Method, apparatus, device, storage medium and program product for detecting quality of article
CN114495114B (en) Text sequence recognition model calibration method based on CTC decoder
CN116030077A (en) Video salient region detection method based on multi-dataset collaborative learning
CN115546689A (en) Video time sequence abnormal frame detection method based on unsupervised frame correlation
CN117036843A (en) Target detection model training method, target detection method and device
CN112396126B (en) Target detection method and system based on detection trunk and local feature optimization
Yang et al. iCausalOSR: invertible Causal Disentanglement for Open-set Recognition
CN115131600A (en) Detection model training method, detection method, device, equipment and storage medium
CN111582057B (en) Face verification method based on local receptive field
CN114818945A (en) Small sample image classification method and device integrating category adaptive metric learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination