CN116486112A - RGB-D significance target detection method based on lightweight cross-modal fusion network - Google Patents

RGB-D significance target detection method based on lightweight cross-modal fusion network

Info

Publication number
CN116486112A
CN116486112A CN202310410912.5A CN202310410912A CN116486112A CN 116486112 A CN116486112 A CN 116486112A CN 202310410912 A CN202310410912 A CN 202310410912A CN 116486112 A CN116486112 A CN 116486112A
Authority
CN
China
Prior art keywords
rgb
features
modal
saliency
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310410912.5A
Other languages
Chinese (zh)
Inventor
夏晨星
王晶晶
高修菊
葛斌
段松松
赵文俊
李续兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University of Science and Technology
Original Assignee
Anhui University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Science and Technology filed Critical Anhui University of Science and Technology
Priority to CN202310410912.5A priority Critical patent/CN116486112A/en
Publication of CN116486112A publication Critical patent/CN116486112A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/768Arrangements for image or video recognition or understanding using pattern recognition or machine learning using context analysis, e.g. recognition aided by known co-occurring patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision and provides an RGB-D saliency target detection method based on a lightweight cross-modal fusion network, which comprises the following steps: 1) acquiring an RGB-D dataset for training and testing the task and defining the algorithm target of the invention; 2) constructing an encoder for extracting RGB image features and an encoder for extracting depth image features; 3) establishing a cross-modal feature fusion network and enhancing the saliency expression of the RGB image and depth image features through a progressive guided attention mechanism; 4) based on the cross-modally fused multi-modal features, constructing a lightweight global context integration module to extract multi-scale context features of the fused modality; 5) constructing a simple and efficient multi-path aggregation module to integrate the fusion features with the original RGB and depth map features, and obtaining the final predicted saliency map through an activation function.

Description

RGB-D significance target detection method based on lightweight cross-modal fusion network
Technical field:
the invention relates to the fields of computer vision and image processing, in particular to an efficient and lightweight RGB-D (RGB plus depth) salient object detection method.
Background art:
the task of Salient Object Detection (SOD) is to find the most attention-grabbing objects in a scene by simulating the visual attention mechanism of humans. It has guiding significance for many computer vision tasks, including weakly supervised semantic segmentation, visual tracking, object recognition, and video analysis. Existing SOD methods mainly focus on processing RGB images and achieve good performance. However, they can only use the visual cues in RGB images, and in some complex scenes they encounter serious obstacles such as cluttered backgrounds or similar foreground and background. The main reason is that RGB images provide abundant visual cues but lack explicit spatial structure information. Meanwhile, with the popularization of depth sensors, depth maps can be conveniently acquired. In complex scenes, the embedded depth information serves as a supplement of spatial structure information and helps the RGB branch complete robust saliency detection. Owing to the introduction of depth maps, RGB-D SOD has made tremendous progress in recent years.
Many RGB-D SOD methods benefit greatly from very deep and wide models and achieve remarkable results. However, this success comes at the cost of a heavy computational burden and slow running speed. These models increase the depth and width of the network by adjusting the number of layers and channels, which brings huge parameter counts and computation. Considering the computation and memory consumption of the model, the invention designs an efficient lightweight cross-modal fusion network for RGB-D SOD to realize lightweight and efficient RGB-D salient object detection and segmentation. Specifically, a feature interaction module (CMI) for fusing the RGB and depth maps is first proposed. Context information is extracted from each single modality by depthwise separable convolution, the RGB and depth map features are enhanced respectively by a progressive guided attention mechanism (PAG), and the features of all modalities are integrated by a multi-source feature integration unit (MAU). Finally, considering the saliency information retained by the original RGB and depth maps, the invention designs a multi-path aggregation module (MPA) in the decoder to integrate the fusion features from different levels in a coarse-to-fine manner.
Summary of the invention:
aiming at the above problems, the invention provides an RGB-D saliency target detection method based on a lightweight cross-modal fusion network, which adopts the following technical scheme:
1. An RGB-D dataset for training and testing the task is acquired.
1.1) A subset of the NJU2K data set and a subset of the NLPR data set are taken as the training set, and the remaining NJU2K data set, the remaining NLPR data set, the SIP data set, the STERE data set and the DES data set are taken as test sets.
1.2) Each RGB-D image pair in the dataset comprises a color image I_RGB, a corresponding depth image I_Depth, and a corresponding manually annotated salient object segmentation image G.
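As a concrete illustration of step 1.2, a minimal PyTorch-style data loading sketch is given below; the directory names (RGB/depth/GT), the shared file names across folders, and the 352×352 input size are assumptions of the sketch rather than details specified by the invention.

```python
# Minimal sketch of an RGB-D SOD dataset loader (hypothetical directory layout / resolution).
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class RGBDSODDataset(Dataset):
    """Each sample is (I_RGB, I_Depth, G): color image, depth map, ground-truth saliency mask."""
    def __init__(self, root, size=352):  # 352x352 is an assumed input resolution
        self.root = root
        self.names = sorted(os.listdir(os.path.join(root, "RGB")))
        self.to_tensor = transforms.Compose(
            [transforms.Resize((size, size)), transforms.ToTensor()])

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        rgb = Image.open(os.path.join(self.root, "RGB", name)).convert("RGB")
        # The single-channel depth map is replicated to 3 channels so the same
        # ImageNet-pretrained backbone can be reused for both branches (assumption).
        depth = Image.open(os.path.join(self.root, "depth", name)).convert("RGB")
        gt = Image.open(os.path.join(self.root, "GT", name)).convert("L")
        return self.to_tensor(rgb), self.to_tensor(depth), self.to_tensor(gt)
```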
2. Constructing a saliency target detection model network for extracting RGB image features and depth image features by using a convolutional neural network;
2.1) MobileNet-v3 is used as the backbone network of the model of the present invention to extract the RGB image features f_i^rgb and the depth image features f_i^d of each RGB-D image pair, respectively.
3. Based on the multi-scale RGB image features f_i^rgb extracted in step 2 and the corresponding depth image features f_i^d, cross-modal feature fusion is performed using the features extracted at each level. Since the lowest-level features contain too much noise, they are not used here; only the level 2, 3, 4 and 5 features are used for fusion. Similarly, the depth encoder also uses only its level 2, 3, 4 and 5 features.
3.1) The cross-modal feature interaction network consists of 4 levels of CMI modules. It takes the 4 levels of RGB image features f_i^rgb and the corresponding depth image features f_i^d as input and generates 4 levels of multi-modal features f_i^fusion.
3.2) The input of the CMI module at the i-th level consists of f_i^rgb and f_i^d, and the multi-modal feature f_i^fusion of the i-th level is output through the multi-source integration unit, where i ∈ {2,3,4,5}.
3.3 The CMI module generates multi-modal features through a progressively guided attention mechanism, the specific process is as follows:
3.3.1) First, the invention adopts a depthwise separable convolution module to extract features from each single modality, which enhances the saliency expression capability of the features; the RGB and depth features can be further enhanced through this convolution module.
3.3.2) Then, further feature extraction and enhancement are performed on the RGB and depth features, respectively, using two parallel progressive guided attention mechanisms. In order to obtain the global information of each single modality, two parallel channel attention branches are used to process the RGB and depth features, respectively.
In this channel attention operation, DSConv(·) denotes a depthwise separable convolution module, AVG(·) denotes channel-wise global average pooling, and Sigmoid(·) denotes the Sigmoid activation function.
3.3.3) After obtaining the global information X_r and X_d, local details are further learned on X_r and X_d to prevent the loss of local details of multiple salient objects, and spatial feature maps with multiple receptive fields are generated using a multi-scale spatial attention mechanism.
In this spatial attention operation, Max(·) denotes a max pooling operation, Cat(·) denotes a concatenation operation, and C_1(·), C_3(·) and C_5(·) denote dilated (atrous) convolution operations with dilation rates of 1, 3 and 5, respectively.
3.3.4) All the resulting features, including the RGB feature Z_r and the depth feature Z_d, are integrated and fused by the multi-source integration unit to finally obtain the fusion feature f_i^fusion.
In this fusion operation, i ∈ {2,3,4,5} denotes the level of the model at which the feature is located, Conv1(·) denotes a convolution with a 1×1 kernel, DSConv(·) denotes a depthwise separable convolution, Cat(·) denotes feature concatenation, and Add denotes element-wise addition.
4) Through the above operations, the multi-modal features f_i^fusion of the 4 levels are extracted. These 4 levels are input into a context information extraction module, which enhances the receptive-field information of the multi-modal features and promotes the expression of salient objects through convolution operations at multiple levels and with different kernel sizes.
4.1) The context information of the fusion features is extracted from the multi-modal features at each level through the context operation.
Here, i ∈ {2,3,4,5} denotes the level at which the fusion feature is located and GCM(·) denotes the contextual feature extraction module.
4.2) The context-enhanced multi-modal features generated by the above steps are input into the decoder, where the fusion features are integrated by the multi-path aggregation module together with the RGB feature f_i^rgb and the depth feature f_i^d of each level.
In the decoder, MPA(·) denotes the multi-path aggregation module, Deconv(·) denotes a deconvolution (transposed convolution) operation, S_out denotes the predicted saliency map, and i ∈ {2,3,4,5} denotes the level of the model at which the fusion feature is located. Finally, the final saliency map S_out is obtained.
5) The loss function is calculated between the saliency map S_out predicted by the present invention and the manually annotated salient object segmentation map G; the parameter weights of the proposed model are gradually updated through SGD and the back-propagation algorithm, and the structure and parameter weights of the RGB-D saliency detection algorithm are finally determined.
6) On the basis of the model structure and parameter weights determined in step 5, the RGB-D image pairs in the test set are tested to generate saliency maps S_test, which are evaluated using the MAE, S-measure, F-measure and E-measure evaluation metrics.
The invention uses the lightweight MobileNet-v3 as the backbone network, thereby avoiding heavy computation. Unlike previous fusion methods, excessive modal interactions are not performed here, in order to avoid producing invalid fusion features; instead, a simple and efficient guided attention mechanism is used to enhance the representation capability of the features, and a multi-source integration unit performs the final fusion operation. In the decoder, in order to obtain more effective saliency information, a simple multi-path integration module is used to obtain the final saliency map. To make the whole network more lightweight, separable convolutions are used to learn the features. The invention thus achieves a certain degree of robustness.
Drawings
FIG. 1 is a schematic view of a model structure according to the present invention
FIG. 2 is a schematic diagram of a cross-modal feature fusion module
FIG. 3 is a schematic diagram of a global context module
FIG. 4 is a schematic diagram of a multi-path aggregation module
FIG. 5 is a schematic diagram of model training and testing
Detailed Description
The embodiments of the present invention are described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown by way of illustration. All other embodiments obtained by a person of ordinary skill in the art without inventive effort, based on the embodiments of the present invention, fall within the scope of protection of the present invention.
Referring to fig. 1, an RGB-D saliency target detection method based on a lightweight cross-modal fusion network mainly includes the following steps:
1. The RGB-D data sets for training and testing the task are acquired, the algorithm target of the present invention is defined, and the training set and test set for training and testing the algorithm are determined. The NJU2K data set and the NLPR data set are used as training sets, and the remaining data sets are used as test sets, comprising the remaining NJU2K data set, the remaining NLPR data set, the SIP data set, the STERE data set and the DES data set.
2. Constructing a saliency target detection model backbone network for extracting RGB image features and depth image features by using a MobileNet-v3 network, wherein the backbone network comprises an encoder for extracting RGB image features and an encoder for extracting depth image features:
2.1. The three-channel RGB image is input to the RGB encoder to generate 4 levels of RGB features, namely f_2^rgb, f_3^rgb, f_4^rgb and f_5^rgb. Since the lowest-level features contain too much noise, they are not used here; only the level 2, 3, 4 and 5 features are used for fusion. Similarly, the depth encoder also uses only its level 2, 3, 4 and 5 features.
2.2. The three-channel depth image is input into the depth encoder to generate 4 levels of depth image features, namely f_2^d, f_3^d, f_4^d and f_5^d.
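As an illustration of the two encoders, a minimal PyTorch sketch based on torchvision's MobileNet-V3-Large is given below; the stage split indices used to expose the five feature levels are an assumption of the sketch, and the depth branch simply reuses the same architecture on the three-channel depth input.

```python
# Sketch of the dual-encoder backbone (RGB branch + depth branch); stage boundaries are illustrative.
import torch.nn as nn
from torchvision.models import mobilenet_v3_large, MobileNet_V3_Large_Weights

class MobileNetV3Encoder(nn.Module):
    def __init__(self, pretrained=True):
        super().__init__()
        weights = MobileNet_V3_Large_Weights.IMAGENET1K_V1 if pretrained else None
        feats = mobilenet_v3_large(weights=weights).features
        # Group the backbone into 5 stages; only stages 2-5 are used for fusion.
        self.stage1 = feats[:2]    # 1/2 resolution (discarded: too noisy)
        self.stage2 = feats[2:4]   # 1/4 resolution
        self.stage3 = feats[4:7]   # 1/8 resolution
        self.stage4 = feats[7:13]  # 1/16 resolution
        self.stage5 = feats[13:]   # 1/32 resolution

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        f4 = self.stage4(f3)
        f5 = self.stage5(f4)
        return f2, f3, f4, f5      # levels 2-5 only

rgb_encoder = MobileNetV3Encoder()
depth_encoder = MobileNetV3Encoder()   # fed with the depth map replicated to 3 channels
```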
3. Referring to fig. 2, the 4 levels of RGB image features f_i^rgb generated in step 2 and the corresponding depth image features f_i^d are cross-modally fused by the cross-modal fusion module to obtain 4 levels of multi-modal features f_i^fusion, i ∈ {2,3,4,5}. The main steps are as follows:
3.1. The cross-modal feature fusion network consists of 4 levels of CMI modules. It takes as input the 4 levels of RGB image features f_2^rgb to f_5^rgb and the corresponding depth image features f_2^d to f_5^d, and generates 4 levels of multi-modal features f_2^fusion to f_5^fusion.
3.2. The input of the CMI module at the i-th level consists of f_i^rgb and f_i^d, and the multi-modal feature f_i^fusion of the i-th level is output through the multi-source integration unit, where i ∈ {2,3,4,5}.
3.3 The CMI module generates multi-modal features through a progressively guided attention mechanism, the specific process is as follows:
3.3.1) First, the invention adopts a depthwise separable convolution module to extract features from each single modality, which enhances the saliency expression capability of the features; the RGB and depth features can be further enhanced through this convolution module.
3.3.2) Then, further feature extraction and enhancement are performed on the RGB and depth features, respectively, using two parallel progressive guided attention mechanisms. In order to obtain the global information of each single modality, two parallel channel attention branches are used to process the RGB and depth features, respectively.
In this channel attention operation, DSConv(·) denotes a depthwise separable convolution module, AVG(·) denotes channel-wise global average pooling, and Sigmoid(·) denotes the Sigmoid activation function.
3.3.3) After obtaining the global information X_r and X_d, local details are further learned on X_r and X_d to prevent the loss of local details of multiple salient objects, and spatial feature maps with multiple receptive fields are generated using a multi-scale spatial attention mechanism.
In this spatial attention operation, Conv3(·) denotes a convolution module with a 3×3 kernel, Max(·) denotes a max pooling operation, Cat(·) denotes a concatenation operation, and C_1(·), C_3(·) and C_5(·) denote dilated (atrous) convolution operations with dilation rates of 1, 3 and 5, respectively.
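To make the two attention stages of the CMI module concrete, a minimal PyTorch sketch of one progressive guided attention branch is given below (the depth branch is identical). Only the named operators (DSConv, AVG, Sigmoid, Max, Cat and the dilated convolutions C_1, C_3, C_5) are taken from the description; the exact way the attention maps are applied to the features is an assumption of the sketch.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable convolution: depthwise 3x3 (optionally dilated) + pointwise 1x1."""
    def __init__(self, ch, dilation=1):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation, groups=ch)
        self.pw = nn.Conv2d(ch, ch, 1)
        self.act = nn.Sequential(nn.BatchNorm2d(ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.act(self.pw(self.dw(x)))

class PAG(nn.Module):
    """Progressive guided attention: channel attention, then multi-scale spatial attention."""
    def __init__(self, ch):
        super().__init__()
        self.dsconv = DSConv(ch)
        self.c1 = DSConv(ch, dilation=1)   # dilation rates 1, 3, 5 as in the text
        self.c3 = DSConv(ch, dilation=3)
        self.c5 = DSConv(ch, dilation=5)
        self.fuse = nn.Conv2d(3 * ch, ch, 3, padding=1)

    def forward(self, f):
        f = self.dsconv(f)
        # Channel attention (assumed form): sigmoid(AVG(f)) gates the channels -> global info X.
        x = f * torch.sigmoid(torch.mean(f, dim=(2, 3), keepdim=True))
        # Multi-scale spatial attention (assumed form): dilated convs at rates 1/3/5 are
        # concatenated, reduced to one map via channel-wise max, then used as a sigmoid gate.
        ms = torch.cat([self.c1(x), self.c3(x), self.c5(x)], dim=1)
        att = torch.sigmoid(torch.max(self.fuse(ms), dim=1, keepdim=True)[0])
        z = x * att                        # spatially enhanced feature Z
        return x, z
```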
3.3.4) All the resulting features, including the RGB feature Z_r and the depth feature Z_d, are integrated and fused by the multi-source feature integration unit to finally obtain the fusion feature f_i^fusion.
In this fusion operation, i ∈ {2,3,4,5} denotes the level of the model at which the feature is located, Conv1(·) denotes a convolution with a 1×1 kernel, DSConv(·) denotes a depthwise separable convolution, Cat(·) denotes feature concatenation, and Add denotes element-wise addition.
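Continuing the previous sketch (and reusing its DSConv helper), one possible reading of the multi-source integration step is shown below; the concatenation, 1×1 convolution, depthwise separable refinement and element-wise addition follow the operators named above (Cat, Conv1, DSConv, Add), but the exact wiring is an assumption.

```python
class MAU(nn.Module):
    """Multi-source integration unit: fuse the enhanced RGB / depth features of one level."""
    def __init__(self, ch):
        super().__init__()
        self.reduce = nn.Conv2d(2 * ch, ch, 1)   # Conv1: 1x1 convolution applied after Cat
        self.refine = DSConv(ch)                 # depthwise separable refinement

    def forward(self, z_r, z_d, x_r, x_d):
        fused = self.refine(self.reduce(torch.cat([z_r, z_d], dim=1)))  # Cat + Conv1 + DSConv
        return fused + x_r + x_d                 # Add: keep the global cues of both modalities

# A CMI module for level i would then combine two PAG branches and one MAU, e.g.:
# x_r, z_r = pag_rgb(f_rgb); x_d, z_d = pag_depth(f_depth); f_fusion = mau(z_r, z_d, x_r, x_d)
```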
4. Referring to fig. 3, the multi-modal features f_i^fusion extracted at the 4 levels are input into the context information extraction module, which enhances the receptive-field information of the multi-modal features and promotes the expression of salient objects through convolution operations at multiple levels and with different kernel sizes.
4.1) The context information of the fusion features is extracted from the multi-modal features at each level through the context operation.
Here, i ∈ {2,3,4,5} denotes the level at which the fusion feature is located and GCM(·) denotes the contextual feature extraction module.
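The global context module is described only as lightweight convolutions at multiple levels and kernel sizes; the sketch below, which continues the previous sketches and assumes parallel depthwise 3×3/5×5/7×7 branches plus a global average-pooling branch, is one possible lightweight realization rather than the definitive design.

```python
class GCM(nn.Module):
    """Lightweight global context module: parallel multi-kernel depthwise branches (assumed sizes)."""
    def __init__(self, ch):
        super().__init__()
        def branch(k):
            return nn.Sequential(
                nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch),  # depthwise k x k
                nn.Conv2d(ch, ch, 1), nn.ReLU(inplace=True))      # pointwise 1 x 1
        self.b3, self.b5, self.b7 = branch(3), branch(5), branch(7)
        self.pool = nn.AdaptiveAvgPool2d(1)                       # global context branch
        self.out = nn.Conv2d(4 * ch, ch, 1)

    def forward(self, f):
        g = self.pool(f).expand_as(f)                             # broadcast the global average
        return self.out(torch.cat([self.b3(f), self.b5(f), self.b7(f), g], dim=1)) + f
```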
4.2) Referring to fig. 4, the context-enhanced multi-modal features generated by the above steps are input into the decoder, where the fusion features are integrated by the multi-path aggregation module together with the RGB feature f_i^rgb and the depth feature f_i^d of each level.
In the decoder, MPA(·) denotes the multi-path aggregation module, Deconv(·) denotes a deconvolution (transposed convolution) operation, S_out denotes the predicted saliency map, and i ∈ {2,3,4,5} denotes the level at which the fusion feature is located. Finally, the final saliency map S_out is obtained.
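A minimal sketch of the multi-path aggregation decoder follows, continuing the previous sketches; it merges the context-enhanced fusion feature with the same-level RGB and depth features, adds the decoded feature from the deeper level, and upsamples by deconvolution. It assumes all inputs have already been projected to a common channel width, which is a simplification of the sketch.

```python
class MPA(nn.Module):
    """Multi-path aggregation: one decoding step of the coarse-to-fine decoder."""
    def __init__(self, ch):
        super().__init__()
        self.merge = nn.Sequential(nn.Conv2d(3 * ch, ch, 1), DSConv(ch))
        self.up = nn.ConvTranspose2d(ch, ch, kernel_size=2, stride=2)  # Deconv: x2 upsampling

    def forward(self, f_gcm, f_rgb, f_d, deeper=None):
        x = self.merge(torch.cat([f_gcm, f_rgb, f_d], dim=1))  # fusion + RGB + depth paths
        if deeper is not None:                                  # coarse-to-fine aggregation
            x = x + deeper
        return self.up(x)

# Example decoding chain (channel width and prediction head are assumptions of the sketch):
# d5 = mpa5(g5, r5, dep5); d4 = mpa4(g4, r4, dep4, deeper=d5); ...; d2 = mpa2(g2, r2, dep2, deeper=d3)
# S_out = torch.sigmoid(head(d2))   # head: a 1x1 convolution producing a single-channel map
```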
5) The loss function is calculated between the saliency map S_out predicted by the present invention and the manually annotated salient object segmentation map G; the parameter weights of the proposed model are gradually updated through SGD and the back-propagation algorithm, and the structure and parameter weights of the RGB-D saliency detection algorithm are finally determined.
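The training step can be summarized by the sketch below; binary cross-entropy is used as a stand-in loss and the SGD hyper-parameters are assumed, since the description names only the loss computation, SGD and back-propagation.

```python
import torch.nn.functional as F
from torch.optim import SGD

# model: the full lightweight cross-modal fusion network (encoders + CMI + GCM + MPA decoder).
optimizer = SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4)  # assumed values

def train_step(rgb, depth, gt):
    optimizer.zero_grad()
    s_out = model(rgb, depth)                    # predicted saliency map S_out
    loss = F.binary_cross_entropy(s_out, gt)     # compare with the annotated map G
    loss.backward()                              # back-propagation
    optimizer.step()                             # SGD parameter update
    return loss.item()
```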
6) On the basis of the model structure and parameter weights determined in step 5, the RGB-D image pairs in the test set are tested to generate saliency maps S_test, which are evaluated using the MAE, S-measure, F-measure and E-measure evaluation metrics.
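Among the four metrics, MAE has the simplest closed form (the mean absolute difference between the normalized prediction and the ground truth); a sketch is shown below, while S-measure, F-measure and E-measure would typically be computed with an existing SOD evaluation toolbox.

```python
import torch

def mae(pred, gt):
    """Mean absolute error between a predicted saliency map and its ground-truth mask,
    both expected in [0, 1] and of the same spatial size."""
    return torch.mean(torch.abs(pred - gt)).item()

# Example: average MAE over a test loader yielding (rgb, depth, gt) batches.
# scores = [mae(model(rgb, depth), gt) for rgb, depth, gt in test_loader]
# print(sum(scores) / len(scores))
```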
The foregoing is a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and variations may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (5)

1. The RGB-D significance target detection method based on the lightweight cross-modal fusion network is characterized by comprising the following steps:
1) Acquiring an RGB-D data set for training and testing the task, defining an algorithm target of the invention, and determining a training set and a testing set for training and testing an algorithm;
2) Constructing an encoder for extracting RGB image features and an encoder for extracting depth image features;
3) Establishing a lightweight network for fusing the RGB features and the depth map features, and guiding the fusion of the RGB features and the depth image features through depthwise separable convolution and an attention mechanism;
4) Based on the multi-modal characteristics fused by the cross-modal characteristics, a multi-scale context capturing mechanism is constructed to extract multi-modal characteristic context information;
5) Establishing a simple and efficient multipath aggregation decoder for fusing RGB, depth features and fusion features, and obtaining a final predicted saliency map through an activation function;
6) The predicted saliency map P and the artificially marked saliency target segmentation map G are subjected to loss function calculation, the parameter weights of the model provided by the invention are gradually updated through SGD and a back propagation algorithm, and finally the structure and the parameter weights of the RGB-D saliency detection algorithm are determined.
7) Testing RGB-D image pairs on the test set on the basis of the model structure and parameter weights determined in step 6), generating a saliency map S, and performing performance evaluation using evaluation indexes.
2. The RGB-D saliency target detection method based on the lightweight cross-modal fusion network of claim 1, wherein the method is characterized by: the specific method of the step 2) is as follows:
2.1) The NJU2K data set and the NLPR data set are used as training sets, and the remaining NLPR data set, the remaining NJU2K data set, the SIP data set, the STERE data set and the DES data set are used as test sets.
2.2) Each RGB-D image pair in the dataset comprises a color image I_RGB, a corresponding depth image I_Depth, and a corresponding manually annotated salient object segmentation image G.
3. The RGB-D saliency target detection method based on the lightweight cross-modal fusion network of claim 1, wherein the method is characterized by: the specific method of the step 3) is as follows:
3.1) MobileNet-v3 is used as the backbone network of the model of the present invention for extracting the RGB image features f_i^rgb and the corresponding depth image features f_i^d, respectively.
3.2) The MobileNet-v3 backbone network of the present invention is initialized with MobileNet-v3 parameter weights pre-trained on the ImageNet dataset.
4. The RGB-D saliency target detection method based on the lightweight cross-modal fusion network of claim 1, wherein the method is characterized by: the specific method of the step 4) is as follows:
4.1) The cross-modal feature fusion network is composed of 4 levels of CMI modules and generates 4 levels of multi-modal features f_i^fusion, i ∈ {2,3,4,5}.
4.2) The input of the CMI module at the i-th level consists of f_i^rgb and f_i^d, and the multi-modal feature f_i^fusion of the i-th level is output through the progressive guided attention mechanism, where i ∈ {2,3,4,5}.
5. The RGB-D saliency target detection method based on the lightweight cross-modal fusion network of claim 1, wherein the method is characterized by: the specific method of the step 5) is as follows:
5.1) Multi-scale depthwise separable convolution operations with different kernel sizes are applied, respectively, to obtain multiple receptive fields, which can capture rich context information.
Here, i ∈ {2,3,4,5} denotes the level at which the fusion feature is located and GCM(·) denotes the contextual feature extraction operation.
6) The 4 levels of multi-modal features with multiple receptive fields obtained in step 5 are input into the decoder formed by the multi-path integration network to obtain the final fusion feature, which is activated by a sigmoid function to obtain the predicted saliency map S.
Here, MPA(·) denotes the multi-path aggregation module.
7) The loss function is calculated between the predicted saliency map S and the artificially marked saliency target segmentation map G; the parameter weights of the model provided by the invention are gradually updated by SGD and a back propagation algorithm, and the structure and parameter weights of the RGB-D saliency detection algorithm are finally determined.
CN202310410912.5A 2023-04-18 2023-04-18 RGB-D significance target detection method based on lightweight cross-modal fusion network Pending CN116486112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310410912.5A CN116486112A (en) 2023-04-18 2023-04-18 RGB-D significance target detection method based on lightweight cross-modal fusion network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310410912.5A CN116486112A (en) 2023-04-18 2023-04-18 RGB-D significance target detection method based on lightweight cross-modal fusion network

Publications (1)

Publication Number Publication Date
CN116486112A true CN116486112A (en) 2023-07-25

Family

ID=87222602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310410912.5A Pending CN116486112A (en) 2023-04-18 2023-04-18 RGB-D significance target detection method based on lightweight cross-modal fusion network

Country Status (1)

Country Link
CN (1) CN116486112A (en)

Similar Documents

Publication Publication Date Title
CN108875807B (en) Image description method based on multiple attention and multiple scales
Hu et al. Learning supervised scoring ensemble for emotion recognition in the wild
WO2019228358A1 (en) Deep neural network training method and apparatus
CN109815826B (en) Method and device for generating face attribute model
CN113240691B (en) Medical image segmentation method based on U-shaped network
CN113628294B (en) Cross-mode communication system-oriented image reconstruction method and device
CN110414432A (en) Training method, object identifying method and the corresponding device of Object identifying model
CN112784764A (en) Expression recognition method and system based on local and global attention mechanism
CN108427740B (en) Image emotion classification and retrieval algorithm based on depth metric learning
CN113435520A (en) Neural network training method, device, equipment and computer readable storage medium
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN112861659A (en) Image model training method and device, electronic equipment and storage medium
CN114283315A (en) RGB-D significance target detection method based on interactive guidance attention and trapezoidal pyramid fusion
CN116524593A (en) Dynamic gesture recognition method, system, equipment and medium
CN112241959A (en) Attention mechanism generation semantic segmentation method based on superpixels
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
CN110704665A (en) Image feature expression method and system based on visual attention mechanism
CN114764870A (en) Object positioning model processing method, object positioning device and computer equipment
Abuowaida et al. Improved deep learning architecture for depth estimation from single image
CN114066844A (en) Pneumonia X-ray image analysis model and method based on attention superposition and feature fusion
CN117576149A (en) Single-target tracking method based on attention mechanism
CN116958324A (en) Training method, device, equipment and storage medium of image generation model
CN116310396A (en) RGB-D significance target detection method based on depth quality weighting
CN109583406B (en) Facial expression recognition method based on feature attention mechanism
CN116486112A (en) RGB-D significance target detection method based on lightweight cross-modal fusion network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination