NL2030745B1 - Computer system for saliency detection of rgbd images based on interactive feature fusion - Google Patents

Computer system for saliency detection of rgbd images based on interactive feature fusion

Info

Publication number
NL2030745B1
Authority
NL
Netherlands
Prior art keywords
image
feature
convolution
salience
color
Prior art date
Application number
NL2030745A
Other languages
Dutch (nl)
Inventor
Fang Zhijun
Zhao Xiaoli
Zhang Zhuorao
Chen Zheng
Original Assignee
Univ Shanghai Eng Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Shanghai Eng Science filed Critical Univ Shanghai Eng Science
Priority to NL2030745A priority Critical patent/NL2030745B1/en
Application granted granted Critical
Publication of NL2030745B1 publication Critical patent/NL2030745B1/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

A computer system for saliency detection of RGBD images based on interactive feature fusion: for each image in an image sample set, a multi-layer convolutional neural network module is first used to extract a multi-level color and depth image feature from a color image and a depth image respectively; a cross-feature fusion module is used to perform multi-level dot product fusion on the color and depth image feature extracted by deep convolution to obtain an initial salient image; an Inception structure is then used to perform multi-scale fusion on the initial salient image to output a network predicted salient image; finally, the network predicted salient image and a target salient image are used to solve a focus entropy loss function to learn an optimal parameter of the image saliency detection model and obtain a trained image saliency detection model, so as to perform saliency detection of a to-be-processed RGBD image.

Description

COMPUTER SYSTEM FOR SALIENCY DETECTION OF RGBD IMAGES
BASED ON INTERACTIVE FEATURE FUSION
TECHNICAL FIELD
[01] The present disclosure relates to a technical field of image processing, in particular to a computer system for saliency detection of RGBD images based on interactive feature fusion.
BACKGROUND ART
[02] In application fields such as autonomous driving, robotics and virtual reality, finding salient objects in a scene and filtering out information that is weakly related to the task is of great significance for reducing the computational complexity of a system and improving its ability to understand the scene, and is one of the core issues and research hotspots in the field of computer vision.
[03] In recent years, with the wide application of deep convolutional neural networks in the field of image processing, saliency detection has developed rapidly, and a large number of saliency models based on visual features such as color and brightness have been proposed. In “Visual saliency based on multiscale deep feature”, Li et al. use a deep neural network for the first time to build a saliency model based on multi-scale features; in “Deeply Supervised Salient Object Detection with Short Connections”, Hou et al. propose a DSS model, which uses a fully convolutional network (FCN) to extract multi-layer and multi-scale features and then fuses them by introducing a skip-layer structure; in “Attentive feedback network for boundary-aware salient object detection”, Feng et al. use a global perceptron module to refine the most salient features as a whole, and use an attention feedback module to transfer information between the corresponding encoder and decoder blocks.
[04] However, saliency detection of RGB images faces two major challenges: one is that when a target and the background have a similar appearance, it is difficult to distinguish them by relying only on RGB information; the other is that when the same object includes different colors, it is easily misjudged as different objects. A depth map includes rich spatial structure and three-dimensional layout information, which can provide a large number of additional clues to distinguish the target from the background while ensuring the integrity of the detection region. Therefore, the use of depth information can effectively improve the effect of saliency detection. In “An in depth view of saliency”, Ciptadi et al. introduce depth information on the basis of RGB for the first time and propose a saliency segmentation model based on RGB-D; in “Rgbd salient object detection: a benchmark and algorithms”, Peng et al. propose a multi-stage RGB-D model that simultaneously considers depth and appearance cues from low-level feature contrast, mid-level region grouping and high-level prior enhancement; in “Progressively complementarity-aware fusion network for RGB-D salient object detection”, Chen et al. design a complementary perceptual fusion module to learn color and depth complementary information and densely add layer-by-layer supervision from deep to shallow to gradually fuse multi-level information through a cascaded module; and in “Depth-induced multi-scale recurrent attention network for saliency detection”, Piao et al. propose a depth-induced multi-scale recurrent attention network, which uses depth refinement blocks including residual structures to fuse color and depth complementary information, combines multi-scale contextual features with depth information to accurately locate salient objects, and uses a recurrent attention module to further improve model performance.
[05] In summary, the existing RGB-D saliency detection methods mainly propose sub-networks based on a backbone network to learn color and depth complementary information and perform feature fusion, but most of these networks are very large, have many parameters, and are difficult to train.
SUMMARY
[06] The present disclosure provides a computer system for saliency detection of RGBD images based on interactive feature fusion and proposes a novel interactive dual-stream saliency detection framework: a global and local feature extraction convolution block (GL Block) is designed to obtain a global feature and guide local feature extraction, a dot product method is proposed to obtain common features of color images and depth images, and a cross-modal feature fusion module (CFFM) is built to cross-fuse the feature information of color images and depth images. The detection method performs saliency detection with high accuracy and few model parameters.
[07] The present disclosure can be realized through the following technical solutions:
[08] A computer system for saliency detection of RGBD images based on interactive feature fusion, includes: a processor, a memory, and a computer program stored on the memory and running on the processor, and when the processor executes the computer program, the following modules are executed:
[09] an image sample set establishing module, establishing an image sample set for training;
[10] a saliency detection model establishing module, establishing an image saliency detection model;
[11] for each image in the image sample set, first using a multi-layer convolutional neural network module to extract a multi-level color and depth image feature from a color image and a depth image respectively, and using a cross-feature fusion module to perform multi-level dot product fusion on the color and depth image feature extracted by deep convolution to obtain an initial salient image; then using an Inception structure to perform multi-scale fusion on the initial salient image to output a network predicted salient image; finally, using the network predicted salient image and a target salient image to solve a focus entropy loss function to learn an optimal parameter of the image saliency detection model and obtain a trained image saliency detection model;
[12] an output module, inputting a to-be-processed RGBD image into the trained image saliency detection model and outputting a corresponding saliency detection result, which is a saliency map, through a model calculation.
[13] Further, the cross-feature fusion module, which includes a first convolution and a second convolution, uses the first convolution to perform feature extraction on a color image feature and the second convolution to perform feature extraction on a depth image feature; a common feature of the color image feature and the depth image feature is extracted by a dot product method, fused and transformed, and then a third convolution is used to merge the fused feature with an original color image feature and an original depth image feature, respectively, through convolution and activation operations.
[14] Further, structures of the first convolution, the second convolution and the third convolution are the same.
[15] Further, the multi-layer convolutional neural network module includes two identical branches, which act on the color image and the depth image respectively, and both adopt an FCN structure comprising five layers of convolution, wherein the first convolution adopts a standard convolution block, and all other layers of convolution use a global-local feature extraction convolution block;
[16] the global-local feature extraction convolution block includes a global branch and a local branch; the local branch first reduces an input feature map to 1/4 of an original feature map with a convolution with a step size of 2, and then uses two identical convolutions with a step size of 1 to extract a local feature; the global branch adopts a bottleneck structure to extract a global feature; finally, the extracted global feature and local feature are fused by using a dot product method.
[17] Further, a size of a convolution kernel of the convolutions with the step size of 1 is 3x3, and an activation function is ReLU.
[18] Further, the focus entropy loss function $L(y,\hat{y})$ is set as:
[19] $$L(y,\hat{y})=\begin{cases}-\alpha\,(1-\hat{y})^{\gamma}\log(\hat{y}), & y=1\\ -(1-\alpha)\,\hat{y}^{\gamma}\log(1-\hat{y}), & y=0\end{cases}$$
[20] wherein $y$ and $\hat{y}$ represent the target salient image and the network predicted salient image respectively, $\gamma$ represents a constant, and $\alpha$ represents a balance factor.
[21] The beneficial technical effects of the present disclosure are:
[22] A novel interactive dual-stream saliency detection framework is adopted, which can well detect a salient region and generate an accurate saliency map, thereby improving the detection efficiency and accuracy of a saliency target. Comprehensive experiments on three public data sets, NJU2000, NLPR and STEREO, show that the present disclosure has a good detection effect on mainstream evaluation indicators. In addition, the method of the present disclosure is simple, reliable, easy to operate, easy to implement, and easy to popularize and apply.
BRIEF DESCRIPTION OF THE DRAWINGS
[23] FIG. 1 is a schematic structural diagram of a dual-stream network of the present disclosure;
[24] FIG. 2 is a schematic structural diagram of a global-local feature extraction convolution block (GL Block) of the present disclosure;
[25] FIG. 3 is a schematic structural diagram of a cross feature fusion module (CFFM) of the present disclosure;
[26] FIG. 4 is a schematic diagram of a comparison result of saliency detection by using a method of the present disclosure and other methods;
[27] FIG. 5 is a P-R curve comparison diagram of saliency detection by using a method of the present disclosure and other methods;
[28] FIG. 6 is a model size comparison diagram of saliency detection by using a method of the present disclosure and other methods.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[29] The specific embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings and preferred embodiments.
[30] The present disclosure proposes a computer system for saliency detection of RGBD images based on interactive feature fusion, whose network framework adopts a dual-stream network as shown in FIG. 1. A proposed global-local feature extraction convolution block (GL Block) is used to obtain and fuse global and local features and replaces the original standard convolution block in the FCN to generate an initial saliency map; in order to obtain a common salient feature of color and depth information, a cross-feature fusion module (CFFM) based on a dot product method is proposed; considering that a shallow feature has more noise, the deeper layers of the FCN network in the present disclosure use the CFFM to cross-fuse color and depth features to reduce redundant features; finally, the initial saliency map is fused through an Inception structure to improve the scale adaptability of the network. Details are as follows:
[31] An image sample set establishing module:
[32] scales the color image, the depth map and the manually annotated saliency map of each RGB-D image in the image sample set together, so that a computing device can bear the computational load of the neural network; random cropping, horizontal flipping and other operations may also be applied jointly to increase the diversity of the data; the color image and the depth map in the image sample set are then normalized to highlight the foreground feature of an image.
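As a concrete illustration of this preprocessing, a minimal PyTorch/torchvision sketch is given below; the 224×224 target size and the normalization statistics are assumptions introduced for the example and are not specified in the disclosure.

```python
# Minimal preprocessing sketch for one RGB-D training sample (PyTorch).
# The 224x224 size and ImageNet normalization stats are illustrative assumptions.
import random
import torchvision.transforms.functional as TF

def preprocess(color, depth, gt, size=(224, 224)):
    # Scale the color image, depth map and annotated saliency map together.
    color, depth, gt = TF.resize(color, size), TF.resize(depth, size), TF.resize(gt, size)
    # Apply the same random horizontal flip to all three maps (random cropping could be added similarly).
    if random.random() < 0.5:
        color, depth, gt = TF.hflip(color), TF.hflip(depth), TF.hflip(gt)
    # Convert to tensors and normalize color to highlight the foreground feature.
    color = TF.normalize(TF.to_tensor(color), mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
    depth = TF.to_tensor(depth)   # single-channel depth, scaled to [0, 1]
    gt = TF.to_tensor(gt)
    return color, depth, gt
```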
[33] A saliency detection model establishing module:
[34] 1, for each image in the image sample set, first uses a multi-layer convolutional neural network module to extract a multi-level color and depth image feature from a color image and a depth image respectively.
[35] For a segmentation network, the larger the receptive field, the larger the range captured by the network, the more information that can be used for analysis, the better the segmentation effect. A receptive field of a convolutional layer located in a shallow layer is relatively narrow, which retains a large amount of detailed information, helping to refine a segmentation image; the receptive field of a deep convolutional layer is relatively wide, which can be used to learn some abstract features and improve the classification performance. A FCN network adopts a skip-level structure and makes full use of the shallow information to assist the gradual upsampling, so as to obtain the refined segmentation image. However, in FCN, an actual receptive field of fc7 layer is only 1/4 of a full image, not an entire image, which is not enough to complete the task well. In order to obtain a larger receptive field, methods of increasing a network depth and using large convolution kernels are usually used. However, capturing the global context information through the former will not only greatly increase the network burden, but also easily cause gradient explosion and gradient disappearance; the latter will lead to a sudden increase in the amount of calculation, which is not conducive to the increase of the network depth, and the calculation performance will also be reduced.
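For intuition about how slowly the receptive field grows when only small kernels are stacked, the sketch below computes the theoretical receptive field of a chain of convolutions using the standard recurrence r_out = r_in + (k - 1)·j_in, j_out = j_in·s; the layer configurations are illustrative only and are not the network of the disclosure.

```python
# Receptive-field recurrence for a stack of convolutions.
# The layer lists below are examples, not the FCN of the disclosure.
def receptive_field(layers):
    r, j = 1, 1                  # receptive field and jump (cumulative stride) at the input
    for k, s in layers:          # (kernel_size, stride) per layer
        r = r + (k - 1) * j
        j = j * s
    return r

print(receptive_field([(3, 2)] * 5))   # 63: five stride-2 blocks of 3x3 convs
print(receptive_field([(3, 1)] * 5))   # 11: five stride-1 3x3 convs grow very slowly
```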
[36] Based on the above problems, the present disclosure designs the global-local feature extraction convolution block (GL Block), which adopts a dual-branch structure to extract local and global features respectively, whose structure is shown in FIG. 2, and which is used in the dual-stream network shown in FIG. 1. The multi-layer convolutional neural network module includes two identical branches, which act on the color image and the depth image respectively and both include five convolution blocks, of which the first is the standard convolution block and the rest are the GL Block proposed by the present disclosure. Then, deconvolution is used for upsampling, and the shallow information is fused through skip-level connections. In this way, each convolution block can perform global feature extraction, which does not increase the network burden, ensures the calculation speed, and is conducive to the optimization of the entire network structure.
[37] The GL Block proposed by the present disclosure has the dual-branch structure, namely a local branch and a global branch, so as to extract a local feature and a global feature respectively. The local branch first reduces an input feature map to 1/4 of the original feature map with a convolutional layer with a step size of 2, a convolution kernel size of 3×3 and a ReLU activation function, and then uses two identical convolutions with a step size of 1 to extract the local feature. To reduce the amount of branch-network calculation, the global branch adopts a bottleneck structure, that is, a global average pooling layer is used to explicitly extract the global feature, whose purpose is to integrate the global spatial information of the entire image. After a series of convolution operations, Softmax is used to learn a global feature distribution, and finally the dot product method is used to fuse the global feature and the local feature.
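As a concrete illustration of this block, a minimal PyTorch sketch is given below; the channel widths and the bottleneck reduction ratio are assumptions introduced for the example, since the disclosure fixes only the 3×3 kernels, the strides and the ReLU/Softmax operations.

```python
import torch
import torch.nn as nn

class GLBlock(nn.Module):
    """Sketch of a global-local feature extraction block (GL Block).
    Channel widths and the bottleneck ratio are illustrative assumptions."""
    def __init__(self, in_ch, out_ch, reduction=4):
        super().__init__()
        # Local branch: one stride-2 conv (halves each spatial side, i.e. 1/4 of the area),
        # followed by two identical stride-1 3x3 convs, all with ReLU.
        self.local = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        # Global branch: bottleneck built on global average pooling, ending in a
        # softmax over channels that acts as a learned global feature distribution.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        local = self.local(x)            # (B, out_ch, H/2, W/2)
        glob = self.global_branch(x)     # (B, out_ch, 1, 1)
        return local * glob              # dot-product (element-wise) fusion

# Usage: block = GLBlock(64, 128); y = block(torch.randn(1, 64, 56, 56))
```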
[38] 2, uses a cross-feature fusion module to perform multi-level dot product fusion on the color and depth image feature extracted by deep convolution to obtain an initial salient image.
[39] Since the design of existing cross-modal feature fusion methods is mostly based on addition or cascading, not only is the structure complex and the amount of calculation large, but redundant noise is also easily introduced. Inspired by an attention mechanism, the present disclosure adopts the dot product method to build the cross-feature fusion module (CFFM), as shown in FIG. 3, which is used to fuse the color image feature $f_c$, which carries distinct appearance and texture information, with the depth image feature $f_d$, which provides clear object shape, contour and spatial structure. Considering that the shallow depth feature contains a lot of noise, the present disclosure applies the cross-feature fusion module to the deeper layers of the multi-layer convolutional neural network module.
[40] The cross-feature fusion module, which includes a first convolution and a second convolution, uses the dot product method to fuse the color image feature $f_c$ and the depth image feature $f_d$. The first convolution performs feature extraction and channel compression on the color image feature $f_c$ extracted by one branch of the multi-layer convolutional neural network module, so as to reduce the calculation amount of the module and facilitate subsequent processing; at the same time, the second convolution performs feature extraction and channel compression on the depth image feature $f_d$ extracted by the other branch. A common feature of the color image feature $f_c$ and the depth image feature $f_d$ is then extracted by the dot product method, fused and transformed so that the fused feature has clear boundaries and semantic consistency, and a third convolution then merges the fused feature with the original color image feature $f_c$ and the original depth image feature $f_d$ through convolution and activation operations, the result being added back to the original feature once the channel count is restored. In this way, through multiple cross-feature fusions, the color image feature $f_c$ and the depth image feature $f_d$ gradually absorb each other's useful information and become complementary: the redundant information of the color image feature $f_c$ is reduced, and the boundaries of the depth image feature $f_d$ are sharpened. Finally, a 3×3 convolution is used to restore the original channel count, and the result is added to the original color image feature $f_c$ and depth image feature $f_d$ to obtain the refined features. The process can be expressed by the following formulas:
[41] $$\hat{f}_c = f_c + W_t\big(W_c(f_c)\cdot W_d(f_d)\big)$$
$$\hat{f}_d = f_d + W_t\big(W_c(f_c)\cdot W_d(f_d)\big)$$
[42] wherein $W_c$, $W_d$ and $W_t$ are the network parameters of the 3×3 convolutions used to compress and restore channels.
[43] The entire cross-feature fusion module adopts a symmetrical structure. After the dot product, the original color image feature $f_c$ and depth image feature $f_d$ extracted by the two branches of the multi-layer convolutional neural network module are introduced back into the corresponding branches. By multiplying the two, the shared information becomes larger: the color image feature $f_c$ transfers detail information to the depth image feature $f_d$ to refine an edge, and the depth image feature $f_d$ transfers saliency semantics to the color image feature $f_c$ to discard redundant information, so the edge is refined and redundant information appears in neither the color nor the depth component.
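Read together with the formulas above, the module can be sketched in PyTorch roughly as follows; the compressed channel width is an assumption, and the element-wise product stands in for the dot-product operation.

```python
import torch
import torch.nn as nn

class CFFM(nn.Module):
    """Sketch of the cross-feature fusion module (CFFM).
    The compressed channel count is an illustrative assumption."""
    def __init__(self, channels, compressed=None):
        super().__init__()
        compressed = compressed or channels // 2
        # W_c, W_d: 3x3 convolutions that extract features and compress channels.
        self.w_c = nn.Sequential(nn.Conv2d(channels, compressed, 3, padding=1), nn.ReLU(inplace=True))
        self.w_d = nn.Sequential(nn.Conv2d(channels, compressed, 3, padding=1), nn.ReLU(inplace=True))
        # W_t: 3x3 convolution that restores the original channel count.
        self.w_t = nn.Sequential(nn.Conv2d(compressed, channels, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, f_c, f_d):
        # Common feature of color and depth via the dot-product (element-wise) method.
        common = self.w_c(f_c) * self.w_d(f_d)
        fused = self.w_t(common)
        # Residual addition back onto the original color and depth features.
        return f_c + fused, f_d + fused

# Usage: cffm = CFFM(256); f_c2, f_d2 = cffm(torch.randn(1, 256, 28, 28), torch.randn(1, 256, 28, 28))
```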
[44] 3, uses an Inception structure to perform multi-scale fusion on the initial salient image to output a network predicted salient image; finally, uses the network predicted salient image and a target salient image to solve a focus entropy loss function to learn an optimal parameter of the image saliency detection model and obtain a trained image saliency detection model.
[45] The Inception structure is used to fuse the initial salient images of color and depth output by the color branch and the depth branch, and to output the network predicted salient image. The structure achieves the expected purpose by connecting small convolution kernels and large convolution kernels in parallel, while compressing the parameter amount of the model.
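A minimal PyTorch sketch of such a parallel small/large-kernel fusion head is shown below; the specific branch kernels (1×1, 3×3, 5×5) and channel counts are assumptions for illustration, since the disclosure only states that small and large kernels are connected in parallel.

```python
import torch
import torch.nn as nn

class InceptionFusion(nn.Module):
    """Sketch of an Inception-style multi-scale fusion head for the two
    initial saliency maps. Branch kernels and channels are illustrative assumptions."""
    def __init__(self, in_ch=2, mid_ch=8):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, mid_ch, 1)              # small kernel branch
        self.b3 = nn.Conv2d(in_ch, mid_ch, 3, padding=1)
        self.b5 = nn.Conv2d(in_ch, mid_ch, 5, padding=2)    # large kernel branch
        self.out = nn.Conv2d(3 * mid_ch, 1, 1)              # network predicted saliency map

    def forward(self, sal_color, sal_depth):
        x = torch.cat([sal_color, sal_depth], dim=1)                 # stack the two initial maps
        x = torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)   # parallel multi-scale branches
        return torch.sigmoid(self.out(x))

# Usage: head = InceptionFusion(); s = head(torch.randn(1, 1, 224, 224), torch.randn(1, 1, 224, 224))
```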
[46] Because the ordinary cross-entropy loss function cannot solve the problem of unbalanced positive and negative samples and of unbalanced background and foreground in real scenes, the present disclosure introduces the focal loss (focus entropy loss) to solve this problem, and its formula is as follows:
[47] $$L(y,\hat{y})=\begin{cases}-\alpha\,(1-\hat{y})^{\gamma}\log(\hat{y}), & y=1\\ -(1-\alpha)\,\hat{y}^{\gamma}\log(1-\hat{y}), & y=0\end{cases}$$
[48] wherein $y$ and $\hat{y}$ represent the target salient image and the network predicted salient image respectively, $\gamma$ represents a constant, which reduces the loss of easy-to-classify samples and makes the network pay more attention to difficult samples, and $\alpha$ represents a balance factor, which increases the contribution of the foreground to the loss function to balance the positive and negative samples.
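Assuming that ŷ is the predicted saliency probability and y the binary ground truth, the formula above can be implemented as in the sketch below; the default values γ = 2 and α = 0.25 are common choices and are assumptions here, since the disclosure does not specify them.

```python
import torch

def focal_loss(y_hat, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal (focus entropy) loss for a predicted saliency map y_hat in (0, 1)
    and a binary target map y. The alpha/gamma defaults are illustrative assumptions."""
    y_hat = y_hat.clamp(eps, 1.0 - eps)
    pos = -alpha * (1.0 - y_hat).pow(gamma) * torch.log(y_hat)        # y = 1 (foreground)
    neg = -(1.0 - alpha) * y_hat.pow(gamma) * torch.log(1.0 - y_hat)  # y = 0 (background)
    return torch.where(y > 0.5, pos, neg).mean()
```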
[49] An output module inputs a to-be-processed RGBD image into the trained image saliency detection model and outputs a corresponding saliency detection result which is a saliency map through a model calculation.
[50] The model of the present disclosure is implemented based on PyTorch, the machine is configured with two GTX 1080Ti graphics cards (11 GB each), an Adam optimizer is used for training, and the training momentum, learning rate, weight decay rate and batch size are respectively set to (0.9, 0.999), 0.0005, 1E-5 and 16. Since the model of the present disclosure is an end-to-end model, no pre-training or other extra operations are required.
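The stated training settings translate directly into PyTorch as sketched below; the stand-in model and random data are placeholders used only to make the example self-contained, and `focal_loss` refers to the sketch given above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and data; only the optimizer and batch settings mirror the disclosure.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(4, 1, 3, padding=1)   # color (3 ch) + depth (1 ch) stacked
    def forward(self, color, depth):
        return torch.sigmoid(self.conv(torch.cat([color, depth], dim=1)))

model = TinyNet()
train_set = TensorDataset(torch.rand(32, 3, 64, 64), torch.rand(32, 1, 64, 64),
                          (torch.rand(32, 1, 64, 64) > 0.5).float())

# Stated settings: Adam, betas (0.9, 0.999), lr 0.0005, weight decay 1e-5, batch size 16.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.999), weight_decay=1e-5)

for color, depth, gt in DataLoader(train_set, batch_size=16, shuffle=True):
    loss = focal_loss(model(color, depth), gt)      # focal loss sketched above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```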
[51] In order to verify the feasibility of the present disclosure, 1585 images are selected as a training set and 400 images as a test set on the NJU2000 data set; 800 images are selected as the training set and 200 images as the test set on the NLPR data set; 637 images are selected as the training set and 160 images as the test set on the STEREO data set. The experimental results in FIGS. 5 and 6 show that the model proposed by the present disclosure always has certain advantages, can accurately detect the salient region of images, and occupies fewer computing resources than other methods.
[52] The present disclosure adopts the Precision and Recall values as evaluation indicators and draws a P-R curve to evaluate the performance of the algorithm, as shown in FIG. 5; the calculation formulas are as follows:
[53] $$\text{Precision}=\frac{TP}{TP+FP},\qquad \text{Recall}=\frac{TP}{TP+FN}$$
[54] wherein, TP, FP, TN, FN represent the number of true positives, false positives,
true negatives, and false negatives, respectively.
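As an illustration, the sketch below computes precision and recall over a sweep of binarization thresholds to form a P-R curve; the 256-level threshold sweep is a common convention and an assumption here, not a detail taken from the disclosure.

```python
import numpy as np

def pr_curve(pred, gt, num_thresholds=256):
    """Precision/recall over binarization thresholds for one saliency map.
    pred in [0, 1], gt binary; the 256-threshold sweep is an assumed convention."""
    precisions, recalls = [], []
    gt = gt > 0.5
    for t in np.linspace(0.0, 1.0, num_thresholds):
        binary = pred >= t
        tp = np.logical_and(binary, gt).sum()
        fp = np.logical_and(binary, ~gt).sum()
        fn = np.logical_and(~binary, gt).sum()
        precisions.append(tp / (tp + fp + 1e-8))   # Precision = TP / (TP + FP)
        recalls.append(tp / (tp + fn + 1e-8))      # Recall = TP / (TP + FN)
    return np.array(precisions), np.array(recalls)
```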
[55] Although the specific embodiments of the present disclosure have been described above, those skilled in the art should understand that these are only examples, and various changes and modifications may be made to these embodiments without departing from the principle and essence of the present disclosure, therefore, the scope of protection of the present disclosure is defined by the appended claims.

Claims (6)

1. A computer system for saliency detection of RGBD images based on interactive feature fusion, the system comprising: a processor, a memory, and a computer program stored in the memory and running on the processor, wherein, when the processor executes the computer program, the following modules are executed: an image sample set establishing module, which establishes an image sample set for training; a saliency detection model establishing module, which establishes an image saliency detection model; for each image in the image sample set, first using a multi-layer convolutional neural network module to extract a multi-level color and depth image feature from a color image and a depth image respectively, and using a cross-feature fusion module to perform multi-level dot product fusion on the color and depth image feature extracted by deep convolution to obtain an initial salient image; then using an Inception structure to perform multi-scale fusion on the initial salient image to output a network predicted salient image; finally, using the network predicted salient image and a target salient image to solve a focus entropy loss function to learn an optimal parameter of the image saliency detection model and obtain a trained image saliency detection model; and an output module, which inputs a to-be-processed RGB-D image into the trained image saliency detection model and outputs a corresponding saliency detection result, which is a saliency map, through a model calculation.

2. The computer system for saliency detection of RGBD images based on interactive feature fusion according to claim 1, wherein the cross-feature fusion module comprises a first convolution and a second convolution, uses the first convolution to perform feature extraction on a color image feature and the second convolution to perform feature extraction on a depth image feature, and wherein a common feature of the color image feature and the depth image feature is extracted by a dot product method, fused and transformed, after which a third convolution is used to merge the fused feature with an original color image feature and an original depth image feature, respectively, through convolution and activation operations.

3. The computer system for saliency detection of RGBD images based on interactive feature fusion according to claim 2, wherein structures of the first convolution, the second convolution and the third convolution are the same.

4. The computer system for saliency detection of RGBD images based on interactive feature fusion according to claim 1, wherein the multi-layer convolutional neural network module comprises two identical branches, which act on the color image and the depth image respectively, and both adopt an FCN structure comprising five layers of convolution, wherein the first convolution adopts a standard convolution block and all other layers of convolution use a global-local feature extraction convolution block; wherein the global-local feature extraction convolution block comprises a global branch and a local branch, the local branch first reducing an input feature map to 1/4 of an original feature map with a convolution with a step size of 2 and then using two identical convolutions with a step size of 1 to extract a local feature, the global branch adopting a bottleneck structure to extract a global feature, and finally the extracted global feature and local feature being fused by using a dot product method.

5. The computer system for saliency detection of RGBD images based on interactive feature fusion according to claim 4, wherein a size of a convolution kernel of the convolutions with the step size of 1 is 3×3, and wherein an activation function is ReLU.

6. The computer system for saliency detection of RGBD images based on interactive feature fusion according to claim 1, wherein the focus entropy loss function $L(y,\hat{y})$ is set as:
$$L(y,\hat{y})=\begin{cases}-\alpha\,(1-\hat{y})^{\gamma}\log(\hat{y}), & y=1\\ -(1-\alpha)\,\hat{y}^{\gamma}\log(1-\hat{y}), & y=0\end{cases}$$
wherein $y$ and $\hat{y}$ represent the target salient image and the network predicted salient image respectively, $\gamma$ represents a constant, and $\alpha$ represents a balance factor.
NL2030745A 2022-01-27 2022-01-27 Computer system for saliency detection of rgbd images based on interactive feature fusion NL2030745B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
NL2030745A NL2030745B1 (en) 2022-01-27 2022-01-27 Computer system for saliency detection of rgbd images based on interactive feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
NL2030745A NL2030745B1 (en) 2022-01-27 2022-01-27 Computer system for saliency detection of rgbd images based on interactive feature fusion

Publications (1)

Publication Number Publication Date
NL2030745B1 true NL2030745B1 (en) 2023-08-07

Family

ID=87569292

Family Applications (1)

Application Number Title Priority Date Filing Date
NL2030745A NL2030745B1 (en) 2022-01-27 2022-01-27 Computer system for saliency detection of rgbd images based on interactive feature fusion

Country Status (1)

Country Link
NL (1) NL2030745B1 (en)

Similar Documents

Publication Publication Date Title
CN110956111A (en) Artificial intelligence CNN, LSTM neural network gait recognition system
CN111382677B (en) Human behavior recognition method and system based on 3D attention residual error model
CN110674741A (en) Machine vision gesture recognition method based on dual-channel feature fusion
Zhang et al. ReYOLO: A traffic sign detector based on network reparameterization and features adaptive weighting
US20220122351A1 (en) Sequence recognition method and apparatus, electronic device, and storage medium
CN113963170A (en) RGBD image saliency detection method based on interactive feature fusion
WO2023040146A1 (en) Behavior recognition method and apparatus based on image fusion, and electronic device and medium
CN111242181B (en) RGB-D saliency object detector based on image semantics and detail
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN110852330A (en) Behavior identification method based on single stage
Yuan et al. A lightweight network for smoke semantic segmentation
CN115171014B (en) Video processing method, video processing device, electronic equipment and computer readable storage medium
Xiang et al. Crowd density estimation method using deep learning for passenger flow detection system in exhibition center
CN111368707A (en) Face detection method, system, device and medium based on feature pyramid and dense block
Zhang et al. An industrial interference-resistant gear defect detection method through improved YOLOv5 network using attention mechanism and feature fusion
CN114492634A (en) Fine-grained equipment image classification and identification method and system
CN117953581A (en) Method and device for identifying actions, electronic equipment and readable storage medium
CN113920066A (en) Multispectral infrared inspection hardware detection method based on decoupling attention mechanism
NL2030745B1 (en) Computer system for saliency detection of rgbd images based on interactive feature fusion
CN116453192A (en) Self-attention shielding face recognition method based on blocking
Sang et al. Image recognition based on multiscale pooling deep convolution neural networks
CN114596609A (en) Audio-visual counterfeit detection method and device
CN117036658A (en) Image processing method and related equipment
CN114494978A (en) Pipeline-based parallel video structured inference method and system
CN112802026A (en) Deep learning-based real-time traffic scene semantic segmentation method