CN116778186A - Panoramic image saliency object detection method, device, equipment and storage medium - Google Patents

Publication number
CN116778186A
Authority
CN
China
Prior art keywords: feature, panoramic image, saliency, module, image
Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202310602967.6A
Other languages: Chinese (zh)
Inventors: 张秋丹, 张洁, 王旭, 江健民
Current assignee: Shenzhen University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Shenzhen University
Application filed by Shenzhen University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks


Abstract

The invention is applicable to the technical field of image processing, and provides a method, a device, equipment and a storage medium for detecting salient objects in panoramic images. The method comprises the following steps: when a salient object detection request is received, a panoramic image to be detected is obtained, and the panoramic image is processed through a pre-established saliency detection model to obtain a saliency map of the panoramic image, wherein the saliency detection model comprises a dual-branch structure network, a hybrid projection feature fusion module and a progressive prediction module. The detection performance for salient objects in panoramic images is thereby improved, redundant information in the panoramic image is effectively filtered, and the saliency of the resulting saliency map of the panoramic image is improved.

Description

Panoramic image saliency object detection method, device, equipment and storage medium
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method, a device, equipment and a storage medium for detecting a panoramic image salient object.
Background
With the rapid development of science and technology, the popularization of panoramic cameras and the development of AR/VR, 360-degree panoramic images are widely applied in various fields such as real estate, tourist attractions, exhibitions and automatic driving. Panoramic images are the most common virtual reality resources in daily life and contain abundant surrounding scene information, so they can provide viewers with a wider field of view and more realistic scenes, giving viewers an immersive experience. Panoramic images generally have a higher resolution than traditional 2D images, so efficiently transferring and storing such massive amounts of panoramic image data has become a great challenge for the development of panoramic imaging. Salient object detection simulates human vision to capture visual attention and thereby identify the most attractive targets in an image; it is an important initial step of several computer vision tasks, such as image segmentation, image compression and visual tracking. Salient object detection algorithms for panoramic images therefore have important research significance, and researchers are increasingly interested in salient object detection on panoramic images.
Panoramic images are typically presented in two formats: equirectangular projection (ERP), also known as equidistant cylindrical projection, which uniformly samples latitude and longitude onto a rectangular plane, and cube map projection (CMP). However, projecting from a sphere onto a two-dimensional plane distorts the image content. CMP separates a panoramic image containing a complete scene into six faces, but such a projection inevitably destroys the integrity of the image. Recently, several salient object detection methods based on panoramic images in different projection formats have emerged. For example, Li et al. propose a distortion-adaptive salient object detection method that handles the distortion caused by projecting the sphere onto a plane and adaptively corrects ERP images; Huang et al. designed a feature-adaptive salient object detection network that exploits the advantages of both ERP and CMP images. Although existing panoramic image salient object detection models consider the distortion problem of the ERP format and usually relieve the distortion by using CMP images, they do not fully exploit the complete global features of the ERP image and the specific local features of the CMP image. Regarding the multi-layer feature maps produced by the feature extraction network, existing methods cannot fully utilize the abundant spatial information of low-level features and the abundant semantic information of high-level features; and, given the large field of view of panoramic images, existing methods do not consider multi-scale salient objects in the image content. Consequently, existing panoramic image salient object detection methods fail to fully utilize the rich semantic information of high-level features and fail to account for salient objects of different sizes in the panoramic image, which easily leads to inaccurate salient object detection.
Disclosure of Invention
The invention aims to provide a method, a device, equipment and a storage medium for detecting salient objects in panoramic images, in order to solve the problems of poor detection performance and insufficiently distinct detected salient objects, which arise because the prior art cannot provide an effective method for detecting salient objects in panoramic images.
In one aspect, the present invention provides a method for detecting a panoramic image saliency object, the method comprising the steps of:
when a saliency object detection request is received, acquiring a panoramic image to be detected;
and processing the panoramic image through a pre-established saliency detection model to obtain a saliency map of the panoramic image, wherein the saliency detection model comprises a double-branch structure network, a hybrid projection feature fusion module and a progressive prediction module.
Preferably, the step of processing the panoramic image through a pre-established saliency detection model includes:
extracting features of the panoramic image through the network with the double branch structure to obtain a first feature and a second feature;
performing feature fusion on the first feature and the second feature through the mixed projection feature fusion module to obtain a third feature;
and processing the first feature, the second feature and the third feature through the progressive prediction module to obtain the saliency map.
Preferably, the progressive prediction module includes a top-level guided convolution module and a progressive refinement module.
Preferably, the step of processing, by the progressive prediction module, the first feature, the second feature and the third feature includes:
obtaining a fourth feature through the top-level guided convolution module according to the first feature and the second feature;
the saliency map is obtained by the progressive refinement module according to the first feature, the third feature and the fourth feature.
In another aspect, the present invention provides a device for detecting a panoramic image saliency object, the device comprising:
the image acquisition unit is used for acquiring a panoramic image to be detected when a saliency object detection request is received; and
the saliency map obtaining unit is used for processing the panoramic image through a pre-established saliency detection model to obtain a saliency map of the panoramic image, wherein the saliency detection model comprises a double-branch structure network, a mixed projection feature fusion module and a progressive prediction module.
Preferably, the saliency map obtaining unit includes:
the feature extraction unit is used for extracting features of the panoramic image through the double-branch structure network to obtain a first feature and a second feature;
the feature fusion unit is used for carrying out feature fusion on the first feature and the second feature through the mixed projection feature fusion module to obtain a third feature; and
and the feature processing unit is used for processing the first feature, the second feature and the third feature through the progressive prediction module to obtain the saliency map.
Preferably, the progressive prediction module includes a top-level guided convolution module and a progressive refinement module.
Preferably, the feature processing unit includes:
a feature obtaining unit, configured to obtain a fourth feature through the top-level guided convolution module according to the first feature and the second feature; and
a saliency map obtaining subunit, configured to obtain the saliency map through the progressive refinement module according to the first feature, the third feature, and the fourth feature.
In another aspect, the present invention further provides an image processing apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps described in the above method for detecting a panoramic image saliency object when the processor executes the computer program.
In another aspect, the present invention also provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the steps of the method for detecting a panoramic image saliency object described above.
In the invention, when a salient object detection request is received, the panoramic image to be detected is obtained, and the panoramic image is processed through the pre-established saliency detection model to obtain the saliency map of the panoramic image, wherein the saliency detection model comprises a dual-branch structure network, a hybrid projection feature fusion module and a progressive prediction module. The detection performance for salient objects in the panoramic image is thereby improved, redundant information in the panoramic image is effectively filtered, and the saliency of the resulting saliency map is improved.
Drawings
Fig. 1 is a flowchart of an implementation of a method for detecting a salient object of a panoramic image according to an embodiment of the present invention;
fig. 2 is a flowchart of an implementation of a method for detecting a salient object of a panoramic image according to a second embodiment of the present invention;
fig. 3 is a flowchart of an implementation of a method for detecting a panoramic image salient object according to the third embodiment of the present invention;
fig. 4 is a schematic architecture diagram of a top-level guided convolution module in a method for detecting a panoramic image salient object according to a third embodiment of the present invention;
fig. 5 is a schematic diagram of an overall framework of a saliency detection model in a method for detecting a saliency object of a panoramic image according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of a device for detecting a panoramic image salient object according to a fourth embodiment of the present invention;
fig. 7 is a schematic diagram of a preferred structure of a device for detecting a panoramic image salient object provided in the fourth embodiment of the present invention;
fig. 8 is a schematic structural diagram of an image processing apparatus according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The following describes in detail the implementation of the present invention in connection with specific embodiments:
embodiment one:
fig. 1 shows a flow of implementation of a method for detecting a salient object of a panoramic image according to an embodiment of the present invention, and for convenience of explanation, only a portion related to the embodiment of the present invention is shown, which is described in detail below:
in step S101, when a salient object detection request is received, a panoramic image to be detected is acquired.
The embodiment of the invention is applicable to an image processing platform, device or system, such as a personal computer or a server. In the embodiment of the invention, a panoramic image is a 360-degree image that contains abundant surrounding scene information, can provide a viewer with a wider field of view and a more realistic scene, and gives the viewer an immersive experience. It can be obtained through a panoramic camera or through virtual equipment such as AR/VR devices. The panoramic image is usually displayed in the ERP format and may therefore also be called an ERP image.
In step S102, the panoramic image is processed through a pre-established saliency detection model, so as to obtain a saliency map of the panoramic image, where the saliency detection model includes a dual-branch structure network, a hybrid projection feature fusion module and a progressive prediction module.
In the embodiment of the invention, the panoramic image is correspondingly processed through a double-branch structure network, a mixed projection feature fusion module and a progressive prediction module in a pre-established saliency detection model, so as to obtain a saliency map of the panoramic image.
In the embodiment of the present invention, a specific implementation manner of processing a panoramic image through a pre-established saliency detection model is described in detail in the following method embodiment, which is not described herein.
In the embodiment of the invention, when a salient object detection request is received, the panoramic image to be detected is obtained, and the panoramic image is processed through the pre-established saliency detection model to obtain the saliency map of the panoramic image, wherein the saliency detection model comprises a dual-branch structure network, a hybrid projection feature fusion module and a progressive prediction module. The detection performance for salient objects in the panoramic image is thereby improved, redundant information in the panoramic image is effectively filtered, and the saliency of the resulting saliency map is improved.
Embodiment two:
fig. 2 shows a flow of implementation of the method for detecting a salient object of a panoramic image according to the second embodiment of the present invention, and for convenience of explanation, only the portion relevant to the embodiment of the present invention is shown, which is described in detail below:
the processing of the panoramic image in step S102 of the first embodiment is realized by the steps of:
in step S201, feature extraction is performed on the panoramic image through the dual-branch structure network, so as to obtain a first feature and a second feature.
In the embodiment of the invention, because the ERP image suffers from severe distortion, not all salient targets in the panoramic image can be detected comprehensively and accurately using the ERP image alone. To solve this problem, a dual-branch network (Bi-branch Net) is constructed. The panoramic image in ERP format to be detected is input into the dual-branch network of the saliency detection model, which performs an E2C operation on the ERP image to convert the panoramic image from ERP format into CMP format and obtain the corresponding CMP image. Backbone network feature extraction is then performed on the ERP image and the CMP image to obtain two groups of features, global and local, where the global features are the first features and the local features are the second features.
In the feature extraction of the panoramic image through the dual branch structure network, the feature extraction of the panoramic image is preferably achieved by:
(1) The ERP image I_E ∈ R^(B×3×H×W) is converted into the corresponding CMP image I_C ∈ R^(B×6×3×H'×W') by the E2C module, where 6 represents the front, right, back, left, upper and lower faces of the three-dimensional space, 3 represents the intensities of the red, green and blue channels at each spatial position, R represents the real number domain, B represents the batch size, H and W represent the height and width of the panoramic image, H' and W' represent the height and width of each cube face, and the E2C module performs the E2C operation on the panoramic image;
(2) The encoder ResNet-50 is used as the feature extractor, and features are extracted from the ERP image I_E and the CMP image I_C respectively, generating two groups of corresponding features, namely the first features F_i^E and the second features F_i^C, where i = 1, 2, ..., 5 is the index of the corresponding encoder layer.
Feature extraction of the panoramic image is achieved through the steps (1) - (2), and therefore remarkable targets in the panoramic image are captured more comprehensively.
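The E2C operation in step (1) resamples the equirectangular sphere onto six cube faces. As an illustration only (not part of the patent), the sketch below maps a cube-face coordinate back to the ERP pixel grid; the face layout and axis convention used here are one common choice and are assumptions, not the patent's specification:

```python
import math

def cube_face_to_erp(face, u, v, H, W):
    """Map a point (u, v) in [-1, 1]^2 on one CMP cube face to fractional
    (row, col) coordinates in an H x W ERP image.  Face names follow the
    patent's six faces (front, right, back, left, top, bottom); the axis
    convention below is an assumed, common one."""
    if face == "front":
        x, y, z = 1.0, u, -v
    elif face == "right":
        x, y, z = -u, 1.0, -v
    elif face == "back":
        x, y, z = -1.0, -u, -v
    elif face == "left":
        x, y, z = u, -1.0, -v
    elif face == "top":
        x, y, z = v, u, 1.0
    elif face == "bottom":
        x, y, z = -v, u, -1.0
    else:
        raise ValueError(face)
    lon = math.atan2(y, x)                                # longitude in (-pi, pi]
    lat = math.asin(z / math.sqrt(x * x + y * y + z * z)) # latitude in [-pi/2, pi/2]
    col = (lon / (2.0 * math.pi) + 0.5) * W  # ERP samples longitude uniformly
    row = (0.5 - lat / math.pi) * H          # ERP samples latitude uniformly
    return row, col
```

For example, the center of the front face lands at the center of the ERP image, while the center of the top face lands on its top row, matching the uniform latitude/longitude sampling described in the background section.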
In step S202, feature fusion is performed on the first feature and the second feature by the hybrid projection feature fusion module, so as to obtain a third feature.
In the embodiment of the invention, the first feature F_i^E retains more complete salient object information, while the second feature F_i^C, although the integrity of the panoramic image is destroyed, retains more detail features. In order to take full advantage of both features, a hybrid projection feature fusion module (Hybrid Projection Feature Fusion Module, FFM) is constructed, and each layer of the encoder corresponds to one FFM, i.e. the FFM corresponds to index numbers 1, 2, ..., 5 and can be expressed as FFM_i. Through the FFM_i with the corresponding index number, the first feature F_i^E and the second feature F_i^C extracted by Bi-branch Net are fused into a fusion feature with the advantages of both the ERP image and the CMP image, namely the third feature F_i^fuse.
In feature fusion of the first feature and the second feature by the hybrid projection feature fusion module, the feature fusion of the first feature and the second feature is preferably achieved by:
(1) Using a C2E module, the second feature F_i^C is reprojected into the ERP format and aligned with the first feature F_i^E to obtain the alignment feature F_i^c2e; the specific operation is F_i^c2e = C2E(F_i^C), where the C2E module converts the CMP format into the ERP format and i represents the index of the corresponding encoder layer in the dual-branch structure network;
(2) An element-level multiplication operation is performed on F_i^E and F_i^c2e, and a Squeeze-and-Excitation (SE) module is used to adaptively adjust the characteristics of each channel to obtain the enhanced feature F_i^mul; the specific operation is F_i^mul = SE(F_i^E * F_i^c2e), where "*" represents an element-level multiplication operation and SE(·) represents the SE module;
(3) The enhanced feature F_i^mul is concatenated with F_i^E and with the alignment feature F_i^c2e respectively, and each result is fed to a GConv_1 convolution group to obtain the corresponding enhanced features F_i^E' and F_i^C'; the specific operations are F_i^E' = GConv_1([F_i^mul, F_i^E]) and F_i^C' = GConv_1([F_i^mul, F_i^c2e]), with GConv_k(·) = ReLU(BN(Conv_{k×k}(·))), where the GConv_1 convolution group comprises a 1×1 convolution layer, a BN layer and a ReLU layer, and k is the size of the convolution kernel;
(4) The enhanced ERP feature F_i^E' and CMP feature F_i^C' are concatenated, and the connected features are fed to a 1×1 convolution layer and a ReLU layer to finally obtain the final fusion feature of the ERP and CMP images, F_i^fuse; the specific operation is F_i^fuse = ReLU(Conv_{1×1}([F_i^E', F_i^C'])).
The feature fusion of the first feature and the second feature is realized through the steps (1) - (4), and the complementary correlation between the ERP image and the CMP image is adaptively learned through the FFM, so that the third feature combines the advantages of the features in the ERP image and the CMP image, and the significance clue of the 360-degree panoramic image can be comprehensively represented, thereby reducing the influence of ERP image distortion on the model performance.
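The multiplicative enhancement at the heart of step (2) can be sketched in a few lines of pure Python. This is a toy illustration, not the patent's implementation: features are plain [C][H][W] lists, and the SE block is replaced by a parameter-free stand-in (global average pooling plus a sigmoid gate), whereas the real SE module learns two fully connected layers:

```python
import math

def elementwise_mul(a, b):
    """Element-level multiplication of two [C][H][W] feature maps."""
    return [[[a[c][i][j] * b[c][i][j] for j in range(len(a[0][0]))]
             for i in range(len(a[0]))] for c in range(len(a))]

def se_gate(feat):
    """Toy stand-in for the Squeeze-and-Excitation module: squeeze each
    channel by global average pooling, turn the squeezed value into a
    sigmoid gate, and rescale the channel.  (The learned FC layers of a
    real SE block are omitted here; this is an assumed simplification.)"""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    out = []
    for c in range(C):
        avg = sum(sum(row) for row in feat[c]) / (H * W)
        gate = 1.0 / (1.0 + math.exp(-avg))
        out.append([[v * gate for v in row] for row in feat[c]])
    return out

def ffm_enhance(f_erp, f_c2e):
    """Step (2) of the FFM in this toy form: F_mul = SE(F_E * F_c2e)."""
    return se_gate(elementwise_mul(f_erp, f_c2e))
```

The element-wise product suppresses positions where either projection responds weakly, and the channel gate then reweights whole channels, which is the complementary-correlation idea the FFM relies on.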
In step S203, the first feature, the second feature, and the third feature are processed by the progressive prediction module to obtain a saliency map.
In the embodiment of the invention, the panoramic image generally has larger resolution, and in order to detect the salient objects with different scales, a progressive prediction module (Progressive Prediction Module, PPM) is constructed to combine the salient features with different scales, and the PPM is used for processing the first features, the second features and the third features with different scales to obtain the salient map of the panoramic image.
Preferably, the progressive prediction module PPM includes a Top-Level Guided Convolution (TLGC) module and a Progressive Refinement Module (PRM), and each layer of the encoder corresponds to one PRM, i.e. the PRM corresponds to index numbers 1, 2, ..., 5 and may be denoted PRM_i. The TLGC is used for carrying out multi-scale fusion processing on the topmost first and second features extracted by Bi-branch Net to obtain a high-level semantic feature with multi-scale information; the PRM is used for extracting saliency cues from the high-level semantic feature obtained by the TLGC, and the saliency map of the panoramic image is obtained by refining progressively through the layer-by-layer PRM modules, so that the features of each layer are further refined and combined under the guidance of related features, improving the performance and accuracy of the model.
In the embodiment of the present invention, a specific implementation manner of processing the first feature, the second feature and the third feature by the progressive prediction module is described in the following method embodiment, which is not described herein.
In the embodiment of the invention, the panoramic image is subjected to feature extraction through the network with the double-branch structure to obtain the first feature and the second feature, the first feature and the second feature are subjected to feature fusion through the mixed projection feature fusion module to obtain the third feature, and finally, the first feature, the second feature and the third feature are processed through the progressive prediction module to obtain the saliency map, so that the performance and the accuracy of the saliency detection model are improved, and the saliency of the panoramic image saliency map is improved.
Embodiment III:
fig. 3 shows a flow of implementation of the method for detecting a salient object of a panoramic image according to the third embodiment of the present invention, and for convenience of explanation, only the portion relevant to the embodiment of the present invention is shown, which is described in detail below:
the processing of the first feature, the second feature, and the third feature in step S203 of the second embodiment is realized by the following steps:
in step S301, a fourth feature is obtained by the top-level guided convolution module based on the first feature and the second feature.
In the embodiment of the invention, the top-level guided convolution module TLGC can capture high-level salient features at different scales. The TLGC corresponds to one FFM module whose index number is 6, which may be expressed as FFM_6 and which receives the high-level semantic feature (i.e. the fourth feature) with multi-scale information output by the TLGC. The first feature F_5^E and the second feature F_5^C output by the fifth layer of the encoder in Bi-branch Net are input into the TLGC. First, the received second feature F_5^C is converted into the ERP format, giving F_5^c2e; the channels of the received first feature F_5^E and of the feature F_5^c2e are then reduced by the 1×1 convolution layer in the TLGC. The channel-reduced features are fed to three branches in the TLGC to obtain three salient features with receptive fields of different scales, and the three output salient features are concatenated to obtain the feature F_scales. Finally, F_scales is fed into one GConv_3 convolution group of the TLGC to obtain the top-level guidance feature F^TG with rich semantic information, corresponding respectively to F_5^E and F_5^C; this is the fourth feature. The three branches are a GConv_3 convolution group with a dilation rate of 1, a GConv_3 convolution group with a dilation rate of 3, and a GConv_3 convolution group with a dilation rate of 5; the GConv_3 convolution group comprises a 3×3 convolution layer, a BN layer and a ReLU layer, specifically denoted GConv_3(·) = ReLU(BN(Conv_{3×3}(·))). Fig. 4 shows the architecture of the top-level guided convolution module.
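The three TLGC branches differ only in dilation rate, which widens the receptive field of a 3-tap kernel from 3 to 7 to 11 samples ((k-1)·d+1 for kernel size k and dilation d). The 1-D sketch below illustrates this; the fixed averaging kernel is an assumption standing in for the learned convolution weights:

```python
def dilated_conv1d(x, kernel, dilation):
    """Valid-mode 1-D dilated convolution (pure-Python sketch)."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of one output sample
    return [sum(kernel[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]

def tlgc_branches(x):
    """Three parallel branches with dilation rates 1, 3 and 5, as in the
    TLGC; a fixed averaging kernel replaces the learned weights here."""
    kernel = [1.0 / 3, 1.0 / 3, 1.0 / 3]
    return [dilated_conv1d(x, kernel, d) for d in (1, 3, 5)]
```

On an 11-sample input the three branches produce 9, 5 and 1 output samples respectively, showing how the larger dilations aggregate context from progressively wider neighborhoods before the branch outputs are concatenated.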
In step S302, a saliency map is obtained by a progressive refinement module from the first feature, the third feature, and the fourth feature.
In an embodiment of the invention, a foreground-background attention mechanism is introduced in a Progressive Refinement Module (PRM) to refine the saliency map, a first featureThird feature->Fourth feature->After the PRM corresponding to the index number is input, the PRM is used for carrying out corresponding processing to obtain a saliency map, and specifically, the saliency map is obtained through the following steps:
(1) The third feature is calculated by convolutionIs>Alignment is carried out to obtain the aligned characteristics respectivelyAnd->
(2) Will beAnd->Connected together, followed by a 1 x 1 convolution to reduce the characteristic channel, resulting in the characteristic +.>
(3) Will beFeeding 3 x 3 convolution to obtain optimized fusion feature ∈ ->
(4) Connecting features by a residual operationAnd->To ensure the integrity of the salient objects and obtain the characteristics
(5) Features to be characterizedAnd->Performing ligation to obtain cascade characteristic->
(6) Cascading featuresIs fed into the SE module to adaptively learn channel characteristics and filter redundant information to obtain final optimized fusion characteristics ∈ ->
(7) Using 1 x 1 convolution for featuresIs reduced in characteristic channels and output characteristics +.>
(8) By each PRM i (i.ltoreq.4) progressively refining the previous module PRM i+1 In particular, when i=5, the fourth feature isInput FFM 6 And FFM is carried out 6 The output characteristics produced->Feeding GConv 3 Convolving the set, then reducing the number of channels using a 1 x 1 convolution, resulting in a single channel significance signature +.>And add the features->A PRM designated as the fifth layer of the encoder (i.e., PRM 5 ) When i= {1,2,3,4}, the previous layer PRM will be i+1 Output characteristics of->As the current modeBlock PRM i Is input to the current module PRMi, and then the feature is applied using an up-sampling operation>And->Performing feature alignment to obtain foreground features +.>
(10) Using sigmoid operations from foreground featuresObtain prospect saliency map->
(11) According to the formulaFor foreground saliency map->Subtracting from matrix E to obtain background saliency map of background region>Wherein all elements in matrix E are 1, "-" represents an element level subtraction operation;
(12) Features to be characterizedAnd foreground saliency map->Into a system comprising an element-by-element multiplication and a GConv 3 Branches of the convolution group get a foreground saliency cue while characterizing +.>And background saliency map->Feeding another also comprising an element-wise multiplication and a GConv 3 The branches of the convolution group obtain background significance clues;
(13) Cascading and inputting foreground significant clues and background significant clues generated by the two branches into GConv 1 In the convolution set to reduce the number of channels, 3 x 3 convolution is then used to obtain salient features with foreground and background information
(14) Features of foregroundAnd significant features->Adding element by element to obtain complete salient feature +.>Final complete salient features->Saliency map S constituting panoramic image 360
The saliency map is obtained through the above steps (1)-(14), thereby improving the performance and accuracy of the model and the quality of the resulting saliency map.
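The foreground-background split in steps (10)-(12) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the convolution groups are omitted, and the function and variable names are hypothetical rather than taken from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def foreground_background_cues(feat):
    """Split a single-channel feature map into foreground and background
    saliency cues (steps (10)-(12); GConv branches omitted)."""
    s_fg = sigmoid(feat)       # step (10): foreground saliency map
    s_bg = 1.0 - s_fg          # step (11): E - S_fg, with E an all-ones matrix
    fg_cue = feat * s_fg       # step (12): element-wise foreground cue
    bg_cue = feat * s_bg       # step (12): element-wise background cue
    return s_fg, s_bg, fg_cue, bg_cue

feat = np.array([[2.0, -2.0], [0.0, 4.0]])
s_fg, s_bg, fg_cue, bg_cue = foreground_background_cues(feat)
# the two maps partition attention: they sum to 1 at every position
assert np.allclose(s_fg + s_bg, 1.0)
```

Because the two saliency maps sum to one everywhere, the foreground and background cues together cover the whole feature map, which is why step (13) can recover a salient feature carrying both kinds of information.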
Fig. 5 shows the overall framework of the saliency detection model.
In the embodiment of the invention, the fourth feature is obtained through the top-level guided convolution module according to the first feature and the second feature, and the saliency map is obtained through the progressive refinement module according to the first feature, the third feature and the fourth feature, thereby improving the performance and accuracy of the model and the quality of the saliency map.
Embodiment four:
Fig. 6 shows the structure of a device for detecting salient objects of a panoramic image provided by the fourth embodiment of the present invention. For convenience of explanation, only the portions related to the embodiment of the present invention are shown, including:
an image acquisition unit 61 for acquiring a panoramic image to be detected when a salient object detection request is received.
The embodiment of the invention is applicable to an image processing platform, device or system, such as a personal computer or a server. In the embodiment of the invention, the panoramic image is a 360-degree image that contains rich surrounding-scene information; it provides the viewer with a wider field of view and a more realistic scene, yielding an immersive experience. It can be captured by a panoramic camera or by virtual devices such as AR/VR equipment, and is usually displayed in the ERP (equirectangular projection) format, so it may also be called an ERP image.
The saliency map obtaining unit 62 is configured to process the panoramic image through a pre-established saliency detection model, so as to obtain a saliency map of the panoramic image, where the saliency detection model includes a dual-branch structure network, a hybrid projection feature fusion module, and a progressive prediction module.
In the embodiment of the invention, the panoramic image is correspondingly processed through a double-branch structure network, a mixed projection feature fusion module and a progressive prediction module in a pre-established saliency detection model, so as to obtain a saliency map of the panoramic image.
As shown in fig. 7, the saliency map obtaining unit 62 preferably includes:
the feature extraction unit 621 is configured to perform feature extraction on the panoramic image through the dual-branch structure network, so as to obtain a first feature and a second feature.
In the embodiment of the invention, because the ERP image suffers from severe distortion, not all salient targets in the panoramic image can be detected comprehensively and accurately from the ERP image alone. To solve this problem, a dual-branch structure network (Bi-branch Net) is constructed. The panoramic image in ERP format to be detected is input into the dual-branch structure network of the saliency detection model, which performs the E2C operation on the ERP image to convert the panoramic image from the ERP format into the CMP format and obtain the corresponding CMP image. Backbone-network feature extraction is then performed on the ERP image and the CMP image to obtain two groups of features, global and local, where the global features are the first features and the local features are the second features.
Preferably, the feature extraction unit 621 includes:
format conversion unit 6211 for converting the ERP image I E ∈ R B×3×H×W into the corresponding CMP image I C through an E2C module, where the six faces of the CMP image represent the front, right, back, left, top and bottom faces of the three-dimensional space, 3 represents the intensities of the red, green and blue channels at each spatial position, R represents the real-number domain, B represents the batch size, and H and W represent the height and width of the panoramic image respectively; the E2C module is used for performing the E2C operation on the panoramic image; and
a feature extraction subunit 6212 for employing the encoder ResNet-50 as a feature extractor, through which features are extracted from the ERP image I E and the CMP image I C respectively to generate the two corresponding sets of features, namely the first feature and the second feature of each encoder layer i, where i = 1, 2, ..., 5 is the index of the corresponding encoder layer.
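As a concrete illustration of the E2C operation the dual-branch network relies on, the sketch below samples one cube face from an equirectangular image with NumPy. It is a minimal nearest-neighbour version under assumed conventions (unit cube, front face on the z = 1 plane); a real E2C module would use interpolation and produce all six faces:

```python
import numpy as np

def e2c_front_face(erp, face_size):
    """Sample the front cubemap face from an equirectangular (ERP)
    image by nearest-neighbour lookup. erp: (H, W) array."""
    h, w = erp.shape
    # Pixel grid on the front face of a unit cube (z = 1 plane)
    u = np.linspace(-1, 1, face_size)
    v = np.linspace(-1, 1, face_size)
    uu, vv = np.meshgrid(u, v)
    x, y, z = uu, vv, np.ones_like(uu)
    # 3-D direction -> spherical longitude/latitude
    lon = np.arctan2(x, z)                      # in [-pi, pi]
    lat = np.arctan2(y, np.sqrt(x**2 + z**2))   # in [-pi/2, pi/2]
    # Spherical coordinates -> ERP pixel coordinates
    px = ((lon / np.pi + 1) / 2 * (w - 1)).round().astype(int)
    py = ((lat / (np.pi / 2) + 1) / 2 * (h - 1)).round().astype(int)
    return erp[py, px]

erp = np.arange(64 * 128, dtype=float).reshape(64, 128)
face = e2c_front_face(erp, 32)
assert face.shape == (32, 32)
```

Each face sees only a 90-degree field of view, which is why the CMP branch trades global completeness for locally less-distorted detail, exactly the division of labour the first and second features exploit.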
The feature fusion unit 622 is configured to perform feature fusion on the first feature and the second feature through the hybrid projection feature fusion module, so as to obtain a third feature.
In an embodiment of the invention, the first feature retains more complete salient-object information, while the second feature, although it compromises the integrity of the panoramic image, retains more detail features. In order to take full advantage of both, a hybrid projection Feature Fusion Module (FFM) is constructed, and each layer of the encoder corresponds to one FFM, i.e., the FFMs correspond to index numbers 1, 2, ..., 5 and can be expressed as FFM i. The FFM i with the corresponding index number fuses the first feature and the second feature extracted by Bi-branch Net to obtain a fusion feature combining the advantages of the ERP image and the CMP image, namely the third feature.
Preferably, the feature fusion unit 622 includes:
a feature alignment unit 6221 for re-projecting the second feature into the ERP format using a C2E module and aligning it with the first feature to obtain the aligned feature, where the C2E module is used for converting the CMP format into the ERP format and i represents the index of the corresponding encoder layer;
a first enhancement unit 6222 for performing an element-level multiplication between the first feature and the aligned feature, and adaptively adjusting the feature of each channel using a Squeeze-and-Excitation (SE) module to obtain an enhanced feature, where "*" represents the element-level multiplication operation and SE(·) represents the SE module;
a second enhancement unit 6223 for concatenating the enhanced feature with the first feature and with the aligned feature respectively, and feeding each concatenation into a GConv 1 convolution group to obtain the corresponding enhanced ERP feature and enhanced CMP feature, where GConv k (·) = ReLU(BN(Conv k×k (·))), the GConv 1 convolution group comprises a 1 x 1 convolution layer, a BN layer and a ReLU layer, and k is the size of the convolution kernel; and
a feature fusion subunit 6224 for concatenating the enhanced ERP feature and the enhanced CMP feature and feeding the concatenated features to a 1 x 1 convolution layer and a ReLU layer to finally obtain the final fusion feature of the ERP and CMP images, namely the third feature.
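The Squeeze-and-Excitation step used by the first enhancement unit can be illustrated with a minimal NumPy sketch. The weight matrices here are hypothetical stand-ins for the SE module's learned fully-connected layers, and bias/batch-norm terms are omitted:

```python
import numpy as np

def squeeze_excite(feat, w1, w2):
    """Minimal Squeeze-and-Excitation over a (C, H, W) feature map.
    w1: (C//r, C) squeeze weights, w2: (C, C//r) excitation weights
    (hypothetical learned parameters; biases omitted)."""
    z = feat.mean(axis=(1, 2))                 # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)                # excitation: FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))     # FC + sigmoid -> per-channel gate in (0, 1)
    return feat * gate[:, None, None]          # rescale each channel by its gate

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # reduction ratio r = 4: 8 -> 2 channels
w2 = rng.standard_normal((8, 2))   # expand back: 2 -> 8 channels
out = squeeze_excite(feat, w1, w2)
assert out.shape == feat.shape
```

Because the gate is a per-channel scalar in (0, 1), the SE step can only attenuate channels, never amplify them, which is how it "adaptively adjusts the features of each channel" and suppresses redundant ones.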
The feature processing unit 623 is configured to process the first feature, the second feature, and the third feature through the progressive prediction module, and obtain a saliency map.
In the embodiment of the invention, the panoramic image generally has a large resolution. In order to detect salient objects of different scales, a Progressive Prediction Module (PPM) is constructed to combine salient features of different scales; the PPM processes the first feature, the second feature and the third feature at different scales to obtain the saliency map of the panoramic image.
Preferably, the progressive prediction module PPM comprises a top-level guided convolution (TLGC) module and progressive refinement modules (PRM), and each layer of the encoder corresponds to one PRM, i.e., the PRMs correspond to index numbers 1, 2, ..., 5 and can be expressed as PRM i. The TLGC performs multi-scale fusion on the topmost first and second features extracted by Bi-branch Net to obtain a high-level semantic feature with multi-scale information; the PRMs extract saliency cues from this high-level semantic feature and refine it progressively through the layer-by-layer PRM modules to obtain the panoramic image saliency map, with the features of each layer further refined and combined under the guidance of the related features, thereby improving the performance and accuracy of the model.
Preferably, the feature processing unit 623 includes:
a feature obtaining unit 6231 for obtaining a fourth feature by the top-level guided convolution module based on the first feature and the second feature.
In the embodiment of the invention, the top-level guided convolution module TLGC can capture high-level salient features of different scales, and the TLGC corresponds to one FFM module whose index number is 6, expressed as FFM 6, which receives the high-level semantic feature with multi-scale information output by the TLGC (i.e., the fourth feature). The first feature and the second feature output by the fifth layer of the encoder in Bi-branch Net are input to the TLGC. First, the received second feature is converted into the ERP format; the channels of the received first feature and of the converted feature are then reduced by the 1 x 1 convolution layer in the TLGC. The channel-reduced features are fed to three branches in the TLGC to obtain three salient features with receptive fields of different scales, and the three output salient features are concatenated to obtain the feature F scales. Finally, F scales is fed into one GConv 3 convolution group in the TLGC to obtain the top-level guiding feature corresponding to the first and second features, that is, the fourth feature, which carries rich semantic information. The three branches are a GConv 3 convolution group with a dilation rate of 1, a GConv 3 convolution group with a dilation rate of 3, and a GConv 3 convolution group with a dilation rate of 5, where the GConv 3 convolution group comprises a 3 x 3 convolution layer, a BN layer and a ReLU layer, specifically denoted as GConv 3 (·) = ReLU(BN(Conv 3×3 (·))).
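The effect of the three dilation rates (1, 3 and 5) in the TLGC branches can be seen with a tiny 1-D dilated convolution: the kernel keeps three taps, but the receptive field widens with the rate. This is an illustrative NumPy sketch only, not the patent's 2-D implementation:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D convolution with a dilation rate, showing how
    the TLGC branches (rates 1, 3, 5) enlarge the receptive field
    while the kernel keeps the same number of taps."""
    k = len(kernel)
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            out[i] += kernel[j] * xp[i + j * dilation]
    return out

x = np.zeros(11)
x[5] = 1.0                            # unit impulse in the middle
kernel = np.array([1.0, 1.0, 1.0])    # 3-tap kernel
for rate in (1, 3, 5):                # the three TLGC dilation rates
    y = dilated_conv1d(x, kernel, rate)
    span = np.flatnonzero(y)
    # the 3 taps land `rate` apart, so the response spans 2*rate samples
    assert len(span) == 3 and span[-1] - span[0] == 2 * rate
```

With the same parameter count, the rate-5 branch already covers an 11-sample span, which is why cascading the three branches yields salient features with small, medium and large receptive fields before they are concatenated into F scales.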
A saliency map obtaining subunit 6232 configured to obtain a saliency map by a progressive refinement module according to the first feature, the third feature, and the fourth feature.
In an embodiment of the invention, a foreground-background attention mechanism is introduced in the Progressive Refinement Module (PRM) to refine the saliency map. After the first feature, the third feature and the fourth feature are input into the PRM with the corresponding index number, the PRM performs the corresponding processing to obtain the saliency map; specifically, the saliency map is obtained through the following steps:
(1) Aligning the third feature and the first feature by convolution to obtain the respective aligned features;
(2) Concatenating the aligned features and then applying a 1 x 1 convolution to reduce the feature channels, resulting in the fused feature;
(3) Feeding the fused feature into a 3 x 3 convolution to obtain the optimized fusion feature;
(4) Connecting the features before and after optimization through a residual operation to ensure the integrity of the salient objects and obtain the residual feature;
(5) Concatenating the resulting features to obtain the cascaded feature;
(6) Feeding the cascaded feature into the SE module to adaptively learn the channel features and filter redundant information, obtaining the final optimized fusion feature;
(7) Reducing the feature channels of the final optimized fusion feature using a 1 x 1 convolution to output the channel-reduced feature;
(8) Progressively refining the output of the previous module PRM i+1 through each PRM i (i ≤ 4). Specifically, when i=5, the fourth feature is input into FFM 6, the output feature produced by FFM 6 is fed into a GConv 3 convolution group, and the number of channels is then reduced using a 1 x 1 convolution to obtain a single-channel saliency feature, which is designated as the output of the PRM at the fifth layer of the encoder (i.e., PRM 5); when i={1,2,3,4}, the output feature of the previous-layer PRM i+1 is taken as the input of the current module PRM i;
(9) Subsequently, an up-sampling operation is used to align the input features, obtaining the foreground feature;
(10) Applying a sigmoid operation to the foreground feature to obtain the foreground saliency map;
(11) Subtracting the foreground saliency map from the matrix E to obtain the background saliency map of the background region, where all elements of the matrix E are 1 and "-" represents an element-level subtraction operation;
(12) Feeding the foreground feature together with the foreground saliency map into a branch comprising an element-wise multiplication and a GConv 3 convolution group to obtain the foreground saliency cue, and meanwhile feeding the foreground feature together with the background saliency map into another branch, likewise comprising an element-wise multiplication and a GConv 3 convolution group, to obtain the background saliency cue;
(13) Concatenating the foreground saliency cue and the background saliency cue generated by the two branches and feeding them into a GConv 1 convolution group to reduce the number of channels, then applying a 3 x 3 convolution to obtain the salient feature carrying both foreground and background information;
(14) Adding the foreground feature and the salient feature element by element to obtain the complete salient feature; the final complete salient features constitute the saliency map S 360 of the panoramic image.
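The alignment-and-reduction part of the PRM — up-sampling a deeper feature, concatenating it with the current layer's features, and reducing channels with a 1 x 1 convolution — can be sketched as follows. Names and shapes are hypothetical, and the 3 x 3 refinement, residual and SE steps (3)-(6) are elided:

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) map, aligning a
    deeper (half-resolution) feature with the current layer."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(feat, w):
    """A 1x1 convolution is per-pixel channel mixing: used in the PRM
    to reduce the channel count after concatenation."""
    c, h, wd = feat.shape
    return (w @ feat.reshape(c, -1)).reshape(w.shape[0], h, wd)

def prm_fuse(f_erp, f_fuse, f_deep, w_reduce):
    """Sketch of the PRM's align/concatenate/reduce pattern."""
    f_deep_up = upsample2x(f_deep)                        # align the deeper feature
    cat = np.concatenate([f_erp, f_fuse, f_deep_up], 0)   # concatenate the three inputs
    return conv1x1(cat, w_reduce)                         # 1x1 channel reduction

rng = np.random.default_rng(1)
f_erp = rng.standard_normal((4, 8, 8))    # first feature of this layer
f_fuse = rng.standard_normal((4, 8, 8))   # third (fused) feature of this layer
f_deep = rng.standard_normal((4, 4, 4))   # deeper-layer output: half resolution
w = rng.standard_normal((4, 12))          # reduce 12 channels back to 4
out = prm_fuse(f_erp, f_fuse, f_deep, w)
assert out.shape == (4, 8, 8)
```

Repeating this pattern from PRM 5 down to PRM 1 is what makes the prediction "progressive": each layer re-injects its own ERP and fused features while inheriting the refined saliency context from the layer above.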
In the embodiment of the invention, each unit of the panoramic image saliency object detection device can be realized by corresponding hardware or software units, each unit can be an independent software and hardware unit, and can also be integrated into one software and hardware unit, and the invention is not limited herein.
Embodiment five:
Fig. 8 shows the structure of an image processing apparatus provided in the fifth embodiment of the present invention; for convenience of explanation, only the portions related to the embodiments of the present invention are shown.
The image processing apparatus 8 of the embodiment of the present invention includes a processor 80, a memory 81, and a computer program 82 stored in the memory 81 and executable on the processor 80. The processor 80, when executing the computer program 82, implements the steps in the above-described embodiment of the method for detecting a salient object of a panoramic image, for example, steps S101 to S102 shown in fig. 1. Alternatively, the processor 80, when executing the computer program 82, performs the functions of the units in the above-described device embodiments, for example, the functions of the units 61 to 62 shown in fig. 6.
In the embodiment of the invention, when a salient object detection request is received, the panoramic image to be detected is acquired, and the panoramic image is processed through a pre-established saliency detection model to obtain a saliency map of the panoramic image, where the saliency detection model comprises a dual-branch structure network, a hybrid projection feature fusion module and a progressive prediction module; the detection performance for salient objects in the panoramic image is thereby improved, redundant information in the panoramic image is effectively filtered, and the quality of the saliency map of the panoramic image is improved.
The image processing device of the embodiment of the invention can be a personal computer or a server. The steps implemented when the processor 80 in the image processing apparatus 8 executes the computer program 82 to implement the method for detecting the salient object of the panoramic image may refer to the description of the foregoing method embodiments, and will not be repeated herein.
Embodiment six:
in an embodiment of the present invention, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps in the above-described embodiment of a method for detecting a panoramic image saliency object, for example, steps S101 to S102 shown in fig. 1. Alternatively, the computer program, when executed by a processor, implements the functions of the units in the above-described respective apparatus embodiments, such as the functions of the units 61 to 62 shown in fig. 6.
In the embodiment of the invention, when a salient object detection request is received, the panoramic image to be detected is acquired, and the panoramic image is processed through a pre-established saliency detection model to obtain a saliency map of the panoramic image, where the saliency detection model comprises a dual-branch structure network, a hybrid projection feature fusion module and a progressive prediction module; the detection performance for salient objects in the panoramic image is thereby improved, redundant information in the panoramic image is effectively filtered, and the quality of the saliency map of the panoramic image is improved.
The computer readable storage medium of embodiments of the present invention may include any entity or device capable of carrying computer program code, recording medium, such as ROM/RAM, magnetic disk, optical disk, flash memory, and so on.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. A method for detecting a salient object of a panoramic image, the method comprising the steps of:
when a saliency object detection request is received, acquiring a panoramic image to be detected;
and processing the panoramic image through a pre-established saliency detection model to obtain a saliency map of the panoramic image, wherein the saliency detection model comprises a double-branch structure network, a hybrid projection feature fusion module and a progressive prediction module.
2. The method of claim 1, wherein the step of processing the panoramic image through a pre-established saliency detection model comprises:
extracting features of the panoramic image through the network with the double branch structure to obtain a first feature and a second feature;
performing feature fusion on the first feature and the second feature through the mixed projection feature fusion module to obtain a third feature;
and processing the first feature, the second feature and the third feature through the progressive prediction module to obtain the saliency map.
3. The method of claim 2, wherein the progressive prediction module comprises a top-level guided convolution module and a progressive refinement module.
4. The method of claim 3, wherein the step of processing the first feature, the second feature, and the third feature by the progressive prediction module comprises:
obtaining a fourth feature through the top-level guided convolution module according to the first feature and the second feature;
the saliency map is obtained by the progressive refinement module according to the first feature, the third feature and the fourth feature.
5. A device for detecting a panoramic image saliency object, the device comprising:
the image acquisition unit is used for acquiring a panoramic image to be detected when a saliency object detection request is received; and
the saliency map obtaining unit is used for processing the panoramic image through a pre-established saliency detection model to obtain a saliency map of the panoramic image, wherein the saliency detection model comprises a double-branch structure network, a mixed projection feature fusion module and a progressive prediction module.
6. The apparatus of claim 5, wherein the saliency map obtaining unit includes:
the feature extraction unit is used for extracting features of the panoramic image through the double-branch structure network to obtain a first feature and a second feature;
the feature fusion unit is used for carrying out feature fusion on the first feature and the second feature through the mixed projection feature fusion module to obtain a third feature; and
and the feature processing unit is used for processing the first feature, the second feature and the third feature through the progressive prediction module to obtain the saliency map.
7. The apparatus of claim 6, wherein the progressive prediction module comprises a top-level guided convolution module and a progressive refinement module.
8. The apparatus of claim 7, wherein the feature processing unit comprises:
a feature obtaining unit, configured to obtain a fourth feature through the top-level guided convolution module according to the first feature and the second feature; and
a saliency map obtaining subunit, configured to obtain the saliency map through the progressive refinement module according to the first feature, the third feature, and the fourth feature.
9. An image processing device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 4.
CN202310602967.6A 2023-05-26 2023-05-26 Panoramic image saliency object detection method, device, equipment and storage medium Pending CN116778186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310602967.6A CN116778186A (en) 2023-05-26 2023-05-26 Panoramic image saliency object detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310602967.6A CN116778186A (en) 2023-05-26 2023-05-26 Panoramic image saliency object detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116778186A true CN116778186A (en) 2023-09-19

Family

ID=87987017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310602967.6A Pending CN116778186A (en) 2023-05-26 2023-05-26 Panoramic image saliency object detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116778186A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117319610A (en) * 2023-11-28 2023-12-29 松立控股集团股份有限公司 Smart city road monitoring method based on high-order panoramic camera region enhancement
CN117319610B (en) * 2023-11-28 2024-01-30 松立控股集团股份有限公司 Smart city road monitoring method based on high-order panoramic camera region enhancement
CN117911562A (en) * 2024-03-14 2024-04-19 深圳大学 Panoramic image saliency object detection method, device, terminal and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination