CN112101376A - Image processing method, image processing device, electronic equipment and computer readable medium

Info

Publication number: CN112101376A
Application number: CN202010822524.4A
Authority: CN
Inventors: 熊鹏飞
Original assignee: Beijing Megvii Technology Co Ltd
Current assignee: Beijing Megvii Technology Co Ltd
Priority date: 2020-08-14
Filing date: 2020-08-14
Publication date: 2020-12-18
Also published as: WO2022033088A1

Abstract

The invention provides an image processing method, an image processing device, electronic equipment and a computer readable medium. The image processing method comprises the following steps: acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a multi-scale feature map; performing enhancement processing on the part corresponding to the salient object in the multi-scale feature map to obtain a multi-scale enhanced feature map; and performing image restoration on the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed. Because the part corresponding to the salient object is enhanced, the features of the salient object are more prominent in the resulting multi-scale enhanced feature map, so that the salient object mask obtained by segmentation after image restoration is more accurate. This alleviates the technical problem of poor precision when an image is processed by conventional salient object segmentation methods.

Description

Image processing method, image processing device, electronic equipment and computer readable medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable medium.

Background

Salient object segmentation is an important topic in computer vision. It is widely applied in fields such as mobile phone autofocus, autonomous driving, scene understanding and image editing. The purpose of salient object segmentation is to distinguish the pixels of salient objects from the background pixels in an image. Unlike the traditional semantic segmentation task, salient objects do not belong to a single class of objects and carry no semantic labels. However, a salient object is often located in the middle of the image and is rich in color, as shown in fig. 1(a); fig. 1(b) is a schematic diagram of the salient object segmentation result corresponding to fig. 1(a).

Existing salient object segmentation methods fall mainly into two types. The first type analyzes the texture of an image to find the texture-rich areas, and then separates the object from other, uniformly textured areas by clustering. This type is limited by the clustering method and rarely achieves high precision. The second type treats salient object segmentation as a standard object segmentation problem. However, standard object segmentation segments objects of preset classes, for example people, cars and dogs. For a specific image, some or all of these objects may not be salient objects, which leads to errors in the segmented salient objects.

In summary, the conventional salient object segmentation method has the technical problem of poor precision when processing an image.

Disclosure of Invention

In view of the above, the present invention provides an image processing method, an image processing apparatus, an electronic device, and a computer readable medium, so as to alleviate the technical problem of poor accuracy when processing an image in the conventional salient object segmentation method.

In a first aspect, an embodiment of the present invention provides an image processing method, including: acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a multi-scale feature map; performing enhancement processing on a part corresponding to the salient object in the multi-scale feature map to obtain a multi-scale enhanced feature map; and performing image restoration on the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed.

Further, the feature extraction of the image to be processed includes: performing multi-layer downsampling processing on the image to be processed to obtain a multi-scale original feature map; and optimizing the multi-scale original feature map to obtain the multi-scale feature map.

Further, the optimizing the multi-scale original feature map comprises: performing first optimization processing on a target original feature map in the multi-scale original feature map to obtain a first optimized feature map, wherein the target original feature map is a feature map in the multi-scale original feature map except for a highest-dimension original feature map; performing second optimization processing on the highest-dimensional original feature map in the multi-scale original feature map to obtain a second optimized feature map; and taking the first optimized feature map and the second optimized feature map as the multi-scale feature map.

Further, performing the first optimization processing on the target original feature map in the multi-scale original feature map includes: optimizing the target original feature map by using a first optimization module to obtain a first initially optimized feature map, wherein the first optimization module comprises a preset number of first convolution layers; and adding the first initially optimized feature map and the corresponding target original feature map to obtain the first optimized feature map.

Further, performing the second optimization processing on the highest-dimensional original feature map in the multi-scale original feature map includes: optimizing the highest-dimensional original feature map by using a second optimization module to obtain an optimization weight, wherein the second optimization module comprises a second convolution layer, a global pooling layer and a Sigmoid function processing layer; performing a product operation on the optimization weight and the highest-dimensional original feature map to obtain a second initially optimized feature map; and adding the second initially optimized feature map and the highest-dimensional original feature map to obtain the second optimized feature map.

Further, the strengthening processing of the part corresponding to the salient object in the multi-scale feature map includes: obtaining the initial position of the salient object according to the multi-scale feature map; according to the initial position, cutting the feature map with the highest dimension in the multi-scale feature map with at least two different expansion scales to obtain a plurality of cut feature maps, wherein the plurality of cut feature maps comprise feature information of the salient object; taking one or more of the multi-scale feature maps as target feature maps, taking each target feature map as a current target feature map one by one, and calculating the correlation degree between the plurality of cutting feature maps and the current target feature map to obtain a plurality of correlation degree feature maps of the current target feature map, which are in one-to-one correspondence with the plurality of cutting feature maps; and obtaining a reinforced feature map corresponding to the current target feature map according to the plurality of correlation feature maps and the current target feature map.

Further, obtaining the initial position of the salient object according to the multi-scale feature map comprises: performing dimensionality reduction processing on the feature map with the highest dimension in the multi-scale feature map to obtain a single-channel feature map; carrying out binarization processing on the single-channel feature map to obtain a single-channel binarization feature map; and determining the initial position of the salient object according to the binarization feature map of the single channel.

Further, according to the initial position, with at least two different expansion scales, the feature map of the highest dimension in the multi-scale feature map is clipped by: determining the pixel width and the pixel height of the salient object according to the initial position; determining an extended pixel width and an extended pixel height according to the extended scale, the pixel width and the pixel height; and in the feature map with the highest dimension, clipping is carried out along the position of the initial position after the expansion pixel width and the expansion pixel height are expanded.

Further, calculating the correlation between the plurality of clipping feature maps and the current target feature map comprises: zooming the plurality of cutting feature maps to a preset scale to obtain a plurality of cutting feature maps with the preset scale; sliding on the current target feature map by taking the preset scale as a sliding window; and respectively carrying out product operation on the feature graph contained in the sliding window after each sliding and the plurality of cutting feature graphs with the preset scale, and obtaining a plurality of relevancy feature graphs, corresponding to the plurality of cutting feature graphs, of the current target feature graph according to the result of the product operation.
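The sliding-window correlation described above can be illustrated with a minimal single-channel sketch. Several details are assumptions made only for illustration and are not stated in the source: nearest-neighbour resizing to the preset scale, an odd preset scale `k`, stride 1, zero padding at the borders, and summing the elementwise products inside each window. The function names `resize_nearest` and `correlation_map` are hypothetical.

```python
def resize_nearest(fmap, size):
    """Resize a 2-D map to size x size by nearest-neighbour sampling (assumed resizing rule)."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def correlation_map(target, patch):
    """Slide a k x k patch over the target map and sum the elementwise products.

    Assumes k is odd, stride 1, and zero padding, so the output has the
    same height and width as the target map.
    """
    k = len(patch)
    pad = k // 2
    h, w = len(target), len(target[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for py in range(k):
                for px in range(k):
                    yy, xx = y + py - pad, x + px - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += patch[py][px] * target[yy][xx]
            out[y][x] = acc
    return out


# Two hypothetical cropped feature maps of different sizes, one per expansion scale.
crops = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[float(x + y) for x in range(4)] for y in range(4)],
]
target = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
patches = [resize_nearest(c, 3) for c in crops]
# One correlation feature map per cropped feature map.
correlation_maps = [correlation_map(target, p) for p in patches]
```

Each cropped feature map yields exactly one correlation feature map with the same height and width as the current target feature map, which matches the one-to-one correspondence described in the text.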

Further, obtaining the enhanced feature map corresponding to the current target feature map according to the multiple relevancy feature maps and the current target feature map includes: performing product operation on each correlation degree feature map in the plurality of correlation degree feature maps and the current target feature map to obtain a plurality of first strengthened feature maps corresponding to the current target feature map; connecting the plurality of first enhanced feature maps with the current target feature map in series to obtain a second enhanced feature map corresponding to the current target feature map; and acquiring a position strengthening feature map corresponding to the current target feature map, and connecting the second strengthening feature map and the position strengthening feature map in series to obtain a strengthening feature map corresponding to the current target feature map, wherein the scale of the position strengthening feature map is the same as that of the second strengthening feature map.
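The combination step above may be sketched as follows. The sketch assumes, without confirmation from the source, that "connecting in series" means concatenation along the channel axis and that each single-channel correlation feature map is multiplied with every channel of the current target feature map. A feature map is represented as a list of two-dimensional channels, and the function names are hypothetical.

```python
def multiply(a, b):
    """Elementwise product of two 2-D maps of the same size."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def enhance(target_channels, correlation_maps, position_maps):
    """Build the enhanced feature map for one target feature map.

    target_channels:  list of 2-D maps (the current target feature map)
    correlation_maps: list of 2-D maps, one per cropped feature map
    position_maps:    list of 2-D maps (X- and Y-direction position maps)
    """
    # First enhanced feature maps: each correlation map times the target map.
    first = []
    for corr in correlation_maps:
        for channel in target_channels:
            first.append(multiply(corr, channel))
    # Second enhanced feature map: first enhanced maps concatenated with the target map.
    second = first + list(target_channels)
    # Final enhanced feature map: concatenated with the position maps.
    return second + list(position_maps)


target_channels = [[[1.0, 2.0]], [[3.0, 4.0]]]      # 2 channels, 1 x 2 each
correlation_maps = [[[0.5, 0.5]], [[2.0, 0.0]]]     # 2 correlation maps
position_maps = [[[0.0, 1.0]], [[1.0, 1.0]]]        # X and Y position maps
enhanced = enhance(target_channels, correlation_maps, position_maps)
```

Under these assumptions the channel count of the result is the number of correlation feature maps times the number of target channels, plus the target channels, plus the position channels (here 2 × 2 + 2 + 2 = 8).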

Further, the obtaining of the position reinforced feature map corresponding to the current target feature map includes: determining a central line of the salient object in an X direction and a central line of the salient object in a Y direction based on the initial position of the salient object; setting the central line in the Y direction as a first target value, and linearly converting the central line in the Y direction into a second target value along the X direction to obtain a position strengthening feature map in the X direction; setting the central line in the X direction as the first target value, and linearly converting the central line in the X direction into the second target value along the Y direction to obtain a position strengthening feature map in the Y direction; and taking the position strengthening characteristic diagram in the X direction and the position strengthening characteristic diagram in the Y direction as the position strengthening characteristic diagrams.
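One possible reading of the position strengthening feature maps is sketched below. The first target value 1.0, the second target value 0.0, and the choice to reach the second target value exactly at each border of the map are assumptions made for illustration; the source fixes none of them. The helper names `ramp` and `position_maps` are hypothetical.

```python
def ramp(length, center, first=1.0, second=0.0):
    """1-D profile equal to `first` at index `center`, changing linearly to `second` at both ends."""
    values = []
    for i in range(length):
        # Distance from the center line to the border on the side where i lies.
        span = center if i < center else (length - 1 - center)
        frac = abs(i - center) / span if span > 0 else 0.0
        values.append(first + (second - first) * frac)
    return values


def position_maps(height, width, box):
    """Return (x_map, y_map) for a salient object box given as (left, top, right, bottom)."""
    left, top, right, bottom = box
    cx = (left + right) // 2   # center line parallel to the Y direction
    cy = (top + bottom) // 2   # center line parallel to the X direction
    x_profile = ramp(width, cx)
    y_profile = ramp(height, cy)
    x_map = [list(x_profile) for _ in range(height)]        # varies along X only
    y_map = [[y_profile[y]] * width for y in range(height)]  # varies along Y only
    return x_map, y_map


x_map, y_map = position_maps(5, 5, (1, 1, 3, 3))
```

Both maps have the same height and width as the feature map they are later concatenated with, so they can be appended as two additional channels.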

Further, the image restoration of the multi-scale enhanced feature map includes: performing up-sampling on the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed.

In a second aspect, an embodiment of the present invention further provides an image processing apparatus, including: a feature extraction unit, configured to acquire an image to be processed and perform feature extraction on the image to be processed to obtain a multi-scale feature map; an enhancement processing unit, configured to perform enhancement processing on the part corresponding to the salient object in the multi-scale feature map to obtain a multi-scale enhanced feature map; and an image restoration unit, configured to perform image restoration on the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to any one of the above first aspects when executing the computer program.

In a fourth aspect, an embodiment of the present invention provides a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to perform the steps of the method according to any one of the first aspect.

In the embodiment of the invention, an image to be processed is first acquired, and feature extraction is performed on it to obtain a multi-scale feature map; enhancement processing is then performed on the part corresponding to the salient object in the multi-scale feature map to obtain a multi-scale enhanced feature map; finally, image restoration is performed on the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed. As described above, after the part corresponding to the salient object is enhanced, the features of the salient object are more prominent in the resulting multi-scale enhanced feature map, so that the salient object mask obtained by segmentation after image restoration is more accurate. This alleviates the technical problem of poor precision when an image is processed by existing salient object segmentation methods.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1(a) is a schematic diagram of an image to be processed according to an embodiment of the present invention;

Fig. 1(b) is a schematic diagram of a segmentation result of a salient object corresponding to fig. 1(a) according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an electronic device according to an embodiment of the present invention;

Fig. 3 is a flowchart of an image processing method according to an embodiment of the present invention;

Fig. 4 is an overall schematic diagram of an image processing method according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a first optimization process according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a second optimization process according to an embodiment of the present invention;

Fig. 7 is a flowchart of enhancement processing performed on a portion corresponding to a salient object in a multi-scale feature map according to an embodiment of the present invention;

Fig. 8 is a flowchart of a method for determining an enhanced feature map corresponding to a current target feature map according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of a position strengthening feature map according to an embodiment of the present invention;

Fig. 10 is a schematic diagram of enhancement processing performed on a portion corresponding to a salient object in a multi-scale feature map according to an embodiment of the present invention;

Fig. 11 is a schematic diagram of the results of training and testing, on a plurality of public data sets, the image processing method of the present invention and conventional salient object segmentation methods according to an embodiment of the present invention;

Fig. 12 is a schematic diagram of the visualization results obtained after an image to be processed is processed by the image processing method of the present invention and by conventional salient object segmentation methods according to an embodiment of the present invention;

Fig. 13 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1:

First, an electronic device 100 for implementing an embodiment of the present invention, which can be used to execute the image processing method of the embodiments of the present invention, is described with reference to fig. 2.

As shown in FIG. 2, electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106, an output device 108, and a camera 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 2 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.

The processor 102 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), and an Application-Specific Integrated Circuit (ASIC). The processor 102 may be a Central Processing Unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.

The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the client-side functionality (implemented by the processor) and/or other desired functionality in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.

The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.

The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.

The camera 110 is configured to capture an image to be processed, which is then processed by the image processing method to obtain the corresponding salient object mask. For example, the camera may capture an image desired by a user (e.g., a photo or a video), and the image is then processed by the image processing method to obtain the salient object mask corresponding to the image to be processed. The camera may further store the captured image in the memory 104 for use by other components.

Exemplarily, an electronic device for implementing an image processing method according to an embodiment of the present invention may be implemented as a smart mobile terminal such as a smartphone, a tablet computer, or the like.

Example 2:

According to an embodiment of the present invention, an embodiment of an image processing method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.

Fig. 3 is a flowchart of an image processing method according to an embodiment of the present invention, as shown in fig. 3, the method including the steps of:

Step S302, acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a multi-scale feature map;

In an embodiment of the present invention, the multi-scale feature map refers to feature maps of different sizes (i.e., different heights and widths). The feature extraction may be multi-layer convolutional downsampling: each downsampling layer applied to the image to be processed yields a feature map of one scale. A feature map of a given scale comprises a plurality of sub-feature maps; it is in fact a multi-channel matrix in which the matrix of each channel is two-dimensional and corresponds to one sub-feature map. The number of elements in each row of that matrix is the width of the corresponding sub-feature map, and the number of elements in each column is its height.
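How repeated downsampling produces feature maps of several scales can be illustrated with a minimal single-channel sketch. As an assumption made only for illustration, 2x2 average pooling stands in for the learned convolutional downsampling layers, whose kernels and strides are not given in the source; the function names are hypothetical.

```python
def downsample_2x(fmap):
    """Halve height and width by averaging each 2x2 block (stand-in for a strided convolution)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[y][x] + fmap[y][x + 1] + fmap[y + 1][x] + fmap[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def build_pyramid(image, num_layers):
    """Return one feature map per downsampling layer, from the finest scale to the coarsest."""
    pyramid = []
    current = image
    for _ in range(num_layers):
        current = downsample_2x(current)
        pyramid.append(current)
    return pyramid


image = [[float(x + y) for x in range(16)] for y in range(16)]
pyramid = build_pyramid(image, num_layers=3)
# (height, width) of each scale: rows give the height, elements per row give the width.
shapes = [(len(f), len(f[0])) for f in pyramid]
```

Each layer halves the height and width, so a 16x16 input yields maps of 8x8, 4x4 and 2x2 in this sketch.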

Step S304, performing strengthening treatment on the part corresponding to the salient object in the multi-scale feature map to obtain a multi-scale strengthened feature map;

The inventor considers that a salient object is related only to the pixel values in its neighborhood, and that the more prominent the salient object is in the image, the more accurate the salient object mask obtained by segmentation. Therefore, after the multi-scale feature map is obtained, enhancement processing is performed on the part corresponding to the salient object in the multi-scale feature map. After the enhancement processing, the features of the part corresponding to the salient object are more prominent, so that the salient object mask obtained after image restoration of the multi-scale enhanced feature map is more accurate.

The strengthening process will be described in detail below, and will not be described herein again.

Step S306, performing image restoration on the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed.

Specifically, the enhanced feature maps of multiple scales are up-sampled, and in the up-sampling process, the enhanced feature maps of different scales are fused, so that a significant object mask corresponding to the image to be processed is obtained.
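The up-sampling and fusion described above might look like the following single-channel sketch. The source does not specify the up-sampling method or the fusion operation; nearest-neighbour up-sampling by a factor of two and fusion by elementwise addition are assumptions chosen for brevity, and the function names are hypothetical.

```python
def upsample_2x(fmap):
    """Nearest-neighbour up-sampling: repeat every element twice along both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Elementwise sum of two 2-D maps of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def restore(enhanced_maps):
    """Fuse enhanced maps ordered coarsest first, each next map being twice as large."""
    fused = enhanced_maps[0]
    for finer in enhanced_maps[1:]:
        # Bring the running result to the next scale, then fuse with that scale's map.
        fused = add_maps(upsample_2x(fused), finer)
    return fused


coarse = [[1.0]]
mid = [[0.0, 1.0], [2.0, 3.0]]
fine = [[0.5] * 4 for _ in range(4)]
mask_scores = restore([coarse, mid, fine])
```

In a full pipeline the fused result would still have to be brought to the resolution of the image to be processed and converted to a mask; those steps are omitted here.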

The foregoing briefly introduces the image processing method of the present invention, and the details thereof are described in detail below.

In an optional embodiment of the present invention, in step S302, the step of performing feature extraction on the image to be processed includes the following processes (1) to (2):

(1) Performing multi-layer downsampling processing on the image to be processed to obtain a multi-scale original feature map;

Specifically, referring to fig. 4, the multi-layer downsampling may be multi-layer convolutional downsampling. Each downsampling layer yields an original feature map of one scale, so that original feature maps of multiple scales are obtained.

Experiments show that the high-dimensional original feature maps in the multi-scale original feature map (for example, those obtained by the third and later downsampling layers) strongly affect the accuracy of image processing, whereas the low-dimensional original feature maps (for example, those obtained by the first and second downsampling layers) have little effect. To improve processing efficiency, the original feature maps obtained by the first and second downsampling layers are therefore not processed in subsequent steps, as shown in fig. 4.

(2) Optimizing the high-dimensional original feature maps in the multi-scale original feature map to obtain the multi-scale feature map.

The method specifically comprises the following steps:

21) Performing first optimization processing on a target original feature map in the multi-scale original feature map to obtain a first optimized feature map, wherein the target original feature map is a high-dimensional original feature map other than the highest-dimensional original feature map in the multi-scale original feature map;

Referring to fig. 4, the first optimization processing, i.e., the SRB processing in fig. 4, is performed on all the high-dimensional original feature maps except the highest-dimensional original feature map among the multi-scale original feature maps. The specific process of the first optimization processing (SRB processing) is: optimizing the target original feature map by using a first optimization module to obtain a first initially optimized feature map, wherein the first optimization module comprises a preset number of first convolution layers; and adding the first initially optimized feature map and the corresponding target original feature map to obtain a first optimized feature map.

In the embodiment of the present invention, the preset number of first convolution layers may be two convolution layers of 3x3, referring to fig. 5, after the target original feature map passes through two convolution layers of 3x3 connected in series, a first initially optimized feature map is obtained, and then the first initially optimized feature map and the corresponding target original feature map are summed to obtain a first optimized feature map.

The fact that the feature map is actually a multi-dimensional matrix has been explained above, so the process of adding the first initially optimized feature map and the corresponding target original feature map is actually the process of adding corresponding elements in the multi-dimensional matrix.
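The first optimization processing (two 3x3 convolution layers in series followed by an elementwise addition with the input) can be sketched for a single channel as follows. Zero padding, the absence of bias terms and activation functions, and the kernel values are assumptions made for illustration; the source specifies only the layer sizes and the addition. The function names are hypothetical.

```python
def conv3x3(fmap, kernel):
    """3x3 convolution with zero padding, so the output keeps the input's height and width."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy, xx = y + ky - 1, x + kx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * fmap[yy][xx]
            out[y][x] = acc
    return out


def srb(fmap, kernel_a, kernel_b):
    """Two 3x3 convolutions in series, then add the result to the input element by element."""
    initially_optimized = conv3x3(conv3x3(fmap, kernel_a), kernel_b)
    return [[a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(initially_optimized, fmap)]


# Hypothetical kernels: an identity kernel passes the map through unchanged.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
target_original = [[1.0, 2.0], [3.0, 4.0]]
first_optimized = srb(target_original, identity, identity)
```

Because the padding preserves the spatial size, the initially optimized map and the input always have matching shapes, which is what makes the element-by-element addition well defined.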

22) Performing second optimization processing on the highest-dimensional original feature map in the multi-scale original feature map to obtain a second optimized feature map;

Referring to fig. 4, the second optimization processing, i.e., the GRB processing in fig. 4, is performed on the highest-dimensional original feature map among the multi-scale original feature maps. The specific process of the second optimization processing (GRB processing) is as follows: optimizing the highest-dimensional original feature map by using a second optimization module to obtain an optimization weight, wherein the second optimization module comprises a second convolution layer, a global pooling layer and a Sigmoid function processing layer; performing a product operation on the optimization weight and the highest-dimensional original feature map to obtain a second initially optimized feature map; and adding the second initially optimized feature map and the highest-dimensional original feature map to obtain a second optimized feature map.

In the embodiment of the present invention, the second convolution layer may be a 1x1 convolution layer. Referring to fig. 6, the highest-dimensional original feature map passes in sequence through a 1x1 convolution layer, the global pooling layer, a 1x1 convolution layer and the Sigmoid function to yield the optimization weight. The optimization weight is then multiplied by the highest-dimensional original feature map, and the result is added to the highest-dimensional original feature map to obtain the second optimized feature map.

Similarly, the product operation multiplies the optimization weight by the elements of the two-dimensional matrices, and the sum operation adds the corresponding elements of the two-dimensional matrices.
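The second optimization processing can be sketched for a small multi-channel feature map as follows. The sketch assumes that the global pooling is average pooling, that the optimization weight is one scalar per channel, and that the 1x1 convolutions have no bias and keep the channel count; none of these details is stated in the source. The weight matrices `w1` and `w2` and the function names are hypothetical.

```python
import math


def conv1x1(fmap, weight):
    """1x1 convolution: mix channels at every pixel. fmap is [C_in][H][W], weight is [C_out][C_in]."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(weight[o][i] * fmap[i][y][x] for i in range(c_in)) for x in range(w)]
         for y in range(h)]
        for o in range(len(weight))
    ]


def global_avg_pool(fmap):
    """Average each channel over all positions: [C][H][W] -> [C][1][1]."""
    return [[[sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))]] for ch in fmap]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def grb(fmap, w1, w2):
    """1x1 conv -> global pooling -> 1x1 conv -> Sigmoid gives the weight; output is x * weight + x."""
    pooled = global_avg_pool(conv1x1(fmap, w1))
    weights = [sigmoid(ch[0][0]) for ch in conv1x1(pooled, w2)]
    return [
        [[v * wt + v for v in row] for row in ch]
        for ch, wt in zip(fmap, weights)
    ]


highest = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]  # 2 channels, 2 x 2
eye = [[1.0, 0.0], [0.0, 1.0]]                                   # hypothetical weights
second_optimized = grb(highest, eye, eye)
```

Since the Sigmoid output lies between 0 and 1, each element of the result lies between the original value and twice the original value under these assumptions.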

23) And taking the first optimized feature map and the second optimized feature map as multi-scale feature maps.

The process of performing the enhancement processing on the portion corresponding to the salient object in the multi-scale feature map is described in detail below.

In an alternative embodiment of the present invention, referring to fig. 7, in step S304, the step of performing enhancement processing on the portion corresponding to the salient object in the multi-scale feature map includes:

Step S701, obtaining an initial position of the salient object according to the multi-scale feature map;

This step specifically comprises: performing dimensionality reduction processing on the feature map with the highest dimension in the multi-scale feature map to obtain a single-channel feature map; performing binarization processing on the single-channel feature map to obtain a single-channel binarized feature map; and determining the initial position of the salient object according to the single-channel binarized feature map.

Referring to fig. 4, the second optimized feature map (i.e., the feature map with the highest dimension in the multi-scale feature map) after the GRB processing (the second optimization processing) is subjected to the dimension reduction processing by the convolutional layer to obtain a single-channel feature map (N × M, the number of channels is 1), and then the feature map with the single channel is subjected to the binarization processing to obtain a single-channel binarization feature map (e.g., an image indicated below the GRB in fig. 4), where the portion represented by 1 in the single-channel binarization feature map is the portion corresponding to the salient object, and the initial position of the salient object can be determined according to the position of 1 in the single-channel binarization feature map. The initial position may be a position determined by leftmost 1, rightmost 1, uppermost 1 and lowermost 1 in the binarized feature map of the single channel.
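Determining the initial position from the leftmost, rightmost, uppermost and lowermost 1 of the binarized map can be sketched as follows. The binarization threshold of 0.5 is an assumption, as the source does not state one, and the function names are hypothetical.

```python
def binarize(fmap, threshold=0.5):
    """Map every element to 1 if it exceeds the (assumed) threshold, otherwise to 0."""
    return [[1 if v > threshold else 0 for v in row] for row in fmap]


def initial_position(binary):
    """Return (left, top, right, bottom), inclusive, of the 1-valued region, or None if it is empty."""
    ys = [y for y, row in enumerate(binary) if any(row)]
    xs = [x for row in binary for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


single_channel = [
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.2],
    [0.0, 0.0, 0.1, 0.0],
]
binary = binarize(single_channel)
box = initial_position(binary)
```

Here the 1-valued elements occupy rows 1 to 2 and columns 1 to 2, so `box` is `(1, 1, 2, 2)`.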

Step S702, according to the initial position, cropping the highest-dimension feature map in the multi-scale feature map using at least two different expansion scales to obtain a plurality of cropped feature maps, wherein the plurality of cropped feature maps contain feature information of the salient object;

This specifically comprises the following steps: determining the pixel width and the pixel height of the salient object according to the initial position; determining an expansion pixel width and an expansion pixel height according to the expansion scale, the pixel width and the pixel height; and, in the highest-dimension feature map, cropping along the position obtained by expanding the initial position by the expansion pixel width and the expansion pixel height. Because each cropped feature map is cut out after expanding outward from the initial position of the salient object, it contains the salient object, that is, the feature information of the salient object.

For example, when the expansion scale is 10% and the pixel width and pixel height of the salient object are 30 × 30, the expansion pixel width (10% of 30) is 3 pixels and the expansion pixel height (10% of 30) is 3 pixels. That is, in the highest-dimension feature map, the initial position is expanded by 3 background pixels to the left and to the right and by 3 background pixels upward and downward, and cropping is then performed.

The expansion scale may also be 30%, 50%, etc.; the embodiment of the present invention does not specifically limit the expansion scale. Each expansion scale yields a different cropped feature map, so the number of expansion scales equals the number of cropped feature maps.
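The cropping in step S702 can be sketched as below, purely for illustration. The rounding of the expansion to whole pixels and the clamping of the expanded box to the map borders are assumptions of this sketch; the embodiment does not state how either case is handled. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive
Map2D = List[List[float]]


def expand_box(box: Box, scale: float, map_w: int, map_h: int) -> Box:
    """Expand the box on every side by `scale` times its width and height."""
    left, top, right, bottom = box
    dx = round((right - left + 1) * scale)  # expansion pixel width (rounding assumed)
    dy = round((bottom - top + 1) * scale)  # expansion pixel height (rounding assumed)
    # Clamping to the feature map borders is an assumption of this sketch.
    return (max(0, left - dx), max(0, top - dy),
            min(map_w - 1, right + dx), min(map_h - 1, bottom + dy))


def crop(feature: Map2D, box: Box) -> Map2D:
    left, top, right, bottom = box
    return [row[left:right + 1] for row in feature[top:bottom + 1]]


def multi_scale_crops(feature: Map2D, box: Box,
                      scales: Sequence[float] = (0.1, 0.3, 0.5)) -> List[Map2D]:
    """One cropped map per expansion scale, so len(result) == len(scales)."""
    h, w = len(feature), len(feature[0])
    return [crop(feature, expand_box(box, s, w, h)) for s in scales]


feature_map = [[0.0] * 64 for _ in range(64)]
crops = multi_scale_crops(feature_map, (10, 10, 39, 39))  # a 30 x 30 object
```

With a 30 × 30 object and a 10% expansion scale, the first crop is 36 × 36, matching the 3-pixel expansion on each side in the example above.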

Step S703, taking one or more of the multi-scale feature maps as target feature maps, taking each target feature map in turn as the current target feature map, and calculating the correlation between the plurality of cropped feature maps and the current target feature map to obtain a plurality of correlation feature maps of the current target feature map, in one-to-one correspondence with the plurality of cropped feature maps;

The target feature map may be one or more of the multi-scale feature maps, which is not particularly limited by the embodiment of the present invention.

This specifically comprises the following steps: scaling the plurality of cropped feature maps to a preset scale to obtain a plurality of cropped feature maps of the preset scale; taking each target feature map in turn as the current target feature map, and sliding over the current target feature map with a sliding window of the preset scale; and multiplying the feature map contained in the sliding window after each slide by each of the plurality of cropped feature maps of the preset scale, and obtaining, from the results of the product operations, a plurality of correlation feature maps of the current target feature map in one-to-one correspondence with the plurality of cropped feature maps.

To better understand this process, a specific example is described below. Suppose a cropped feature map of the preset scale is a B map of 32 × 32 × 64, and the current target feature map is an A map of 64 × 64 × 128.

To simplify the description, first consider a B1 map of 32 × 32 × 1 and an A1 map of 64 × 64 × 1. A 32 × 32 sliding window is slid over the A1 map according to a preset sliding step. Each slide yields a 32 × 32 patch, which is multiplied by the 32 × 32 B1 map to obtain a new 32 × 32 patch. After all slides are complete, the correlation feature map of the A1 map and the B1 map is obtained, with a size of 64 × 64 × 1;

when calculating the correlation feature map of a B2 map of 32 × 32 × 1 and an A2 map of 64 × 64 × 128, the 32 × 32 sliding window is slid over each channel of the A2 map according to the preset sliding step. Each slide yields a 32 × 32 patch, which is multiplied by the 32 × 32 B2 map to obtain a new 32 × 32 patch. After all channels have been traversed, the correlation feature map of the A2 map and the B2 map is obtained, with a size of 64 × 64 × 128;

when calculating the correlation feature map of the B map of 32 × 32 × 64 and the A map of 64 × 64 × 128, each of the 64 channels of the B map is processed with the 64 × 64 × 128 A map according to the above process to obtain a 64 × 64 × 128 correlation feature map, and the 64 correlation feature maps are then added together to obtain a final 64 × 64 × 128 correlation feature map, that is, the correlation feature map of the A map and the B map.
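The sliding-window computation can be illustrated with the following sketch. Several details are assumptions made here because the example above leaves them open: a sliding step of 1, zero padding at the borders so that the output keeps the spatial size of the target, and summing the element-wise products inside each window into one output value. Feature maps are stored as lists of channels, and all names are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def correlate_2d(target: Map2D, template: Map2D) -> Map2D:
    """Slide `template` over `target` (step 1, zero padding); output has the size of `target`."""
    h, w = len(target), len(target[0])
    kh, kw = len(template), len(template[0])
    oy, ox = kh // 2, kw // 2  # offset of the window anchor from its top-left corner
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    ty, tx = y + j - oy, x + i - ox
                    if 0 <= ty < h and 0 <= tx < w:  # positions outside the map count as zero
                        acc += target[ty][tx] * template[j][i]
            out[y][x] = acc  # sum of products in the window (aggregation assumed)
    return out


def correlation_feature_map(target: List[Map2D], template: List[Map2D]) -> List[Map2D]:
    """Process every template channel with every target channel, then add over template channels."""
    result = []
    for t_ch in target:
        h, w = len(t_ch), len(t_ch[0])
        summed = [[0.0] * w for _ in range(h)]
        for b_ch in template:
            part = correlate_2d(t_ch, b_ch)
            for y in range(h):
                for x in range(w):
                    summed[y][x] += part[y][x]
        result.append(summed)
    return result


# Tiny stand-ins for the A map (2 channels of 4 x 4) and the B map (3 channels of 2 x 2).
a_map = [[[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]] * 2
b_map = [[[1, 1], [1, 1]]] * 3
corr = correlation_feature_map(a_map, b_map)
```

The result has the same number of channels and the same spatial size as the target, which is consistent with the 64 × 64 × 128 output described above. A practical implementation would use an optimized convolution routine rather than nested loops.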

Step S704, obtaining an enhanced feature map corresponding to the current target feature map according to the plurality of correlation feature maps and the current target feature map.

Referring to fig. 8, this specifically includes the following process:

Step S801, multiplying each of the plurality of correlation feature maps by the current target feature map to obtain a plurality of first enhanced feature maps corresponding to the current target feature map;

After the product operation, the portion corresponding to the salient object in the current target feature map is enhanced.

Step S802, concatenating the plurality of first enhanced feature maps with the current target feature map to obtain a second enhanced feature map corresponding to the current target feature map;

For example, if there are two first enhanced feature maps of 64 × 64 × 128 (width W × height H × number of channels C), concatenating them with the 64 × 64 × 128 target feature map yields a second enhanced feature map of 64 × 64 × 384; that is, concatenation sums the numbers of channels.

Step S803, obtaining a position enhanced feature map corresponding to the current target feature map, and concatenating the second enhanced feature map with the position enhanced feature map to obtain the enhanced feature map corresponding to the current target feature map, wherein the scale of the position enhanced feature map is the same as the scale of the second enhanced feature map.

The specific process is as follows:

a) determining, on a target matrix, a center line of the salient object in the X direction and a center line of the salient object in the Y direction based on the initial position of the salient object, wherein the target matrix is a single-channel matrix with the same scale as the second enhanced feature map, and the value of each element of the target matrix may be 0;

b) setting the center line of the target matrix in the Y direction to a first target value, and changing the value linearly to a second target value along the X direction to obtain a position enhanced feature map in the X direction;

c) setting the center line of the target matrix in the X direction to the first target value, and changing the value linearly to the second target value along the Y direction to obtain a position enhanced feature map in the Y direction;

d) taking the position enhanced feature map in the X direction and the position enhanced feature map in the Y direction as the position enhanced feature map.

The first target value may be 1, and the second target value may be 0. Fig. 9 shows a schematic diagram of deriving the position enhanced feature map from the initial position of the salient object.
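Steps S801 to S803, including the construction of the position enhanced feature map in a) to d), might be sketched as follows. This is an illustration under stated assumptions: the value is assumed to reach the second target value at the border farther from the center line, the channel order of the concatenation is chosen arbitrarily, and all names are hypothetical. Feature maps are stored as lists of channels.

```python
from typing import List, Tuple

Map2D = List[List[float]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def ramp(length: int, center: int, first: float = 1.0, second: float = 0.0) -> List[float]:
    """`first` at `center`, changing linearly to `second` at the farther border (assumed)."""
    span = max(center, length - 1 - center, 1)
    return [first + (second - first) * abs(i - center) / span for i in range(length)]


def position_maps(width: int, height: int, box: Box) -> Tuple[Map2D, Map2D]:
    left, top, right, bottom = box
    cx, cy = (left + right) // 2, (top + bottom) // 2  # center lines of the salient object
    x_profile, y_profile = ramp(width, cx), ramp(height, cy)
    x_map = [list(x_profile) for _ in range(height)]         # value changes along X
    y_map = [[y_profile[y]] * width for y in range(height)]  # value changes along Y
    return x_map, y_map


def multiply(a: List[Map2D], b: List[Map2D]) -> List[Map2D]:
    """Element-wise product of two feature maps with identical shape."""
    return [[[u * v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def enhanced_feature_map(target: List[Map2D], correlation_maps: List[List[Map2D]],
                         box: Box) -> List[Map2D]:
    channels: List[Map2D] = []
    for corr in correlation_maps:            # S801: one first enhanced map per correlation map
        channels.extend(multiply(corr, target))
    channels.extend(target)                  # S802: concatenate with the target itself
    height, width = len(target[0]), len(target[0][0])
    channels.extend(position_maps(width, height, box))  # S803: append X and Y position maps
    return channels


target = [[[1.0] * 5 for _ in range(3)] for _ in range(2)]                    # 2 channels, 3 x 5
corrs = [[[[2.0] * 5 for _ in range(3)] for _ in range(2)] for _ in range(2)]  # two correlation maps
enhanced = enhanced_feature_map(target, corrs, (1, 0, 3, 2))
```

In this toy case the output has 2 × 2 + 2 + 2 = 8 channels, mirroring the way 128 + 128 + 128 channels give 384 in the example of step S802 before the position maps are appended.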

Referring to fig. 10, the enhancement processing (denoted LCB) may proceed as follows. Positioning information of the portion corresponding to the salient object (including the initial position, the pixel width, the pixel height, the center line in the X direction and the center line in the Y direction) is determined from the single-channel binarized feature map. According to the positioning information, the highest-dimension feature map in the multi-scale feature map is cropped using at least two different expansion scales to obtain a plurality of cropped feature maps, and the position enhanced feature map is determined. Correlation feature maps between the plurality of cropped feature maps and the current target feature map in the multi-scale feature map are then calculated, each correlation feature map is multiplied by the current target feature map, and the results are concatenated with the current target feature map and the position enhanced feature map corresponding to the current target feature map to obtain the enhanced feature map corresponding to the current target feature map. Repeating this for every target feature map yields the enhanced feature map corresponding to each target feature map.

The enhanced feature map strongly emphasizes the features of the salient object, so the salient object mask obtained by segmentation is more accurate.

In an optional embodiment of the present invention, upsampling the multi-scale enhanced feature map to obtain the salient object mask corresponding to the image to be processed includes: upsampling and fusing the enhanced feature maps of different scales to obtain the salient object mask corresponding to the image to be processed.

In the embodiment of the present invention, the upsampling fusion may proceed as shown in fig. 4. The enhanced feature map of layer 5 (i.e., the feature map obtained by applying GRB processing and LCB processing to the layer-5 downsampled feature map) first undergoes SRB processing (described above and not repeated here) and is then upsampled. The upsampled feature map is added to the enhanced feature map of layer 4, and the sum undergoes SRB processing and is upsampled again. The result is added to the enhanced feature map of layer 3 and undergoes SRB processing, and finally it is enlarged 4 times to obtain the salient object mask.
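The order of operations in the upsampling fusion can be sketched as below. This is only an illustration: single-channel maps are used for brevity, nearest-neighbour upsampling and a factor of 2 between adjacent layers are assumptions, and the SRB is represented by a placeholder callable (identity by default) rather than the actual module.

```python
from typing import Callable, List

Map2D = List[List[float]]


def upsample(fm: Map2D, factor: int) -> Map2D:
    """Nearest-neighbour upsampling (interpolation method assumed for this sketch)."""
    out: Map2D = []
    for row in fm:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def add(a: Map2D, b: Map2D) -> Map2D:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(e5: Map2D, e4: Map2D, e3: Map2D,
         srb: Callable[[Map2D], Map2D] = lambda fm: fm) -> Map2D:
    """Layer 5 -> SRB -> upsample -> add layer 4 -> SRB -> upsample -> add layer 3 -> SRB -> 4x."""
    x = upsample(srb(e5), 2)          # factor 2 between layers is an assumption
    x = upsample(srb(add(x, e4)), 2)
    x = srb(add(x, e3))
    return upsample(x, 4)             # final 4-fold enlargement gives the mask resolution


e5 = [[1.0]]
e4 = [[1.0] * 2 for _ in range(2)]
e3 = [[1.0] * 4 for _ in range(4)]
mask = fuse(e5, e4, e3)
```

Passing a real refinement function as `srb` would replace the identity placeholder without changing the fusion order.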

The inventor trained and tested the image processing method of the present invention (denoted LCANet) and conventional salient object segmentation methods on a plurality of public data sets (the DUTS-TE, ECSSD, HKU-IS, PASCAL-S and DUT-OM data sets). The results are shown in fig. 11, which shows that the salient object mask obtained with the image processing method of the present invention is more accurate (in fig. 11, a larger maxF value indicates higher accuracy, and a smaller MAE value indicates higher accuracy). In addition, referring to fig. 12, the visualized results also show that the image processing method of the present invention is more accurate. In fig. 12, GT denotes the manually labeled salient object segmentation result, the LCANet column shows the salient object segmentation results of the present invention, and the other columns show the salient object segmentation results obtained by other methods (the corresponding method is labeled below each column).
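For readers unfamiliar with the two metrics, the following sketch shows how they are commonly computed for one predicted mask and its ground truth. The text does not define them, so the definitions here are assumptions: MAE is taken as the mean absolute difference per pixel, and maxF as the maximum F-measure over binarization thresholds with β² = 0.3, a weighting often used in salient object segmentation. The function names and the number of thresholds are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def mae(pred: Map2D, gt: Map2D) -> float:
    """Mean absolute error between a predicted mask in [0, 1] and a binary ground truth."""
    n = sum(len(row) for row in pred)
    return sum(abs(p - g) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)) / n


def max_f(pred: Map2D, gt: Map2D, beta2: float = 0.3, steps: int = 255) -> float:
    """Maximum F-measure over thresholds; beta2 = 0.3 is an assumed convention."""
    best = 0.0
    for t in range(steps):
        thr = t / steps
        tp = fp = fn = 0
        for rp, rg in zip(pred, gt):
            for p, g in zip(rp, rg):
                hit = p > thr
                if hit and g:
                    tp += 1
                elif hit:
                    fp += 1
                elif g:
                    fn += 1
        if tp == 0:
            continue  # precision and recall are both zero at this threshold
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, (1 + beta2) * precision * recall / (beta2 * precision + recall))
    return best


perfect = [[1.0, 0.0], [0.0, 1.0]]
truth = [[1, 0], [0, 1]]
```

A perfect prediction gives an MAE of 0 and a maxF of 1, which matches the reading of fig. 11 that larger maxF and smaller MAE indicate higher accuracy.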

Example 3:

An embodiment of the present invention further provides an image processing apparatus, which is mainly used to execute the image processing method provided above. The image processing apparatus provided by the embodiment of the present invention is described in detail below.

Fig. 13 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention. As shown in fig. 13, the apparatus mainly includes a feature extraction unit 10, an enhancement processing unit 20 and an image restoration unit 30, wherein:

the feature extraction unit is configured to acquire an image to be processed and perform feature extraction on the image to be processed to obtain a multi-scale feature map;

the enhancement processing unit is configured to perform enhancement processing on the part corresponding to the salient object in the multi-scale feature map to obtain a multi-scale enhanced feature map;

and the image restoration unit is configured to perform image restoration on the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed.

Optionally, the feature extraction unit is further configured to: perform multi-layer downsampling on the image to be processed to obtain a multi-scale original feature map; and optimize the multi-scale original feature map to obtain the multi-scale feature map.

Optionally, the feature extraction unit is further configured to: perform first optimization processing on a target original feature map in the multi-scale original feature map to obtain a first optimized feature map, wherein the target original feature map is a feature map in the multi-scale original feature map other than the highest-dimension original feature map; perform second optimization processing on the highest-dimension original feature map in the multi-scale original feature map to obtain a second optimized feature map; and take the first optimized feature map and the second optimized feature map as the multi-scale feature map.

Optionally, the feature extraction unit is further configured to: optimize the target original feature map by using a first optimization module to obtain a first initially optimized feature map, wherein the first optimization module comprises a preset number of first convolution layers; and add the first initially optimized feature map and the corresponding target original feature map to obtain the first optimized feature map.

Optionally, the feature extraction unit is further configured to: optimize the highest-dimension original feature map by using a second optimization module to obtain an optimization weight, wherein the second optimization module comprises a second convolution layer, a global pooling layer and a Sigmoid function processing layer; multiply the optimization weight by the highest-dimension original feature map to obtain a second initially optimized feature map; and add the second initially optimized feature map and the highest-dimension original feature map to obtain the second optimized feature map.

Optionally, the enhancement processing unit is further configured to: obtain the initial position of the salient object according to the multi-scale feature map; according to the initial position, crop the highest-dimension feature map in the multi-scale feature map using at least two different expansion scales to obtain a plurality of cropped feature maps, wherein the plurality of cropped feature maps contain feature information of the salient object; take one or more of the multi-scale feature maps as target feature maps, take each target feature map in turn as the current target feature map, and calculate the correlation between the plurality of cropped feature maps and the current target feature map to obtain a plurality of correlation feature maps of the current target feature map, in one-to-one correspondence with the plurality of cropped feature maps; and obtain an enhanced feature map corresponding to the current target feature map according to the plurality of correlation feature maps and the current target feature map.

Optionally, the enhancement processing unit is further configured to: perform dimensionality reduction on the highest-dimension feature map in the multi-scale feature map to obtain a single-channel feature map; binarize the single-channel feature map to obtain a single-channel binarized feature map; and determine the initial position of the salient object according to the single-channel binarized feature map.

Optionally, the enhancement processing unit is further configured to: determine the pixel width and the pixel height of the salient object according to the initial position; determine an expansion pixel width and an expansion pixel height according to the expansion scale, the pixel width and the pixel height; and, in the highest-dimension feature map, crop along the position obtained by expanding the initial position by the expansion pixel width and the expansion pixel height.

Optionally, the enhancement processing unit is further configured to: scale the plurality of cropped feature maps to a preset scale to obtain a plurality of cropped feature maps of the preset scale; slide over the current target feature map with a sliding window of the preset scale; and multiply the feature map contained in the sliding window after each slide by each of the plurality of cropped feature maps of the preset scale, and obtain, from the results of the product operations, a plurality of correlation feature maps of the current target feature map in one-to-one correspondence with the plurality of cropped feature maps.

Optionally, the enhancement processing unit is further configured to: multiply each of the plurality of correlation feature maps by the current target feature map to obtain a plurality of first enhanced feature maps corresponding to the current target feature map; concatenate the plurality of first enhanced feature maps with the current target feature map to obtain a second enhanced feature map corresponding to the current target feature map; and obtain a position enhanced feature map corresponding to the current target feature map, and concatenate the second enhanced feature map with the position enhanced feature map to obtain the enhanced feature map corresponding to the current target feature map, wherein the scale of the position enhanced feature map is the same as that of the second enhanced feature map.

Optionally, the enhancement processing unit is further configured to: determine a center line of the salient object in the X direction and a center line of the salient object in the Y direction based on the initial position of the salient object; set the center line in the Y direction to a first target value, and change the value linearly to a second target value along the X direction to obtain a position enhanced feature map in the X direction; set the center line in the X direction to the first target value, and change the value linearly to the second target value along the Y direction to obtain a position enhanced feature map in the Y direction; and take the position enhanced feature map in the X direction and the position enhanced feature map in the Y direction as the position enhanced feature map.

Optionally, the image restoration unit is further configured to: upsample the multi-scale enhanced feature map to obtain the salient object mask corresponding to the image to be processed.
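As an illustration of the second optimization module configured in the feature extraction unit above (second convolution layer, global pooling layer and Sigmoid function processing layer, followed by a product and an addition), a minimal sketch is given below. It assumes that the layers are applied in that order, that the global pooling is an average over each channel, and that the convolution can be represented by a placeholder callable (identity by default). None of these details is confirmed by the text, and the names are hypothetical.

```python
import math
from typing import Callable, List

Map2D = List[List[float]]
FeatureMap = List[Map2D]  # list of channels


def second_optimization(feature: FeatureMap,
                        conv: Callable[[FeatureMap], FeatureMap] = lambda fm: fm) -> FeatureMap:
    """Per-channel weight = sigmoid(global average pool(conv(x))); output = x * weight + x."""
    transformed = conv(feature)  # placeholder for the second convolution layer
    out: FeatureMap = []
    for ch_in, ch_t in zip(feature, transformed):
        count = sum(len(row) for row in ch_t)
        pooled = sum(v for row in ch_t for v in row) / count  # global average pooling (assumed)
        weight = 1.0 / (1.0 + math.exp(-pooled))              # Sigmoid gives the optimization weight
        # Product with the weight, then addition of the original feature map.
        out.append([[v * weight + v for v in row] for row in ch_in])
    return out


highest = [[[2.0, -2.0]], [[0.0, 0.0]]]  # two channels of size 1 x 2
optimized = second_optimization(highest)
```

In the first channel the pooled value is 0, so the weight is 0.5 and each element becomes 1.5 times its original value.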

The image processing apparatus provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiment in the foregoing embodiment 2, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiment for the part where the embodiment of the apparatus is not mentioned.

In another embodiment, there is also provided a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the steps of the method set forth in embodiment 2 above.

In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be, for example, fixed, removable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific case.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings and are used only for convenience and simplicity of description. They do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An image processing method, comprising:

acquiring an image to be processed, and performing feature extraction on the image to be processed to obtain a multi-scale feature map;

performing enhancement processing on a part corresponding to a salient object in the multi-scale feature map to obtain a multi-scale enhanced feature map;

and carrying out image restoration on the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed.

2. The method of claim 1, wherein feature extracting the image to be processed comprises:

carrying out multi-layer downsampling processing on the image to be processed to obtain a multi-scale original feature map;

and optimizing the multi-scale original feature map to obtain the multi-scale feature map.

3. The method of claim 1 or 2, wherein optimizing the multi-scale raw feature map comprises:

performing first optimization processing on a target original feature map in the multi-scale original feature map to obtain a first optimized feature map, wherein the target original feature map is a feature map in the multi-scale original feature map except for a highest-dimension original feature map;

performing second optimization processing on the highest-dimensional original feature map in the multi-scale original feature map to obtain a second optimized feature map;

and taking the first optimized feature map and the second optimized feature map as the multi-scale feature map.

4. The method of claim 3, wherein performing the first optimization process on the target original feature map in the multi-scale original feature map comprises:

optimizing the target original feature map by using a first optimization module to obtain a first initially optimized feature map, wherein the first optimization module comprises a preset number of first convolution layers;

and adding the first initially optimized feature map and the corresponding target original feature map to obtain the first optimized feature map.

5. The method according to claim 3 or 4, wherein performing the second optimization process on the highest-dimensional original feature map in the multi-scale original feature maps comprises:

optimizing the highest-dimensional original feature map by using a second optimization module to obtain an optimization weight, wherein the second optimization module comprises a second convolution layer, a global pooling layer and a Sigmoid function processing layer;

performing product operation on the optimization weight and the highest-dimensional original feature map to obtain a second initially optimized feature map;

and adding the second initially optimized feature map and the highest-dimension original feature map to obtain the second optimized feature map.

6. The method according to any one of claims 1 to 5, wherein the enhancing the portion of the multi-scale feature map corresponding to the salient object comprises:

obtaining the initial position of the salient object according to the multi-scale feature map;

according to the initial position, cropping the feature map with the highest dimension in the multi-scale feature map using at least two different expansion scales to obtain a plurality of cropped feature maps, wherein the plurality of cropped feature maps comprise feature information of the salient object;

taking one or more of the multi-scale feature maps as target feature maps, taking each target feature map as a current target feature map one by one, and calculating the correlation between the plurality of cropped feature maps and the current target feature map to obtain a plurality of correlation feature maps of the current target feature map, which are in one-to-one correspondence with the plurality of cropped feature maps;

and obtaining an enhanced feature map corresponding to the current target feature map according to the plurality of correlation feature maps and the current target feature map.

7. The method of claim 6, wherein obtaining the initial position of the salient object from the multi-scale feature map comprises:

performing dimensionality reduction processing on the feature map with the highest dimension in the multi-scale feature map to obtain a single-channel feature map;

carrying out binarization processing on the single-channel feature map to obtain a single-channel binarized feature map;

and determining the initial position of the salient object according to the single-channel binarized feature map.

8. The method of claim 6 or 7, wherein cropping the highest-dimensional feature map of the multi-scale feature map using at least two different expansion scales according to the initial position comprises:

determining the pixel width and the pixel height of the salient object according to the initial position;

determining an expansion pixel width and an expansion pixel height according to the expansion scale, the pixel width and the pixel height;

and in the feature map with the highest dimension, cropping along the position obtained by expanding the initial position by the expansion pixel width and the expansion pixel height.

9. The method of any of claims 6-8, wherein calculating the correlation between the plurality of cropped feature maps and the current target feature map comprises:

scaling the plurality of cropped feature maps to a preset scale to obtain a plurality of cropped feature maps with the preset scale;

sliding on the current target feature map by taking the preset scale as a sliding window;

and respectively carrying out product operation on the feature map contained in the sliding window after each sliding and the plurality of cropped feature maps with the preset scale, and obtaining a plurality of correlation feature maps of the current target feature map, in one-to-one correspondence with the plurality of cropped feature maps, according to the result of the product operation.

10. The method according to any one of claims 6 to 9, wherein obtaining the enhanced feature map corresponding to the current target feature map according to the plurality of correlation feature maps and the current target feature map comprises:

performing product operation on each correlation feature map in the plurality of correlation feature maps and the current target feature map to obtain a plurality of first enhanced feature maps corresponding to the current target feature map;

concatenating the plurality of first enhanced feature maps with the current target feature map to obtain a second enhanced feature map corresponding to the current target feature map;

and acquiring a position enhanced feature map corresponding to the current target feature map, and concatenating the second enhanced feature map with the position enhanced feature map to obtain an enhanced feature map corresponding to the current target feature map, wherein the scale of the position enhanced feature map is the same as that of the second enhanced feature map.

11. The method of claim 10, wherein obtaining the position-enhanced feature map corresponding to the current target feature map comprises:

determining a central line of the salient object in an X direction and a central line of the salient object in a Y direction based on the initial position of the salient object;

setting the central line in the Y direction to a first target value, and varying the values linearly along the X direction from the central line in the Y direction to a second target value, to obtain a position-enhanced feature map in the X direction;

setting the central line in the X direction to the first target value, and varying the values linearly along the Y direction from the central line in the X direction to the second target value, to obtain a position-enhanced feature map in the Y direction;

and taking the position-enhanced feature map in the X direction and the position-enhanced feature map in the Y direction as the position-enhanced feature map.
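One possible reading of this construction is sketched below. The assumptions are loud ones: the first target value is taken as 1.0 and the second as 0.0, the central lines are the midpoints of the box corners `(x0, y0, x1, y1)`, and the second target value is reached at the border farthest from the central line. The claim itself does not state these details, and the function name `position_maps` is hypothetical.

```python
import numpy as np

def position_maps(height, width, box, first=1.0, second=0.0):
    """Return a (2, H, W) array: X-direction map, then Y-direction map."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0  # central lines
    xs = np.arange(width, dtype=float)
    ys = np.arange(height, dtype=float)
    # Normalised distance: 0 on the central line, 1 at the farthest border.
    dx = np.abs(xs - cx) / (max(cx, width - 1 - cx) or 1.0)
    dy = np.abs(ys - cy) / (max(cy, height - 1 - cy) or 1.0)
    row = first + (second - first) * dx  # (W,), varies along X
    col = first + (second - first) * dy  # (H,), varies along Y
    map_x = np.tile(row, (height, 1))
    map_y = np.tile(col[:, None], (1, width))
    return np.stack([map_x, map_y])
```

For a 5 x 5 map with the box corners at (0, 0) and (4, 4), the X-direction map under these assumptions has rows equal to 0, 0.5, 1, 0.5, 0, and the Y-direction map is its transpose.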

12. The method of any one of claims 1-11, wherein image restoring the multi-scale enhanced feature map comprises:

up-sampling the multi-scale enhanced feature map to obtain a salient object mask corresponding to the image to be processed.
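As a rough illustration of the up-sampling step only, the sketch below enlarges a single-channel map by an integer factor using nearest-neighbour repetition. The interpolation method, the integer factor, and the function name `upsample_nearest` are assumptions; the claim does not specify how the up-sampling is performed.

```python
import numpy as np

def upsample_nearest(score_map, factor):
    """Up-sample an (H, W) map by an integer factor along both axes."""
    rows = np.repeat(score_map, factor, axis=0)  # repeat each row
    return np.repeat(rows, factor, axis=1)       # repeat each column
```

Applied repeatedly, or with a factor equal to the total down-sampling ratio, this brings a low-resolution map back to the size of the image to be processed.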

13. An image processing apparatus, characterized by comprising:

a feature extraction unit, configured to acquire an image to be processed and perform feature extraction on the image to be processed to obtain a multi-scale feature map;

an enhancement processing unit, configured to enhance the part corresponding to the salient object in the multi-scale feature map to obtain a multi-scale enhanced feature map;

14. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any one of claims 1 to 12 are implemented when the computer program is executed by the processor.

15. A computer-readable medium having non-volatile program code executable by a processor, characterized in that the program code causes the processor to perform the steps of the method of any one of claims 1 to 12.