CN112001931A - Image segmentation method, device, equipment and storage medium - Google Patents

Image segmentation method, device, equipment and storage medium Download PDF

Info

Publication number
CN112001931A
CN112001931A (application CN202010857319.1A)
Authority
CN
China
Prior art keywords
image segmentation
feature map
pooling
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010857319.1A
Other languages
Chinese (zh)
Inventor
丁子凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202010857319.1A priority Critical patent/CN112001931A/en
Publication of CN112001931A publication Critical patent/CN112001931A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20016 - Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Abstract

The embodiment of the invention provides an image segmentation method, device, equipment and storage medium. An initial feature map is extracted from an image to be processed through a base network of an image segmentation model; the initial feature map is pooled through an average pooling sub-model of the image segmentation model to obtain a first feature map carrying short-distance dependency relationship information; the initial feature map is processed through at least one branch sub-model to obtain at least one target feature map, wherein the target feature map comprises a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information; and the first feature map and the target feature map are cascaded, and the cascaded result is convolved to obtain and output an image segmentation result. By arranging branch sub-models in parallel with the average pooling sub-model, the invention obtains feature maps carrying global dependency relationship information and/or long-distance dependency relationship information and cascades them with the feature map carrying short-distance dependency relationship information obtained by the average pooling sub-model, thereby enhancing the feature representation capability and improving the accuracy of image segmentation.

Description

Image segmentation method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to an image segmentation method, an image segmentation device, image segmentation equipment and a storage medium.
Background
With the continuous development of AI and image processing technology, applying AI image technology to improve everyday life has become a new trend. Image segmentation is an important and fundamental image analysis technology that aims to divide an image into regions with distinct characteristics and to extract the parts of interest. The result of image segmentation is the basis of higher-level image understanding tasks such as image feature extraction and recognition, so image segmentation occupies an important position in the field of computer vision while also facing new challenges.
In the prior art, neural network models such as FCN, SegNet and PSPNet are generally adopted for image segmentation. Although these methods help capture objects of different scales by fusing context information, they cannot exploit the relationships between objects from a global view, so the segmentation effect is limited. In addition, the convolution operation in a neural network has a local receptive field, so pixels with the same label may produce different features; such differences cause intra-class inconsistency and thereby reduce the accuracy of recognition.
Disclosure of Invention
The embodiment of the invention provides an image segmentation method, an image segmentation device, image segmentation equipment and a storage medium, which are used for enhancing feature representation and improving the segmentation effect of an image.
A first aspect of an embodiment of the present invention provides an image segmentation method, including:
inputting an image to be processed into an image segmentation model, and extracting an initial feature map from the image to be processed through a base network of the image segmentation model;
pooling the initial feature map through an average pooling sub-model of the image segmentation model to obtain a first feature map carrying short-distance dependency relationship information;
processing the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, wherein the at least one target feature map comprises a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information;
and cascading the first feature map and the target feature map, performing convolution processing on the cascaded result to obtain an image segmentation result, and outputting the image segmentation result.
In a possible implementation manner, the processing the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map includes:
processing the initial feature map through an attention mechanism sub-model of the image segmentation model to obtain a second feature map carrying global dependency relationship information; and/or
pooling the initial feature map through a stripe pooling sub-model of the image segmentation model to obtain a third feature map carrying long-distance dependency relationship information.
In a possible implementation manner, the processing the initial feature map by the attention mechanism sub-model of the image segmentation model to obtain a second feature map carrying global dependency relationship information includes:
performing convolution processing on the initial feature map through a convolution layer of the attention mechanism sub-model to obtain an initial tensor, and reshaping the initial tensor to convert it from third order to second order, so as to obtain an intermediate tensor;
acquiring an attention matrix according to the intermediate tensor, wherein the attention matrix is used for representing the correlation among the features at the positions of the intermediate tensor;
and multiplying the intermediate tensor by the attention matrix to obtain a fourth feature map, and adding the initial feature map and the fourth feature map to obtain the second feature map.
In one possible implementation, the obtaining an attention matrix according to the intermediate tensor, where the attention matrix is used to represent a correlation between position features of the intermediate tensor, includes:
multiplying the transpose of the intermediate tensor by the intermediate tensor to obtain a feature matrix;
and inputting the feature matrix into a Softmax layer of the attention mechanism submodel to acquire the attention matrix.
In a possible implementation manner, the pooling the initial feature map by the stripe pooling sub-model of the image segmentation model to obtain a third feature map carrying long-distance dependency relationship information includes:
performing horizontal stripe pooling and vertical stripe pooling on the initial feature map through the stripe pooling sub-model;
and performing convolution and up-sampling on the horizontal stripe pooling result and the vertical stripe pooling result respectively, and cascading the up-sampled results to obtain the third feature map.
In a possible implementation manner, the pooling the initial feature map by the average pooling sub-model of the image segmentation model to obtain the first feature map carrying short-distance dependency relationship information includes:
performing pyramid pooling on the initial feature map through the average pooling sub-model;
and performing convolution and up-sampling on each pyramid pooling result respectively, and cascading the up-sampled results to obtain the first feature map.
In one possible implementation, the method further includes:
acquiring training data, wherein the training data are training images that have been annotated with image segmentation labels;
carrying out transformation operation on the training image, wherein the training image after the transformation operation is also used as the training data, and the transformation operation comprises at least one of translation, scaling and rotation;
and training the image segmentation model according to the training data.
A second aspect of an embodiment of the present invention provides an image segmentation apparatus, including:
the input module is used for inputting the image to be processed into the image segmentation model;
the processing module is used for extracting an initial characteristic map from the image to be processed through a base network of the image segmentation model; pooling the initial characteristic graph through an average pooling sub-model of the image segmentation model to obtain a first characteristic graph carrying short-distance dependency relationship information; processing the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, wherein the at least one target feature map comprises a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information; cascading the first feature map and the target feature map, and performing convolution processing on a cascading result to obtain an image segmentation result;
and the output module is used for outputting the image segmentation result.
In a possible implementation manner, when the processing module processes the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, the processing module is configured to:
processing the initial characteristic diagram through an attention mechanism sub-model of the image segmentation model to obtain a second characteristic diagram carrying global dependency relationship information; and/or
Pooling the initial characteristic diagram through a stripe pooling submodel of the image segmentation model to obtain a third characteristic diagram carrying long-distance dependency relationship information.
In a possible implementation manner, when the processing module processes the initial feature map through an attention mechanism sub-model of the image segmentation model to obtain a second feature map carrying global dependency relationship information, the processing module is configured to:
performing convolution processing on the initial characteristic diagram through a convolution layer of the attention mechanism submodel to obtain an initial tensor, and performing reshaping processing on the initial tensor to convert the initial tensor from a third order to a second order to obtain an intermediate tensor;
acquiring an attention matrix according to the intermediate tensor, wherein the attention matrix is used for representing the correlation among the characteristics of the positions of the intermediate tensor;
and multiplying the intermediate tensor by the attention matrix to obtain a fourth feature map, and adding the initial feature map and the fourth feature map to obtain the second feature map.
In one possible implementation, the processing module, when obtaining an attention matrix from the intermediate tensor, the attention matrix being used to represent a correlation between the position features of the intermediate tensor, is configured to:
multiplying the transpose of the intermediate tensor by the intermediate tensor to obtain a feature matrix;
and inputting the feature matrix into a Softmax layer of the attention mechanism submodel to acquire the attention matrix.
In a possible implementation manner, when the initial feature map is pooled through a stripe pooling sub-model of the image segmentation model and a third feature map carrying long-distance dependency relationship information is obtained, the processing module is configured to:
performing horizontal stripe pooling and vertical stripe pooling on the initial characteristic map through the stripe pooling sub-model;
and performing convolution and up-sampling on the horizontal stripe pooling result and the vertical stripe pooling result respectively, and cascading the up-sampling results to obtain the third characteristic diagram.
In a possible implementation manner, when the initial feature map is pooled through an average pooling sub-model of the image segmentation model and a first feature map carrying short-distance dependency relationship information is obtained, the processing module is configured to:
pyramid pooling is carried out on the initial feature map through the average pooling sub-model;
and performing convolution and up-sampling on the pyramid pooling results respectively, and cascading the up-sampling results to obtain the first characteristic diagram.
In one possible implementation, the processing module is further configured to:
acquiring training data, wherein the training data are training images that have been annotated with image segmentation labels;
carrying out transformation operation on the training image, wherein the training image after the transformation operation is also used as the training data, and the transformation operation comprises at least one of translation, scaling and rotation;
and training the image segmentation model according to the training data.
A third aspect of embodiments of the present invention is to provide a computer device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of the first aspect.
A fourth aspect of embodiments of the present invention is to provide a computer-readable storage medium having stored thereon a computer program;
which when executed by a processor implements the method according to the first aspect.
According to the image segmentation method, device, equipment and storage medium provided by the embodiment of the invention, an initial feature map is extracted from the image to be processed through the base network of the image segmentation model; the initial feature map is pooled through the average pooling sub-model of the image segmentation model to obtain a first feature map carrying short-distance dependency relationship information; the initial feature map is processed through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, the at least one target feature map comprising a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information; and the first feature map and the target feature map are cascaded, convolution processing is performed on the cascaded result to obtain an image segmentation result, and the image segmentation result is output. In the embodiment of the invention, branch sub-models are arranged in parallel with the average pooling sub-model in the image segmentation model; the feature map carrying global dependency relationship information and/or long-distance dependency relationship information is obtained and cascaded with the feature map carrying short-distance dependency relationship information obtained by the average pooling sub-model, so that the feature representation capability is effectively enhanced and the accuracy of image segmentation is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a diagram illustrating an image segmentation model according to an embodiment of the present invention;
FIG. 2 is a flowchart of an image segmentation method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an image segmentation model according to another embodiment of the present invention;
FIG. 4 is a flowchart of an image segmentation method according to another embodiment of the present invention;
FIG. 5 is a flowchart of an image segmentation method according to another embodiment of the present invention;
FIG. 6 is a schematic diagram of a stripe pooling process provided in accordance with an embodiment of the present invention;
FIG. 7 is a flowchart of an image segmentation method according to another embodiment of the present invention;
FIG. 8 is a block diagram of an image segmentation apparatus according to an embodiment of the present invention;
fig. 9 is a block diagram of a computer device for performing an image segmentation method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without any creative efforts shall fall within the protection scope of the embodiments of the present invention.
In the prior art, neural network models such as FCN, SegNet and PSPNet are generally adopted for image segmentation. Although these methods help capture objects of different scales by fusing context information, they cannot exploit the relationships between objects from a global view, so the segmentation effect is limited. In addition, the convolution operation in a neural network has a local receptive field, so pixels with the same label may produce different features; such differences cause intra-class inconsistency and thereby reduce the accuracy of recognition.
In order to solve the above problems, and considering that neural network models such as PSPNet can only capture short-distance dependency relationships between different positions in an image but cannot capture global feature dependency relationships, the embodiment of the present invention introduces global feature dependency relationships and establishes rich contextual relationships on top of local features. On the basis of the PSPNet network, a branch network is added in parallel with the average pooling layer (Pyramid Pooling Module) of PSPNet to obtain a feature map carrying global dependency relationship information and/or a feature map carrying long-distance dependency relationship information, which is concat-cascaded with the feature map obtained by the pooling layer, thereby enhancing the feature representation capability and improving the accuracy of image segmentation and recognition. The attention mechanism (Attention) can capture the global feature dependency relationships in space and establish rich contextual relationships on local features, so it can serve as a branch network in the embodiment of the present invention; stripe pooling can likewise capture long-distance dependency relationships between different positions, making it possible to connect regions distributed discretely over the whole scene and to encode regions with a strip structure, so it can also serve as a branch network in the embodiment of the present invention. Therefore, in the embodiment of the present invention, the attention mechanism model and/or the stripe pooling model may be selected as branch networks parallel to the average pooling layer in the PSPNet network. In the network architecture shown in fig. 1, for example, the network branch parallel to the average pooling layer is the stripe pooling model; of course, only the attention mechanism model may be used in parallel, or only the stripe pooling model may be used in parallel.
The image segmentation process is described in detail below with reference to specific embodiments.
Fig. 2 is a flowchart of an image segmentation method according to an embodiment of the present invention. This embodiment provides an image segmentation method, which is executed by a computer device with processing capability, and which comprises the following specific steps:
S201, inputting an image to be processed into an image segmentation model, and extracting an initial feature map from the image to be processed through a base network of the image segmentation model.
In this embodiment, the base network in the image segmentation model is a neural network used to extract a feature map from the image to be processed, and may include convolutional layers, fully connected layers and the like; for example, it may be a residual network (ResNet) such as ResNet18, ResNet50 or ResNet101, which is not described in detail herein.
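For illustration, a minimal PyTorch sketch of such a base network is given below. It assumes a torchvision ResNet-50 truncated before its global pooling and fully connected layers; the exact backbone depth and cut-off point are not fixed by this embodiment, so they are assumptions made only for the example.

```python
# Hedged sketch of the base network: a ResNet backbone used to extract the
# initial feature map from the image to be processed. Truncating at the last
# residual stage (dropping avgpool/fc) is an assumption for illustration.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BaseNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet50()  # ResNet18/50/101 would all fit the description
        # Keep everything up to the last residual stage; drop avgpool and fc.
        self.features = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, x):
        return self.features(x)  # initial feature map, e.g. 2048 x H/32 x W/32 for ResNet-50

image = torch.randn(1, 3, 473, 473)          # image to be processed (size is arbitrary here)
initial_feature_map = BaseNetwork()(image)   # C x H x W initial feature map
```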
S202, pooling the initial feature map through an average pooling sub-model of the image segmentation model to obtain a first feature map carrying short-distance dependency relationship information.
In this embodiment, average pooling may be performed on the initial feature map, that is, the feature points in a neighborhood are averaged, so that the initial feature map is down-sampled and the first feature map carrying short-distance dependency relationship information is obtained. Average pooling captures the short-distance dependency relationships between different positions well, and is especially necessary when semantic regions are closely distributed.
Optionally, in this embodiment, the average pooling sub-model may adopt a pyramid pooling module, which performs pyramid pooling on the initial feature map through pooling kernels of different pyramid levels to obtain features of different levels, convolves each of them, up-samples the results to feature maps of the same size as the initial feature map, and concat-cascades these feature maps to obtain the first feature map, thereby fusing features of different pyramid scales and reducing the information loss between different regions.
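For illustration only, the following is a minimal PyTorch sketch of such a pyramid-pooling (average pooling) sub-model. The bin sizes (1, 2, 3, 6), the per-level channel reduction, and keeping the initial feature map in the cascade follow the common PSPNet configuration and are assumptions, since the embodiment does not fix them.

```python
# Hedged sketch of the average pooling sub-model (pyramid pooling): average
# pooling at several pyramid levels, 1x1 convolution, up-sampling to the input
# size, then concat cascade to form the first feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_channels, bins=(1, 2, 3, 6)):
        super().__init__()
        out_channels = in_channels // len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                              # average pooling at one pyramid level
                nn.Conv2d(in_channels, out_channels, 1, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        # Up-sample every pooled level back to the size of the initial feature map.
        levels = [F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
                  for stage in self.stages]
        return torch.cat([x] + levels, dim=1)                         # first feature map
```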
S203, processing the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, wherein the at least one target feature map comprises a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information.
In this embodiment, at least one branch sub-model is arranged in parallel with the average pooling sub-model in the image segmentation model. Each of the at least one branch sub-models yields a target feature map of one dimension, such as a feature map of the global dimension or of the long-distance dimension; that is, each target feature map carries global dependency relationship information or long-distance dependency relationship information.
Optionally, in this embodiment, the initial feature map may be processed by an attention mechanism sub-model to obtain a second feature map carrying global dependency relationship information; and/or the initial feature map may be pooled by a stripe pooling sub-model to obtain a third feature map carrying long-distance dependency relationship information.
The attention mechanism can capture global feature dependency relationships in space, establish rich contextual relationships on local features, and encode wider context information into the local features, thereby enhancing their representation capability; stripe pooling can capture long-distance dependency relationships between different positions, making it possible to connect regions distributed discretely over the whole scene and to encode regions with a strip structure.
And S204, cascading the first feature map and the target feature map, performing convolution processing on the cascaded result to obtain an image segmentation result, and outputting the image segmentation result.
In this embodiment, the first feature map and the target feature map are concat-cascaded, and convolution processing is performed on the cascaded result to obtain the image segmentation result. The concat cascade makes the feature representation more robust, which in turn leads to a significant improvement in the segmentation effect.
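A hedged sketch of this cascade-and-convolve step is shown below. The 3 × 3 fusion convolution, the intermediate channel width of 256, the number of classes and the final bilinear resize to the input size are assumptions; the embodiment only requires concatenation followed by convolution to produce the segmentation result.

```python
# Hedged sketch of step S204: concat the first feature map with the target
# feature map(s) from the branch sub-models, then convolve to obtain the
# segmentation result.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationHead(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1),
        )

    def forward(self, first_map, target_maps, out_size):
        # Concat (cascade) the first feature map with every target feature map.
        fused = torch.cat([first_map] + list(target_maps), dim=1)
        logits = self.fuse(fused)
        # Resize to the original image size so the segmentation result can be output per pixel.
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)
```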
In the image segmentation method provided by this embodiment, an initial feature map is extracted from the image to be processed through the base network of the image segmentation model; the initial feature map is pooled through the average pooling sub-model of the image segmentation model to obtain a first feature map carrying short-distance dependency relationship information; the initial feature map is processed through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, the at least one target feature map comprising a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information; and the first feature map and the target feature map are cascaded, convolution processing is performed on the cascaded result to obtain an image segmentation result, and the image segmentation result is output. In this embodiment, branch sub-models are arranged in parallel with the average pooling sub-model in the image segmentation model; the feature map carrying global dependency relationship information and/or long-distance dependency relationship information is obtained and cascaded with the feature map carrying short-distance dependency relationship information obtained by the average pooling sub-model, so that the feature representation capability is effectively enhanced and the accuracy of image segmentation is improved.
Based on the foregoing embodiment, the at least one branch sub-model in the image segmentation model of this embodiment may be an attention mechanism sub-model and/or a stripe pooling sub-model; the image segmentation model is specifically shown in fig. 3. Further, in S203, processing the initial feature map through the at least one branch sub-model of the image segmentation model to obtain at least one target feature map may include:
processing the initial feature map through an attention mechanism sub-model of the image segmentation model to obtain a second feature map carrying global dependency relationship information; and/or
pooling the initial feature map through a stripe pooling sub-model of the image segmentation model to obtain a third feature map carrying long-distance dependency relationship information.
On the basis of the foregoing embodiment, as shown in fig. 4, the processing the initial feature map through the attention mechanism sub-model of the image segmentation model to obtain a second feature map carrying global dependency information may include:
S301, performing convolution processing on the initial feature map through a convolution layer of the attention mechanism sub-model to obtain an initial tensor, and reshaping the initial tensor to convert it from third order to second order, so as to obtain an intermediate tensor;
S302, acquiring an attention matrix according to the intermediate tensor, wherein the attention matrix is used for expressing the correlation among the features at each position of the intermediate tensor;
S303, multiplying the intermediate tensor by the attention matrix to obtain a fourth feature map, and adding the initial feature map and the fourth feature map to obtain the second feature map.
In this embodiment, as shown in the attention mechanism part of fig. 3, the initial feature map X (C × H × W) is fed into convolution layers to obtain initial tensors X1, X2 and X3 of the same shape, each of size C × H × W. Each initial tensor is reshaped to obtain an intermediate tensor of size C × N, where N = H × W; that is, the original H × W spatial dimensions are flattened, reducing the tensor from three dimensions to two.
Further, the attention matrix is obtained according to the intermediate tensors and is used to express the correlation among the features at each position. Specifically, the transpose of one intermediate tensor is multiplied by the other intermediate tensor to obtain a feature matrix, and the feature matrix is input into the Softmax layer of the attention mechanism sub-model to obtain the attention matrix. That is, after the reshape, X1 and X2 become C × N matrices; the transpose of one reshaped tensor (N × C) is multiplied by the other reshaped tensor (C × N) to obtain an N × N feature matrix (adjacency matrix), denoted M, i.e. of size (H × W) × (H × W). The feature matrix M is then fed into Softmax to obtain the attention matrix P (N × N). The values in the attention matrix P reflect the correlation between the features at two positions: the more similar the features of two positions are, the greater the correlation between them.
The attention matrix obtained through Softmax can be computed by the following formula:

Sij = exp(Mij) / Σi exp(Mij), where the sum in the denominator runs over all i

Here Mij denotes the values in the feature matrix M, with i ∈ [0, N-1] and j ∈ [0, N-1], and Sij is the corresponding value in the attention matrix P, which measures the influence of the i-th position on the j-th position, i.e. the degree of association (correlation) between the i-th and j-th positions; the larger Sij is, the more strongly the i-th and j-th positions are associated.
After the attention matrix P is obtained, X3 is likewise reshaped to C × N and multiplied by the attention matrix P, and the result is reshaped back to obtain the fourth feature map T (C × H × W). Finally, the fourth feature map T and the initial feature map X are added to obtain the second feature map, which is subsequently concat-cascaded with the first feature map.
In this embodiment, the attention mechanism described above adopts a position attention module (Position Attention Module) and captures the long-range dependencies over the space (picture) based on the idea of non-local mean filtering: when computing the output at each pixel position, the correlation is calculated with respect to all positions in the image rather than only a neighborhood, and these correlations are then used as weights representing the similarity between other positions and the current position. In this way, more extensive context information is encoded into the local features, thereby enhancing their representation capability.
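The position attention computation described above (S301 to S303) can be sketched in PyTorch as follows. The three 1 × 1 convolutions producing X1, X2 and X3 and the softmax normalization axis follow the description and the formula above; keeping the full channel count C for all three tensors (rather than a DANet-style channel reduction) is an assumption based on the statement that they share the size C × H × W.

```python
# Hedged sketch of the attention mechanism sub-model (position attention).
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.conv2 = nn.Conv2d(channels, channels, 1)
        self.conv3 = nn.Conv2d(channels, channels, 1)
        self.softmax = nn.Softmax(dim=1)  # normalize over i, as in Sij = exp(Mij) / sum_i exp(Mij)

    def forward(self, x):                 # x: B x C x H x W, the initial feature map
        b, c, h, w = x.shape
        n = h * w
        x1 = self.conv1(x).reshape(b, c, n)        # intermediate tensors of size C x N
        x2 = self.conv2(x).reshape(b, c, n)
        x3 = self.conv3(x).reshape(b, c, n)
        m = torch.bmm(x1.transpose(1, 2), x2)      # feature (adjacency) matrix, N x N
        p = self.softmax(m)                        # attention matrix P
        t = torch.bmm(x3, p).reshape(b, c, h, w)   # fourth feature map T
        return x + t                               # second feature map = X + T
```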
On the basis of any of the above embodiments, as shown in fig. 5, the pooling the initial feature map by the stripe pooling sub-model of the image segmentation model to obtain a third feature map carrying long-distance dependency relationship information may include:
S401, performing horizontal stripe pooling and vertical stripe pooling on the initial feature map through the stripe pooling sub-model;
S402, performing convolution and then up-sampling on the horizontal stripe pooling result and the vertical stripe pooling result respectively, and cascading the up-sampled results to obtain the third feature map.
In this embodiment, with reference to the pyramid pooling module in PSPNet, the average pooling in PSPNet may be replaced by stripe pooling; that is, after stripe pooling, each pooled result is convolved and then up-sampled, and the up-sampled results are cascaded.
For example, after the initial feature map X (C × H × W) is input into the stripe pooling sub-model, horizontal stripe pooling and vertical stripe pooling are performed, as shown in fig. 6 (for clarity, C = 1 is taken as an example, i.e. the initial feature map X has a single channel). Pooling results of size H × 1 and 1 × W are obtained after the horizontal and vertical stripe pooling respectively; each result is then passed through a one-dimensional convolution with a kernel of 3, up-sampled to recover the size C × H × W, and finally the two results are concat-cascaded to obtain the third feature map, which is subsequently concat-cascaded with the first feature map.
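The stripe pooling just described can be sketched as follows. Implementing the one-dimensional convolution with kernel 3 as (3, 1) and (1, 3) two-dimensional convolutions, and recovering the C × H × W size with bilinear up-sampling, are assumptions consistent with, but not mandated by, the description; following the text, the two up-sampled results are concat-cascaded.

```python
# Hedged sketch of the stripe pooling sub-model (S401-S402).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripePooling(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0), bias=False)
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1), bias=False)

    def forward(self, x):                                  # x: B x C x H x W, the initial feature map
        h, w = x.shape[2:]
        pooled_h = F.adaptive_avg_pool2d(x, (h, 1))        # horizontal stripe pooling -> H x 1
        pooled_w = F.adaptive_avg_pool2d(x, (1, w))        # vertical stripe pooling   -> 1 x W
        up_h = F.interpolate(self.conv_h(pooled_h), size=(h, w), mode="bilinear", align_corners=False)
        up_w = F.interpolate(self.conv_w(pooled_w), size=(h, w), mode="bilinear", align_corners=False)
        return torch.cat([up_h, up_w], dim=1)              # third feature map (long-distance context)
```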
On the basis of any of the above embodiments, as shown in fig. 7, the method further includes a training process for the image segmentation model, which is specifically as follows:
S501, acquiring training data, wherein the training data are training images that have been annotated with image segmentation labels;
S502, performing a transformation operation on the training images, wherein the transformed training images are also used as training data, and the transformation operation comprises at least one of translation, scaling and rotation;
S503, training the image segmentation model according to the training data.
In this embodiment, a predetermined number of images may be obtained and subjected to image segmentation annotation to serve as training data. To expand the amount of training data, transformation operations, including but not limited to translation, scaling and rotation, may also be performed on the annotated training images, and the transformed training images are likewise used as training data. The constructed image segmentation model can then be trained according to the training data; the training process is not described in detail herein.
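A hedged sketch of this training flow (S501 to S503) is given below. The affine augmentation ranges, the optimizer, learning rate and loss function are illustrative assumptions only; the embodiment merely requires translation, scaling and/or rotation of the labelled training images and training on the resulting data.

```python
# Hedged sketch of training-data augmentation and model training.
import random
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

def augment(image, mask):
    # Random translation / scaling / rotation, applied identically to the image
    # and its segmentation label (mask is assumed to be a float tensor of class ids).
    angle = random.uniform(-10.0, 10.0)
    translate = [random.randint(-20, 20), random.randint(-20, 20)]
    scale = random.uniform(0.8, 1.2)
    image = TF.affine(image, angle=angle, translate=translate, scale=scale, shear=[0.0])
    mask = TF.affine(mask, angle=angle, translate=translate, scale=scale, shear=[0.0])
    return image, mask

def train(model, loader, epochs=50, lr=1e-3, device="cuda"):
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for image, mask in loader:                   # mask: B x 1 x H x W per-pixel labels
            image, mask = augment(image, mask)
            logits = model(image.to(device))         # B x num_classes x H x W
            loss = criterion(logits, mask.squeeze(1).long().to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```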
Optionally, after the training data is obtained, it may be checked for training images with labeling errors or missing labels; the checking may be performed manually or through other means.
Furthermore, the trained image segmentation model can be applied to actual scenarios, and decision logic can be configured as required to interpret the segmentation results output by the image segmentation model. For example, when the trained image segmentation model is applied in the traffic field, vehicle appearance and vehicle type can be identified and road conditions analyzed according to the image segmentation results.
Fig. 8 is a structural diagram of an image segmentation apparatus according to an embodiment of the present invention. The image segmentation apparatus provided in this embodiment can execute the processing procedure provided in the embodiment of the image segmentation method, and as shown in fig. 8, the image segmentation apparatus 80 includes an input module 81, a processing module 82, and an output module 83.
An input module 81, configured to input an image to be processed into an image segmentation model;
a processing module 82, configured to extract an initial feature map from the to-be-processed image through a base network of the image segmentation model; pooling the initial characteristic graph through an average pooling sub-model of the image segmentation model to obtain a first characteristic graph carrying short-distance dependency relationship information; processing the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, wherein the at least one target feature map comprises a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information; cascading the first feature map and the target feature map, and performing convolution processing on a cascading result to obtain an image segmentation result;
and an output module 83, configured to output the image segmentation result.
On the basis of the foregoing embodiment, when the processing module 82 processes the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, the processing module is configured to:
processing the initial characteristic diagram through an attention mechanism sub-model of the image segmentation model to obtain a second characteristic diagram carrying global dependency relationship information; and/or
Pooling the initial characteristic diagram through a stripe pooling submodel of the image segmentation model to obtain a third characteristic diagram carrying long-distance dependency relationship information.
On the basis of any of the above embodiments, when the processing module 82 processes the initial feature map through the attention mechanism sub-model of the image segmentation model to obtain the second feature map carrying the global dependency relationship information, it is configured to:
performing convolution processing on the initial characteristic diagram through a convolution layer of the attention mechanism submodel to obtain an initial tensor, and performing reshaping processing on the initial tensor to convert the initial tensor from a third order to a second order to obtain an intermediate tensor;
acquiring an attention matrix according to the intermediate tensor, wherein the attention matrix is used for representing the correlation among the characteristics of the positions of the intermediate tensor;
and multiplying the intermediate tensor by the attention matrix to obtain a fourth feature map, and adding the initial feature map and the fourth feature map to obtain the second feature map.
On the basis of any of the foregoing embodiments, the processing module 82, when obtaining an attention matrix from the intermediate tensor, where the attention matrix is used to represent a correlation between the position features of the intermediate tensor, is configured to:
multiplying the transpose of the intermediate tensor by the intermediate tensor to obtain a feature matrix;
and inputting the feature matrix into a Softmax layer of the attention mechanism submodel to acquire the attention matrix.
On the basis of any of the above embodiments, when the initial feature map is pooled by the stripe pooling sub-model of the image segmentation model and a third feature map carrying long-distance dependency relationship information is obtained, the processing module 82 is configured to:
performing horizontal stripe pooling and vertical stripe pooling on the initial characteristic map through the stripe pooling sub-model;
and performing convolution and up-sampling on the horizontal stripe pooling result and the vertical stripe pooling result respectively, and cascading the up-sampling results to obtain the third characteristic diagram.
On the basis of any of the above embodiments, when the initial feature map is pooled through the average pooling sub-model of the image segmentation model and the first feature map carrying short-distance dependency relationship information is obtained, the processing module 82 is configured to:
pyramid pooling is carried out on the initial feature map through the average pooling sub-model;
and performing convolution and up-sampling on the pyramid pooling results respectively, and cascading the up-sampling results to obtain the first characteristic diagram.
On the basis of any of the above embodiments, the processing module 82 is further configured to:
acquiring training data, wherein the training data are training images that have been annotated with image segmentation labels;
carrying out transformation operation on the training image, wherein the training image after the transformation operation is also used as the training data, and the transformation operation comprises at least one of translation, scaling and rotation;
and training the image segmentation model according to the training data.
The image segmentation apparatus provided in the embodiment of the present invention may be specifically configured to execute the method embodiments provided in fig. 2, 4-5, and 7, and specific functions are not described herein again.
The image segmentation device provided by the embodiment of the invention extracts an initial feature map from the image to be processed through the base network of the image segmentation model; pools the initial feature map through the average pooling sub-model of the image segmentation model to obtain a first feature map carrying short-distance dependency relationship information; processes the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, the at least one target feature map comprising a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information; and cascades the first feature map and the target feature map, performs convolution processing on the cascaded result to obtain an image segmentation result, and outputs the image segmentation result. In this embodiment, branch sub-models are arranged in parallel with the average pooling sub-model in the image segmentation model; the feature map carrying global dependency relationship information and/or long-distance dependency relationship information is obtained and cascaded with the feature map carrying short-distance dependency relationship information obtained by the average pooling sub-model, so that the feature representation capability is effectively enhanced and the accuracy of image segmentation is improved.
Fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present invention. The computer device provided in the embodiment of the present invention may execute the processing flow provided in the embodiment of the image segmentation method, as shown in fig. 9, the computer device 90 includes a memory 91, a processor 92, a computer program, and a communication interface 93; wherein a computer program is stored in the memory 91 and is configured to execute the image segmentation method described in the above embodiments by the processor 92.
The computer device of the embodiment shown in fig. 9 can be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
In addition, the present embodiment also provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the image segmentation method described in the above embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the embodiments of the present invention, and are not limited thereto; although embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An image segmentation method, comprising:
inputting an image to be processed into an image segmentation model, and extracting an initial characteristic map from the image to be processed through a base network of the image segmentation model;
pooling the initial characteristic graph through an average pooling sub-model of the image segmentation model to obtain a first characteristic graph carrying short-distance dependency relationship information;
processing the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, wherein the at least one target feature map comprises a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information;
and cascading the first characteristic diagram and the target characteristic diagram, performing convolution processing on a cascading result to obtain an image segmentation result, and outputting the image segmentation result.
2. The method of claim 1, wherein the processing the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map comprises:
processing the initial characteristic diagram through an attention mechanism sub-model of the image segmentation model to obtain a second characteristic diagram carrying global dependency relationship information; and/or
Pooling the initial characteristic diagram through a stripe pooling submodel of the image segmentation model to obtain a third characteristic diagram carrying long-distance dependency relationship information.
3. The method of claim 2, wherein the processing the initial feature map through an attention mechanism submodel of the image segmentation model to obtain a second feature map carrying global dependency information comprises:
performing convolution processing on the initial characteristic diagram through a convolution layer of the attention mechanism submodel to obtain an initial tensor, and performing reshaping processing on the initial tensor to convert the initial tensor from a third order to a second order to obtain an intermediate tensor;
acquiring an attention matrix according to the intermediate tensor, wherein the attention matrix is used for representing the correlation among the characteristics of the positions of the intermediate tensor;
and multiplying the intermediate tensor by the attention matrix to obtain a fourth feature map, and adding the initial feature map and the fourth feature map to obtain the second feature map.
4. The method of claim 3, wherein obtaining an attention matrix from the intermediate tensor, the attention matrix being used to represent a correlation between features at locations of the intermediate tensor comprises:
multiplying the transpose of the intermediate tensor by the intermediate tensor to obtain a feature matrix;
and inputting the feature matrix into a Softmax layer of the attention mechanism submodel to acquire the attention matrix.
5. The method according to claim 2, wherein pooling the initial feature map by a stripe pooling submodel of the image segmentation model to obtain a third feature map carrying long-distance dependency information comprises:
performing horizontal stripe pooling and vertical stripe pooling on the initial characteristic map through the stripe pooling sub-model;
and performing convolution and up-sampling on the horizontal stripe pooling result and the vertical stripe pooling result respectively, and cascading the up-sampling results to obtain the third characteristic diagram.
6. The method of claim 1, wherein pooling the initial feature map by an average pooling submodel of the image segmentation model to obtain a first feature map carrying short-distance dependency information comprises:
pyramid pooling is carried out on the initial feature map through the average pooling sub-model;
and performing convolution and up-sampling on the pyramid pooling results respectively, and cascading the up-sampling results to obtain the first characteristic diagram.
7. The method of claim 1, further comprising:
acquiring training data, wherein the training data are training images that have been annotated with image segmentation labels;
carrying out transformation operation on the training image, wherein the training image after the transformation operation is also used as the training data, and the transformation operation comprises at least one of translation, scaling and rotation;
and training the image segmentation model according to the training data.
8. An image segmentation apparatus, comprising:
the input module is used for inputting the image to be processed into the image segmentation model;
the processing module is used for extracting an initial characteristic map from the image to be processed through a base network of the image segmentation model; pooling the initial characteristic graph through an average pooling sub-model of the image segmentation model to obtain a first characteristic graph carrying short-distance dependency relationship information; processing the initial feature map through at least one branch sub-model of the image segmentation model to obtain at least one target feature map, wherein the at least one target feature map comprises a second feature map carrying global dependency relationship information and/or a third feature map carrying long-distance dependency relationship information; cascading the first feature map and the target feature map, and performing convolution processing on a cascading result to obtain an image segmentation result;
and the output module is used for outputting the image segmentation result.
9. A computer device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, having stored thereon a computer program;
the computer program, when executed by a processor, implementing the method of any one of claims 1-7.
CN202010857319.1A 2020-08-24 2020-08-24 Image segmentation method, device, equipment and storage medium Pending CN112001931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010857319.1A CN112001931A (en) 2020-08-24 2020-08-24 Image segmentation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010857319.1A CN112001931A (en) 2020-08-24 2020-08-24 Image segmentation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112001931A true CN112001931A (en) 2020-11-27

Family

ID=73470528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010857319.1A Pending CN112001931A (en) 2020-08-24 2020-08-24 Image segmentation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112001931A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598003A (en) * 2020-12-18 2021-04-02 燕山大学 Real-time semantic segmentation method based on data expansion and full-supervision preprocessing
CN113229767A (en) * 2021-04-12 2021-08-10 佛山市顺德区美的洗涤电器制造有限公司 Method for processing image, processor, control device and household appliance
CN113229767B (en) * 2021-04-12 2022-08-19 佛山市顺德区美的洗涤电器制造有限公司 Method for processing image, processor, control device and household appliance
CN113326851A (en) * 2021-05-21 2021-08-31 中国科学院深圳先进技术研究院 Image feature extraction method and device, electronic equipment and storage medium
CN113326851B (en) * 2021-05-21 2023-10-27 中国科学院深圳先进技术研究院 Image feature extraction method and device, electronic equipment and storage medium
CN113689434A (en) * 2021-07-14 2021-11-23 淮阴工学院 Image semantic segmentation method based on strip pooling
CN113689434B (en) * 2021-07-14 2022-05-27 淮阴工学院 Image semantic segmentation method based on strip pooling
CN116385814A (en) * 2023-03-07 2023-07-04 广州市妇女儿童医疗中心 Ultrasonic screening method, system, device and medium for detection target
CN116385814B (en) * 2023-03-07 2023-12-05 广州市妇女儿童医疗中心 Ultrasonic screening method, system, device and medium for detection target

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN112001931A (en) Image segmentation method, device, equipment and storage medium
CN112001914A (en) Depth image completion method and device
CN112115783A (en) Human face characteristic point detection method, device and equipment based on deep knowledge migration
CN113343982B (en) Entity relation extraction method, device and equipment for multi-modal feature fusion
CN113537254B (en) Image feature extraction method and device, electronic equipment and readable storage medium
CN113554032B (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN114429637B (en) Document classification method, device, equipment and storage medium
CN113642585B (en) Image processing method, apparatus, device, storage medium, and computer program product
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN115082675A (en) Transparent object image segmentation method and system
CN113343981A (en) Visual feature enhanced character recognition method, device and equipment
Xu et al. Missing data reconstruction in VHR images based on progressive structure prediction and texture generation
CN116189162A (en) Ship plate detection and identification method and device, electronic equipment and storage medium
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
CN115272691A (en) Training method, recognition method and equipment for steel bar binding state detection model
CN114612681A (en) GCN-based multi-label image classification method, model construction method and device
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
Lu et al. Multi-scale enhanced deep network for road detection
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN116796287A (en) Pre-training method, device, equipment and storage medium for graphic understanding model
CN115810152A (en) Remote sensing image change detection method and device based on graph convolution and computer equipment
CN114529450B (en) Face image super-resolution method based on improved depth iteration cooperative network
CN112989919B (en) Method and system for extracting target object from image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination