CN113361537A - Image semantic segmentation method and device based on channel attention

Info

Publication number
CN113361537A
Authority
CN
China
Prior art keywords
image
input
channel
semantic segmentation
channel attention
Prior art date
Legal status
Granted
Application number
CN202110837049.2A
Other languages
Chinese (zh)
Other versions
CN113361537B (en)
Inventor
郭俊波
郭筱凤
靳国庆
马凌峰
谢洪涛
张勇东
Current Assignee
Beijing Zhongke Research Institute
Konami Sports Club Co Ltd
Original Assignee
Beijing Zhongke Research Institute
People Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zhongke Research Institute, People Co Ltd
Priority to CN202110837049.2A
Publication of CN113361537A
Application granted
Publication of CN113361537B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image semantic segmentation method and device based on channel attention, wherein the method comprises the following steps: preprocessing an original image to obtain an input image to be segmented; inputting the input image into a feature extraction network of an image semantic segmentation model, and performing feature extraction on the input image by using the feature extraction network to obtain a feature map, wherein a channel attention module is inserted into each bottleneck module of each stage of the feature extraction network; the channel attention module performs horizontal pooling and vertical pooling on its input features to obtain two matrices, processes the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalizes the results, and then averages them to obtain channel weights; and inputting the feature map into a decoder of the image semantic segmentation model, and processing the feature map with the decoder to obtain a predicted segmentation map. The invention improves the performance of the encoder through a channel attention mechanism, can be conveniently ported to various segmentation networks, and significantly improves segmentation performance.

Description

Image semantic segmentation method and device based on channel attention
Technical Field
The invention relates to the technical field of computer vision, in particular to an image semantic segmentation method and device based on channel attention.
Background
Image semantic segmentation is one of the fundamental tasks of computer vision and can be applied in many fields; its goal is to assign a label to every pixel in an image. Currently, most semantic segmentation models are built on the fully convolutional network (FCN) framework. They can be divided into two parts, an encoder and a decoder: the encoder usually uses a pre-trained classification network to extract features, while decoder structures vary and are used to process those features into the final prediction map. Existing work on semantic segmentation usually proposes ever more complex decoder structures to obtain better segmentation results, while improvement of the encoder is neglected. The present invention therefore aims to effectively improve the precision of semantic segmentation from the viewpoint of improving the encoder.
Disclosure of Invention
In view of the above, the present invention provides an image semantic segmentation method and apparatus based on channel attention that overcome, or at least partially solve, the above-mentioned problems.
According to one aspect of the invention, an image semantic segmentation method based on channel attention is provided, which comprises the following steps:
preprocessing an original image to obtain an input image to be segmented;
inputting the input image into a feature extraction network of an image semantic segmentation model, and performing feature extraction on the input image by using the feature extraction network to obtain a feature map corresponding to the input image; wherein a channel attention module is inserted into each bottleneck module of each stage of the feature extraction network; for the input features fed to the channel attention module, the channel attention module performs the following processing: performing horizontal pooling and vertical pooling on the input features to obtain two matrices; and processing the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalizing the results, and then averaging them to obtain channel weights;
and inputting the feature map into a decoder of the image semantic segmentation model, and processing the feature map with the decoder to obtain a predicted segmentation map.
According to another aspect of the present invention, there is provided an image semantic segmentation apparatus based on channel attention, including:
the preprocessing module is used for preprocessing the original image to obtain an input image to be segmented;
the encoder is used for receiving the input image and performing feature extraction on it with the feature extraction network to obtain a feature map corresponding to the input image; wherein a channel attention module is inserted into each bottleneck module of each stage of the feature extraction network;
the decoder is used for receiving the feature map and processing it to obtain a predicted segmentation map;
wherein the channel attention module is to:
performing horizontal pooling and vertical pooling on the input features to obtain two matrices;
and processing the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalizing the results, and then averaging them to obtain channel weights.
According to yet another aspect of the present invention, there is provided a computing device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the image semantic segmentation method based on the channel attention.
According to still another aspect of the present invention, a computer storage medium is provided, where at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform an operation corresponding to the above-mentioned channel attention-based image semantic segmentation method.
The invention applies a channel attention mechanism to the image semantic segmentation task. Whereas most existing methods focus on designing elaborate decoder modules, the invention improves the performance of the encoder through a channel attention mechanism, can be conveniently ported to various segmentation networks, and significantly improves performance.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow diagram illustrating a method for semantic segmentation of images based on channel attention according to an embodiment of the present invention;
FIG. 2 is a network architecture diagram illustrating an end-to-end image semantic segmentation model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a bottleneck structure of ResNet and a channel attention module insertion position according to an embodiment of the present invention;
FIG. 4 shows a schematic structural diagram of a channel attention module of an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an image semantic segmentation apparatus based on channel attention according to an embodiment of the present invention;
FIG. 6 shows a schematic structural diagram of a computing device according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
FIG. 1 is a flow chart illustrating a method for semantic segmentation of an image based on channel attention according to an embodiment of the present invention. As shown in fig. 1, the method comprises the following steps:
step S101, preprocessing an original image to obtain an input image to be segmented;
step S102, inputting the input image into a feature extraction network of an image semantic segmentation model, and performing feature extraction on the input image by using the feature extraction network to obtain a feature map corresponding to the input image; wherein a channel attention module is inserted into each bottleneck module of each stage of the feature extraction network;
For the input features fed to the channel attention module, the channel attention module performs the following processing: performing horizontal pooling and vertical pooling on the input features to obtain two matrices; and processing the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalizing the results, and then averaging them to obtain channel weights.
Step S103, inputting the feature map into the decoder of the image semantic segmentation model, and processing it with the decoder to obtain a predicted segmentation map.
Fig. 2 is a schematic diagram of the network structure of the end-to-end image semantic segmentation model in the embodiment of the present invention. As shown in fig. 2, the main framework of the image semantic segmentation model is still based on an encoder-decoder structure. The encoder body is a feature extraction network composed of a convolutional neural network (CNN). Optionally, the feature extraction network employs ResNet101 pre-trained on ImageNet. In one specific example, the feature extraction network is divided into 5 stages, where stage 0 consists of three 3 × 3 convolution layers and the other four stages consist of different numbers of bottleneck structures (bottlenecks). The decoder structure is derived from DeepLabv3+ and consists of an atrous (dilated) convolution spatial pyramid pooling module, an upsampling module and a series of convolutions; it is a general-purpose structure and is not described further here (an off-the-shelf skeleton is sketched below).
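For orientation only, the closest off-the-shelf skeleton is torchvision's DeepLabv3 with a ResNet-101 backbone. This is a sketch of the surrounding encoder-decoder framework, not the disclosed model: the patent's decoder follows DeepLabv3+ (with low-level feature fusion), and its channel attention module, described next, must additionally be inserted into every bottleneck.

```python
# Sketch only (torchvision >= 0.13): a ResNet-101 encoder with an ASPP decoder.
# The disclosed model differs as noted above; this merely shows the skeleton.
import torch
from torchvision.models.segmentation import deeplabv3_resnet101

model = deeplabv3_resnet101(weights=None, num_classes=19)  # 19 Cityscapes classes
x = torch.randn(1, 3, 769, 769)   # one preprocessed input image
out = model(x)["out"]             # (1, 19, 769, 769) per-pixel class logits
```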
Unlike the prior art, the embodiment of the present invention improves the encoder by proposing a novel channel attention module that is applied in each bottleneck module. Fig. 3 shows a schematic diagram of the ResNet bottleneck structure and the insertion position of the channel attention module according to an embodiment of the present invention.
For the input features fed to the channel attention module, the channel attention module performs the following processing: performing horizontal pooling and vertical pooling on the input features to obtain two matrices; and processing the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalizing the results, and then averaging them to obtain channel weights.
Fig. 4 shows a schematic structural diagram of a channel attention module of an embodiment of the present invention. As shown in FIG. 4, assume that the input to the channel attention module is the feature tensor $x \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels, W the width and H the height. Horizontal pooling and vertical pooling of the input features yield two matrices $z^h \in \mathbb{R}^{C \times W}$ and $z^v \in \mathbb{R}^{C \times H}$, where the k-th row of $z^h$ is obtained by averaging each column of the $H \times W$ matrix of the k-th channel, the k-th row of $z^v$ by averaging each row, and k = 1, 2, …, C.

Specifically, the k-th rows of the two matrices are computed as:

$$z^h_k(j) = \frac{1}{H}\sum_{i=1}^{H} x_k(i, j), \qquad j = 1, \dots, W$$

$$z^v_k(i) = \frac{1}{W}\sum_{j=1}^{W} x_k(i, j), \qquad i = 1, \dots, H$$

Then, along the channel dimension, the two branches are processed in parallel with 1-dimensional convolutions, normalized with an activation function (e.g., Sigmoid), and averaged to obtain the channel weight vector ω, each element of which lies in (0, 1). Its k-th element, k = 1, 2, …, C, is computed as

$$\omega_k = \frac{1}{2}\left[\sigma\Big(\sum_{i \in \Omega_k} w^h_i \cdot z^h_i\Big) + \sigma\Big(\sum_{i \in \Omega_k} w^v_i \cdot z^v_i\Big)\right]$$

where σ is the Sigmoid function, $\Omega_k$ is the convolution window region centered on channel k, and $w^h$ and $w^v$ are the convolution kernel parameters of the two branches. The channel weight obtained in this way is a C-dimensional vector, and each element of the vector multiplies the corresponding channel of the original feature.
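For concreteness, a minimal PyTorch sketch of such a module follows. The class name and kernel size are illustrative, and the reduction of the per-position 1-D convolution outputs to a single weight per channel (averaging the Sigmoid outputs over the strip positions) is an assumption; the patent fixes the module only up to the formula for ω_k above.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention from horizontal/vertical pooling plus 1-D convolution
    over the channel axis (a sketch; see the assumptions stated above)."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # One 1-D convolution per branch; the kernel slides over the channel
        # axis, so its receptive field is the window Omega_k of the formula.
        self.conv_h = nn.Conv1d(1, 1, kernel_size, padding=pad, bias=False)
        self.conv_v = nn.Conv1d(1, 1, kernel_size, padding=pad, bias=False)

    @staticmethod
    def _branch(z: torch.Tensor, conv: nn.Conv1d) -> torch.Tensor:
        # z: (N, C, L) pooled strips. Convolve along the channel axis at each
        # of the L strip positions, Sigmoid-normalize, then average over the
        # positions to get one weight per channel (an assumed reduction).
        n, c, l = z.shape
        z = z.permute(0, 2, 1).reshape(n * l, 1, c)   # (N*L, 1, C)
        s = torch.sigmoid(conv(z)).reshape(n, l, c)   # (N, L, C)
        return s.mean(dim=1)                          # (N, C)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        z_h = x.mean(dim=2)   # horizontal pooling: column means, (N, C, W)
        z_v = x.mean(dim=3)   # vertical pooling: row means, (N, C, H)
        # Average the two Sigmoid-normalized branches: omega in (0, 1)^C.
        omega = 0.5 * (self._branch(z_h, self.conv_h)
                       + self._branch(z_v, self.conv_v))
        # Multiply each channel of the original feature by its weight.
        return x * omega.view(n, c, 1, 1)
```

Following FIG. 3, the module would be applied to the output of the last convolution of a bottleneck, before the residual addition, e.g. out = attention(conv3(out)) + identity.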
The training process of the image semantic segmentation model is described below.
First, data is prepared. An image semantic segmentation dataset comprises two parts: the original images and pixel-level fine segmentation labels. Taking the public dataset Cityscapes as an example, the dataset is labeled with 19 classes for training and evaluation. Each image in the dataset has a resolution of 2048 × 1024, and the corresponding label is an image of the same size in which the value of each pixel is the class of the corresponding pixel of the original image. The dataset contains 5000 finely labeled images, split into training, validation and test sets of 2975, 500 and 1525 images respectively.
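For reference, the fine-annotation split can be loaded with torchvision's built-in dataset class; the root path below is a placeholder, and the raw label IDs must still be mapped to the 19 training classes, which the class does not do by itself:

```python
from torchvision.datasets import Cityscapes

# Loads (image, semantic label) pairs from the fine-annotation split;
# the root directory must contain the leftImg8bit/ and gtFine/ folders.
train_set = Cityscapes("/data/cityscapes", split="train",
                       mode="fine", target_type="semantic")
img, label = train_set[0]   # two 2048 x 1024 PIL images
print(len(train_set))       # 2975 training images
```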
Training is performed in a distributed fashion on 4 GPU cards with 8 images per batch. The original image needs to be preprocessed before being fed to the network. Specifically, the original image is randomly scaled; the randomly scaled image is transformed with a data augmentation strategy of random rotation, Gaussian blur and random horizontal flipping; and a region of preset size is randomly cropped from the transformed image to serve as the input image to be segmented.
In one example, the original image is first randomly scaled to between 0.5 and 2 times its original size, the data augmentation strategy of random rotation, Gaussian blur and random horizontal flipping is then applied, and a 769 × 769 region is randomly cropped from the transformed image and fed into the model as the input image.
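A sketch of this training-time preprocessing, assuming torchvision; the rotation range, blur kernel, application probabilities and the ignore value 255 are not specified in the patent and are illustrative only:

```python
import random
from PIL import Image
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def preprocess_train(img: Image.Image, label: Image.Image, crop: int = 769):
    # 1) Random scaling to 0.5x - 2x of the original size.
    s = random.uniform(0.5, 2.0)
    w, h = img.size
    img = TF.resize(img, [int(h * s), int(w * s)])
    label = TF.resize(label, [int(h * s), int(w * s)],
                      interpolation=InterpolationMode.NEAREST)
    # 2) Random rotation, Gaussian blur, random horizontal flip
    #    (angle range, blur kernel and probabilities are assumptions).
    angle = random.uniform(-10.0, 10.0)
    img = TF.rotate(img, angle)
    label = TF.rotate(label, angle, fill=255)      # 255 = assumed ignore label
    if random.random() < 0.5:
        img = TF.gaussian_blur(img, kernel_size=5)
    if random.random() < 0.5:
        img, label = TF.hflip(img), TF.hflip(label)
    # 3) Random 769 x 769 crop (pad first if the scaled image is smaller).
    w, h = img.size
    pw, ph = max(crop - w, 0), max(crop - h, 0)
    if pw or ph:
        img = TF.pad(img, [0, 0, pw, ph])
        label = TF.pad(label, [0, 0, pw, ph], fill=255)
        w, h = img.size
    j, i = random.randint(0, w - crop), random.randint(0, h - crop)
    return TF.crop(img, i, j, crop, crop), TF.crop(label, i, j, crop, crop)
```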
Assuming that each input image is a tensor of size 3 × H × W, the network of fig. 2 downsamples 3 times in the encoder stage, so the smallest feature map has length and width one eighth of the original image. In the decoder part, the predicted segmentation map Y of size C × H × W is obtained through the atrous spatial pyramid pooling module, concatenation with the low-level features of stage 1, convolution and upsampling, where C is the number of classes, 19. The loss function is the standard cross-entropy loss; the label L has size H × W:

$$\mathcal{L} = -\frac{1}{HW}\sum_{i=1}^{HW} \log \frac{\exp\left(Y_{i, L_i}\right)}{\sum_{c=1}^{C} \exp\left(Y_{i, c}\right)}$$

where the output Y is reshaped into an HW × C matrix and the label L into an HW-dimensional vector.
After the model is trained, in the testing stage image blocks of preset size are cropped from the original image with a sliding window, fed into the model as input images to be segmented, and the predictions are placed back at the same positions. For example, 769 × 769 image blocks are taken from the test image with a sliding window and fed into the model to obtain the prediction output at the same position.
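A sliding-window inference sketch; the stride and the averaging of overlapping logits are assumptions, since the patent only specifies the 769 × 769 window placed back at the same positions:

```python
import torch

@torch.no_grad()
def slide_inference(model, image: torch.Tensor, num_classes: int = 19,
                    win: int = 769, stride: int = 513) -> torch.Tensor:
    # image: (1, 3, H, W) with H, W >= win; model is assumed to map a
    # (1, 3, win, win) patch to (1, num_classes, win, win) logits.
    _, _, h, w = image.shape
    logits = image.new_zeros(1, num_classes, h, w)
    count = image.new_zeros(1, 1, h, w)
    tops = list(range(0, h - win, stride)) + [h - win]
    lefts = list(range(0, w - win, stride)) + [w - win]
    for t in tops:
        for l in lefts:
            patch = image[:, :, t:t + win, l:l + win]
            logits[:, :, t:t + win, l:l + win] += model(patch)
            count[:, :, t:t + win, l:l + win] += 1
    return (logits / count).argmax(dim=1)   # (1, H, W) predicted class map
```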
The method successfully applies the channel attention mechanism to the image semantic segmentation task. Whereas most existing methods focus on designing elaborate decoder modules, the method improves the performance of the encoder through a channel attention mechanism, can be conveniently ported to various segmentation networks, and significantly improves performance. Moreover, as is clear from its structure, the proposed channel attention module adds very few parameters: by measurement, only 0.02% more. The proposed module is a plug-and-play, lightweight and efficient structure. Its accuracy reaches 81.4% on the Cityscapes dataset.
The method provided by the embodiment of the invention can be deployed on a computer or server to automatically perform pixel-level segmentation of images, and can be applied in many fields based on semantic segmentation, such as autonomous driving, automatic diagnosis of medical images and robot perception.
Fig. 5 is a schematic structural diagram of an image semantic segmentation apparatus based on channel attention according to an embodiment of the present invention. As shown in fig. 5, the apparatus includes: a pre-processing module 501, an encoder 502 and a decoder 503.
The preprocessing module 501 is configured to preprocess an original image to obtain an input image to be segmented;
the encoder 502 is configured to receive the input graph, perform feature extraction on the input graph by using the feature extraction network, and obtain a feature graph corresponding to the input graph; wherein, a channel attention module is inserted into each bottleneck module of each stage of the feature extraction network;
the decoder 503 is configured to receive the feature map, and process the feature map to obtain a prediction segmentation map.
The channel attention module is configured to: perform horizontal pooling and vertical pooling on the input features to obtain two matrices; and process the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalize the results, and then average them to obtain channel weights.
In an alternative embodiment, the channel attention module is specifically configured to process the input features using the following formulas:

$$z^h_k(j) = \frac{1}{H}\sum_{i=1}^{H} x_k(i, j), \qquad z^v_k(i) = \frac{1}{W}\sum_{j=1}^{W} x_k(i, j)$$

where the input feature is $x \in \mathbb{R}^{C \times H \times W}$, C is the number of channels, W the width and H the height; k = 1, 2, …, C; thereby obtaining the two matrices $z^h \in \mathbb{R}^{C \times W}$ and $z^v \in \mathbb{R}^{C \times H}$.
In an alternative embodiment, the channel attention module is specifically configured to derive the channel weights using the following formula:

$$\omega_k = \frac{1}{2}\left[\sigma\Big(\sum_{i \in \Omega_k} w^h_i \cdot z^h_i\Big) + \sigma\Big(\sum_{i \in \Omega_k} w^v_i \cdot z^v_i\Big)\right]$$

where σ is the activation function, $\Omega_k$ is the convolution window region, and $w^h$ and $w^v$ are convolution kernel parameters.
In the training phase, the preprocessing module 501 is specifically configured to: randomly scale the original image; transform the randomly scaled image with a data augmentation strategy of random rotation, Gaussian blur and random horizontal flipping; and randomly crop a region of preset size from the transformed image as the input image to be segmented.
In the testing phase, the preprocessing module 501 is specifically configured to crop image blocks of preset size from the original image with a sliding window as input images to be segmented.
The embodiment of the application also provides a nonvolatile computer storage medium, wherein the computer storage medium stores at least one executable instruction, and the computer executable instruction can execute the image semantic segmentation method based on the channel attention in any method embodiment.
Fig. 6 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and a specific embodiment of the present invention does not limit a specific implementation of the computing device.
As shown in fig. 6, the computing device may include: a processor (processor), a Communications Interface (Communications Interface), a memory (memory), and a Communications bus.
Wherein: the processor, the communication interface, and the memory communicate with each other via a communication bus. A communication interface for communicating with network elements of other devices, such as clients or other servers. And the processor is used for executing a program, and particularly can execute related steps in the embodiment of the image semantic segmentation method based on the channel attention for the computing device.
In particular, the program may include program code comprising computer operating instructions.
The processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory is used for storing programs. The memory may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program may in particular be adapted to cause a processor to perform the method for semantic segmentation of images based on channel attention in any of the method embodiments described above. For specific implementation of each step in the program, reference may be made to corresponding steps and corresponding descriptions in units in the above image semantic segmentation embodiment based on channel attention, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.

Claims (10)

1. A method for semantic segmentation of an image based on channel attention, the method comprising:
preprocessing an original image to obtain an input image to be segmented;
inputting the input image into a feature extraction network of an image semantic segmentation model, and performing feature extraction on the input image by using the feature extraction network to obtain a feature map corresponding to the input image; wherein a channel attention module is inserted into each bottleneck module of each stage of the feature extraction network; for the input features fed to the channel attention module, the channel attention module performs the following processing: performing horizontal pooling and vertical pooling on the input features to obtain two matrices; and processing the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalizing the results, and then averaging them to obtain channel weights;
and inputting the feature map into a decoder of the image semantic segmentation model, and processing the feature map with the decoder to obtain a predicted segmentation map.
2. The image semantic segmentation method according to claim 1, wherein performing horizontal pooling and vertical pooling on the input features to obtain two matrices specifically comprises processing the input features with the following formulas:

$$z^h_k(j) = \frac{1}{H}\sum_{i=1}^{H} x_k(i, j), \qquad z^v_k(i) = \frac{1}{W}\sum_{j=1}^{W} x_k(i, j)$$

wherein the input feature is $x \in \mathbb{R}^{C \times H \times W}$, C is the number of channels, W is the width and H is the height; k = 1, 2, …, C; thereby obtaining the two matrices $z^h \in \mathbb{R}^{C \times W}$ and $z^v \in \mathbb{R}^{C \times H}$.
3. The image semantic segmentation method according to claim 2, wherein processing the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalizing and then averaging to obtain the channel weight specifically comprises obtaining the channel weight with the following formula:

$$\omega_k = \frac{1}{2}\left[\sigma\Big(\sum_{i \in \Omega_k} w^h_i \cdot z^h_i\Big) + \sigma\Big(\sum_{i \in \Omega_k} w^v_i \cdot z^v_i\Big)\right]$$

wherein σ is the activation function, $\Omega_k$ is the convolution window region, and $w^h$ and $w^v$ are convolution kernel parameters.
4. The image semantic segmentation method according to claim 1, wherein the preprocessing the original image to obtain the input image to be segmented further comprises:
in the training stage, the original image is subjected to random scaling treatment;
transforming the randomly scaled original image with a data augmentation strategy of random rotation, Gaussian blur and random horizontal flipping;
and randomly cutting out a region with a preset size in the transformed image to serve as an input image to be segmented.
5. The image semantic segmentation method according to claim 1, wherein the preprocessing the original image to obtain the input image to be segmented further comprises:
in the testing stage, cropping image blocks of preset size from the original image with a sliding window to serve as input images to be segmented.
6. An apparatus for semantic segmentation of images based on channel attention, the apparatus comprising:
the preprocessing module is used for preprocessing the original image to obtain an input image to be segmented;
the encoder is used for receiving the input image and performing feature extraction on it with the feature extraction network to obtain a feature map corresponding to the input image; wherein a channel attention module is inserted into each bottleneck module of each stage of the feature extraction network;
the decoder is used for receiving the feature map and processing it to obtain a predicted segmentation map;
wherein the channel attention module is to:
performing horizontal pooling and vertical pooling on the input features to obtain two matrices;
and processing the two matrices in parallel with 1-dimensional convolutions along the channel dimension, normalizing the results, and then averaging them to obtain channel weights.
7. The image semantic segmentation device according to claim 6, wherein the channel attention module is specifically configured to process the input features with the following formulas:

$$z^h_k(j) = \frac{1}{H}\sum_{i=1}^{H} x_k(i, j), \qquad z^v_k(i) = \frac{1}{W}\sum_{j=1}^{W} x_k(i, j)$$

wherein the input feature is $x \in \mathbb{R}^{C \times H \times W}$, C is the number of channels, W is the width and H is the height; k = 1, 2, …, C; thereby obtaining the two matrices $z^h \in \mathbb{R}^{C \times W}$ and $z^v \in \mathbb{R}^{C \times H}$.
8. The image semantic segmentation apparatus according to claim 7, wherein the channel attention module is specifically configured to obtain the channel weight with the following formula:

$$\omega_k = \frac{1}{2}\left[\sigma\Big(\sum_{i \in \Omega_k} w^h_i \cdot z^h_i\Big) + \sigma\Big(\sum_{i \in \Omega_k} w^v_i \cdot z^v_i\Big)\right]$$

wherein σ is the activation function, $\Omega_k$ is the convolution window region, and $w^h$ and $w^v$ are convolution kernel parameters.
9. A computing device comprising a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the image semantic segmentation method based on the channel attention according to any one of claims 1-5.
10. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform operations corresponding to the channel attention based image semantic segmentation method according to any one of claims 1-5.
CN202110837049.2A 2021-07-23 2021-07-23 Image semantic segmentation method and device based on channel attention Active CN113361537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837049.2A CN113361537B (en) 2021-07-23 2021-07-23 Image semantic segmentation method and device based on channel attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110837049.2A CN113361537B (en) 2021-07-23 2021-07-23 Image semantic segmentation method and device based on channel attention

Publications (2)

Publication Number Publication Date
CN113361537A true CN113361537A (en) 2021-09-07
CN113361537B CN113361537B (en) 2022-05-10

Family

ID=77540219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837049.2A Active CN113361537B (en) 2021-07-23 2021-07-23 Image semantic segmentation method and device based on channel attention

Country Status (1)

Country Link
CN (1) CN113361537B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887489A (en) * 2021-10-21 2022-01-04 西南交通大学 Carriage crowd counting method based on position enhancement and multi-scale fusion network
CN113963009A (en) * 2021-12-22 2022-01-21 中科视语(北京)科技有限公司 Local self-attention image processing method and model based on deformable blocks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325751A (en) * 2020-03-18 2020-06-23 重庆理工大学 CT image segmentation system based on attention convolution neural network
US20200302214A1 (en) * 2019-03-20 2020-09-24 NavInfo Europe B.V. Real-Time Scene Understanding System
CN112508960A (en) * 2020-12-21 2021-03-16 华南理工大学 Low-precision image semantic segmentation method based on improved attention mechanism
US20210089807A1 (en) * 2019-09-25 2021-03-25 Samsung Electronics Co., Ltd. System and method for boundary aware semantic segmentation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200302214A1 (en) * 2019-03-20 2020-09-24 NavInfo Europe B.V. Real-Time Scene Understanding System
US20210089807A1 (en) * 2019-09-25 2021-03-25 Samsung Electronics Co., Ltd. System and method for boundary aware semantic segmentation
CN111325751A (en) * 2020-03-18 2020-06-23 重庆理工大学 CT image segmentation system based on attention convolution neural network
CN112508960A (en) * 2020-12-21 2021-03-16 华南理工大学 Low-precision image semantic segmentation method based on improved attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIBIN HOU et al.: "Strip Pooling: Rethinking Spatial Pooling for Scene Parsing", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
XI Yifan et al.: "Semantic segmentation based on improved DeepLab V3+ network", Computer Systems & Applications *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887489A (en) * 2021-10-21 2022-01-04 西南交通大学 Carriage crowd counting method based on position enhancement and multi-scale fusion network
CN113963009A (en) * 2021-12-22 2022-01-21 中科视语(北京)科技有限公司 Local self-attention image processing method and model based on deformable blocks
CN113963009B (en) * 2021-12-22 2022-03-18 中科视语(北京)科技有限公司 Local self-attention image processing method and system based on deformable block

Also Published As

Publication number Publication date
CN113361537B (en) 2022-05-10

Similar Documents

Publication Publication Date Title
US20210350168A1 (en) Image segmentation method and image processing apparatus
Bulat et al. To learn image super-resolution, use a gan to learn how to do image degradation first
Shen et al. Deep semantic face deblurring
EP3923233A1 (en) Image denoising method and apparatus
CN111080660B (en) Image segmentation method, device, terminal equipment and storage medium
CN112446383B (en) License plate recognition method and device, storage medium and terminal
CN113361537B (en) Image semantic segmentation method and device based on channel attention
US20210352212A1 (en) Video image processing method and apparatus
CN109784372B (en) Target classification method based on convolutional neural network
DE112020004625T5 (en) TRANSPOSED CONVOLUTION WITH SYSTOLIC ARRAY
CN107730514B (en) Scene segmentation network training method and device, computing equipment and storage medium
Shen et al. Exploiting semantics for face image deblurring
CN110781923A (en) Feature extraction method and device
CN110245621B (en) Face recognition device, image processing method, feature extraction model, and storage medium
CN111160114B (en) Gesture recognition method, gesture recognition device, gesture recognition equipment and computer-readable storage medium
US20210034900A1 (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN111967478B (en) Feature map reconstruction method, system, storage medium and terminal based on weight overturn
CN117252890A (en) Carotid plaque segmentation method, device, equipment and medium
CN111027670B (en) Feature map processing method and device, electronic equipment and storage medium
DE112020006070T5 (en) HARDWARE ACCELERATOR WITH RECONFIGURABLE INSTRUCTION SET
CN113688783B (en) Face feature extraction method, low-resolution face recognition method and equipment
CN113496228B (en) Human body semantic segmentation method based on Res2Net, transUNet and cooperative attention
CN113408528B (en) Quality recognition method and device for commodity image, computing equipment and storage medium
CN112950638B (en) Image segmentation method, device, electronic equipment and computer readable storage medium
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant