CN116030260A - Surgical whole-scene semantic segmentation method based on long-strip convolution attention - Google Patents
- Publication number: CN116030260A (application CN202310304276.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a surgical whole-scene semantic segmentation method based on long-strip convolution attention, which comprises the following steps: acquiring image data of an endoscopic surgery video and a truth value label corresponding to the image data; constructing a surgical whole-scene semantic segmentation model; encoding the image data; passing the encoding result through a long-strip convolution attention module to output feature maps; performing up-sampling and splicing operations on the feature maps corresponding to the encoding results of each stage to obtain a segmentation result; convolving the feature map with the largest size to obtain a boundary map; setting a boundary-guided segmentation head, and passing the truth value label through the boundary-guided segmentation head to obtain a target boundary map; calculating a boundary loss according to the boundary map and the target boundary map; calculating a segmentation loss according to the segmentation result and the truth value label; combining the boundary loss and the segmentation loss to construct a mixed loss function; and optimizing the surgical whole-scene semantic segmentation model by the mixed loss function. The method meets the precision requirement of surgical scene segmentation at region boundaries.
Description
Technical Field
The invention relates to the technical field of surgical scene segmentation, in particular to a surgical whole-scene semantic segmentation method based on long-strip convolution attention.
Background
The intelligent endoscopic surgical robot is a typical application of robot-assisted minimally invasive surgery, and can effectively improve the success rate of surgery, shorten the recovery period of surgery and improve the safety of patients. Automated surgical scene segmentation is a key technology for computer-assisted surgery and intelligent surgical robots. Its task is to segment the anatomical region and the medical device objects in the surgical scene and assign a class label to each pixel. The segmentation results may be used for a number of clinical tasks such as lesion tissue localization, surgical decision making, surgical navigation, and surgical skill assessment, among others.
Accurately interpreting the entire scene in an endoscopic surgery video is a very challenging task. Compared with conventional natural scenes, the local features of segmentation targets in a surgical scene have lower contrast, and local regions of different biological tissues or instruments have higher feature similarity. Most existing work adopts an attention mechanism to combine the local semantic features of a target with its global features, capturing long-range dependencies to address these problems. DANet adaptively combines local features with their global dependencies by applying position attention and channel attention in parallel, but position attention is computationally expensive, and its feature modeling relies only on the information of the current feature map, which remains limited for segmenting surgical scenes. Transformer-based models associate richer cues with each pixel through self-attention; however, associating more views through self-attention incurs computational complexity that grows quadratically with the number of pixel embeddings, and the self-attention mechanism considers only adaptability in the spatial dimension while ignoring adaptability in the channel dimension, which is also important for visual tasks.
Another non-negligible challenge is the precision requirement of surgical scene segmentation at region edges. The clinician must pay constant attention to cutting boundaries and control errors while performing the procedure, which requires the network model to resolve tissue boundaries accurately. GSCNN proposes a two-stream CNN architecture for semantic segmentation that routes shape information through a separate processing branch, producing sharper predictions around object boundaries. The model improves on the baseline, but the multi-path branching increases the computational complexity of the network.
Disclosure of Invention
Based on the above, it is necessary to provide a surgical whole-scene semantic segmentation method based on long-strip convolution attention that addresses the existing problems.
The invention provides a surgical whole-scene semantic segmentation method based on long-strip convolution attention, which comprises the following steps:
s1: acquiring image data of an endoscopic surgery video and a truth value label corresponding to the image data; constructing a surgical whole-scene semantic segmentation model; the surgical whole-scene semantic segmentation model comprises an encoder, a strip convolution attention module and a segmentation module;
s2: the encoder encodes the image data and outputs an encoding result; the coding result comprises coding results of different stages;
s3: the coding result passes through the strip convolution attention module to output a feature map; the feature map comprises feature maps corresponding to the coding results of all stages;
s4: the segmentation module performs up-sampling operation and splicing operation on the feature graphs corresponding to the coding results of each stage to obtain segmentation results;
s5: convolving the feature map with the largest size to obtain a boundary map; setting a boundary-guided segmentation head, and passing the truth value label through the boundary-guided segmentation head to obtain a target boundary map;
s6: calculating a boundary loss according to the boundary map and the target boundary map; calculating a segmentation loss according to the segmentation result and the truth value label; combining the boundary loss and the segmentation loss to construct a mixed loss function; optimizing the surgical whole-scene semantic segmentation model by the mixed loss function;
s7: inputting the image data to be segmented into the optimized surgical whole-scene semantic segmentation model, and outputting a final segmentation result.
Preferably, in S2, HRNetV2 is used as the encoder; the sizes of the encoding results of the stages outputted are different.
Preferably, in S3, the long-strip convolution attention module includes a region feature extraction block and an instrument feature extraction block; the region feature extraction block and the instrument feature extraction block extract region features and instrument features in parallel from the encoding result of each stage; the encoding result of each stage is added to its corresponding region features and instrument features to obtain the feature map corresponding to the encoding result of that stage.
Preferably, the region feature extraction block includes a depth convolution, a first multi-branch depth strip convolution and a 1×1 convolution; the region features are extracted as follows:

The depth convolution aggregates local information of the encoding result of each stage; the local information is recorded as x_loc,i^an and calculated as:

x_loc,i^an = BN(DW_{5×5}(x_i))

The first multi-branch depth strip convolution and the 1×1 convolution obtain a first attention map from the local information; the first attention map is recorded as A_i^an and calculated as:

A_i^an = W_{1×1}(Σ_j DWS_{k_j}(x_loc,i^an))

The first attention map is multiplied with the local information to obtain the region features:

f_i^an = A_i^an ⊗ x_loc,i^an

wherein DW_{5×5} represents a depth convolution with a convolution kernel size of 5×5; BN represents a batch normalization operation; x_i represents the encoding result of the i-th stage; W_{1×1} represents a 1×1 convolution; DWS_{k_j} represents the j-th depth strip-convolution branch; f_i^an represents the region features of the i-th stage; k_j represents the convolution kernel size; ⊗ represents element-by-element multiplication.
Preferably, the instrument feature extraction block includes a depth convolution, a second multi-branch depth strip convolution and a 1×1 convolution; the instrument features are extracted as follows:

The depth convolution aggregates local information of the encoding result of each stage; the local information is recorded as x_loc,i^ins and calculated as:

x_loc,i^ins = BN(DW_{5×5}(x_i))

The second multi-branch depth strip convolution and the 1×1 convolution obtain a second attention map from the local information; the second attention map is recorded as A_i^ins and calculated as:

A_i^ins = W_{1×1}(Σ_j DWS_{k_j}(x_loc,i^ins))

The second attention map is multiplied with the local information to obtain the instrument features:

f_i^ins = A_i^ins ⊗ x_loc,i^ins

wherein DW_{5×5} represents a depth convolution with a convolution kernel size of 5×5; BN represents a batch normalization operation; x_i represents the encoding result of the i-th stage; W_{1×1} represents a 1×1 convolution; DWS_{k_j} represents the j-th depth strip-convolution branch; f_i^ins represents the instrument features of the i-th stage; k_j represents the convolution kernel size; ⊗ represents element-by-element multiplication.
Preferably, in S4, the segmentation module selects the feature map with the largest size, and upsamples the remaining feature maps to the same size as the largest; the upsampled feature maps are spliced, and the splicing result is upsampled by a set multiple to obtain the segmentation result.
Preferably, in S5, the calculation formula of the boundary map is:

bm_x = UP_{×4}(W_{1×1}(W_{bg}(x_max)))

wherein bm_x represents the boundary map; UP_{×4} represents a 4-fold upsampling operation; W_{1×1} represents a 1×1 convolution; W_{bg} represents a convolution operation combined with a ReLU activation function and a batch normalization; x_max represents the feature map with the largest size;
The boundary-guided segmentation head comprises three branches; each branch extracts boundary features of the truth value label by applying a convolution with a Laplacian kernel, and the branches use different strides; the boundary features extracted by the branches are spliced to obtain boundary information, and a fixed contour threshold is set to refine the boundary information into the target boundary map, recorded as bm_gt.
Preferably, in S6, the calculation formula of the boundary loss is:

L_bg = α·L_dice(bm_x, bm_gt) + β·L_ce(bm_x, bm_gt)

The calculation formula of the segmentation result is:

x_seg = UP_{×4}(x_maff)

The calculation formula of the segmentation loss is:

L_seg = L_ce(x_seg, gt)

wherein L_bg represents the boundary loss; L_dice(·) represents the dice loss function; bm_x represents the boundary map; bm_gt represents the target boundary map; L_ce(·) represents the cross-entropy loss function; α and β are constants; L_seg represents the segmentation loss; x_seg represents the segmentation result; gt represents the truth value label; UP_{×4} represents a 4-fold upsampling operation; x_maff represents the splicing result.
Preferably, in S6, the calculation formula of the mixed loss function is:

L_joint = γ·L_seg + ε·L_bg

wherein L_joint represents the mixed loss function; L_seg represents the segmentation loss; L_bg represents the boundary loss; γ and ε are constants.
Preferably, the set multiple is 4 times.
The beneficial effects are that: the method provided by the invention overcomes the problem of low local feature contrast of the segmented target in the surgical scene through the constructed surgical whole scene semantic segmentation model, and also meets the precision requirement of the surgical scene segmentation on the region boundary.
Drawings
Exemplary embodiments of the present invention may be more fully understood by reference to the following drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification; they illustrate the invention together with its embodiments and do not limit it. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a flow chart of a method provided according to an exemplary embodiment of the present application.
Fig. 2 is a schematic structural diagram of the surgical whole-scene semantic segmentation model provided according to an exemplary embodiment of the present application.
Fig. 3 is a schematic diagram of a long strip convolution attention module provided according to an exemplary embodiment of the present application.
Fig. 4 is a schematic diagram of a segmentation effect provided according to an exemplary embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
In addition, the terms "first" and "second" etc. are used to distinguish different objects and are not used to describe a particular order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides a surgical whole-scene semantic segmentation method based on long-strip convolution attention, and the method is described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, fig. 1 is a flowchart of a surgical whole-scene semantic segmentation method based on long-strip convolution attention according to some embodiments of the present application. As shown in the drawings, the method may include the following steps:
s1: acquiring image data of an endoscopic surgery video and a truth value label corresponding to the image data; constructing a surgical whole-scene semantic segmentation model; the surgical whole-scene semantic segmentation model comprises an encoder, a long-strip convolution attention module (LSKA) and a segmentation module;
s2: the encoder encodes the image data and outputs an encoding result; the coding result comprises coding results of different stages;
in this embodiment, the encoder employs HRNetV2; the sizes of the encoding results of the stages outputted are different.
S3: the coding result passes through the strip convolution attention module to output a feature map; the feature map comprises feature maps corresponding to the coding results of all stages;
Specifically, as shown in fig. 3, the long-strip convolution attention module includes a region feature extraction block (An-block) and an instrument feature extraction block (Ins-block); the two blocks extract region features and instrument features in parallel from the encoding result of each stage; the encoding result of each stage is added to its corresponding region features and instrument features to obtain the feature map corresponding to the encoding result of that stage.
The region feature extraction block comprises a depth convolution, a first multi-branch depth strip convolution and a 1×1 convolution; the region features are extracted as follows:

The depth convolution aggregates local information of the encoding result of each stage; the local information is recorded as x_loc,i^an and calculated as:

x_loc,i^an = BN(DW_{5×5}(x_i))

The first multi-branch depth strip convolution and the 1×1 convolution obtain a first attention map from the local information; the first attention map is recorded as A_i^an and calculated as:

A_i^an = W_{1×1}(Σ_j DWS_{k_j}(x_loc,i^an))

The first attention map is multiplied with the local information to obtain the region features:

f_i^an = A_i^an ⊗ x_loc,i^an

wherein DW_{5×5} represents a depth convolution with a convolution kernel size of 5×5; BN represents a batch normalization operation; x_i represents the encoding result of the i-th stage, i ∈ {0, 1, 2, 3}; W_{1×1} represents a 1×1 convolution; DWS_{k_j} represents the j-th depth strip-convolution branch, j ∈ {0, 1, 2}; f_i^an represents the region features of the i-th stage; k_j represents the convolution kernel size and may be set to 11, 21 or 31; ⊗ represents element-by-element multiplication.
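As a concrete illustration, the region feature extraction described above can be sketched in PyTorch. This is a minimal sketch, not the patented implementation: the assumption that each branch cascades a 1×k_j and a k_j×1 depth-wise convolution, with branch outputs summed before the 1×1 convolution, is inferred from the description of cascaded depth strip convolutions, and all class and variable names are hypothetical.

```python
import torch
import torch.nn as nn


class RegionAttentionBlock(nn.Module):
    """Hypothetical sketch of the An-block: 5x5 depth-wise conv + BN,
    three depth-wise strip-convolution branches (k in {11, 21, 31}),
    a 1x1 mixing convolution, and element-wise re-weighting."""

    def __init__(self, channels: int, kernel_sizes=(11, 21, 31)):
        super().__init__()
        # Aggregate local information: x_loc = BN(DW_5x5(x_i)).
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
            nn.BatchNorm2d(channels),
        )
        # Multi-branch depth-wise strip convolutions (1xk cascaded with kx1).
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
            )
            for k in kernel_sizes
        )
        self.mix = nn.Conv2d(channels, channels, 1)  # W_1x1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        loc = self.local(x)                                   # x_loc
        attn = self.mix(sum(b(loc) for b in self.branches))   # attention map
        return attn * loc                                     # region features
```

The odd kernel sizes keep padding symmetric, so every branch preserves the spatial size and the element-wise product is well defined.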
The instrument feature extraction block comprises a depth convolution, a second multi-branch depth strip convolution and a 1×1 convolution; the instrument features are extracted as follows:

The depth convolution aggregates local information of the encoding result of each stage; the local information is recorded as x_loc,i^ins and calculated as:

x_loc,i^ins = BN(DW_{5×5}(x_i))

The second multi-branch depth strip convolution and the 1×1 convolution obtain a second attention map from the local information; the second attention map is recorded as A_i^ins and calculated as:

A_i^ins = W_{1×1}(Σ_j DWS_{k_j}(x_loc,i^ins))

The second attention map is multiplied with the local information to obtain the instrument features:

f_i^ins = A_i^ins ⊗ x_loc,i^ins

wherein DW_{5×5} represents a depth convolution with a convolution kernel size of 5×5; BN represents a batch normalization operation; x_i represents the encoding result of the i-th stage; W_{1×1} represents a 1×1 convolution; DWS_{k_j} represents the j-th depth strip-convolution branch; f_i^ins represents the instrument features of the i-th stage; k_j represents the convolution kernel size; ⊗ represents element-by-element multiplication.
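The per-stage composition described above — adding the encoding result of each stage to its region features and instrument features — can be sketched as follows. `StripAttention` here is a compact stand-in for both the An-block and the Ins-block (the text describes the two blocks with the same layout); the kernel sizes, the cascaded 1×k/k×1 branch structure, and all names are assumptions.

```python
import torch
import torch.nn as nn


class StripAttention(nn.Module):
    """Compact stand-in for the An-/Ins-blocks: 5x5 depth-wise conv + BN,
    multi-branch depth-wise strip convolutions, 1x1 mix, element-wise gate."""

    def __init__(self, c: int, kernel_sizes=(11, 21, 31)):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(c, c, 5, padding=2, groups=c), nn.BatchNorm2d(c))
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, c, (1, k), padding=(0, k // 2), groups=c),
                nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0), groups=c))
            for k in kernel_sizes)
        self.mix = nn.Conv2d(c, c, 1)

    def forward(self, x):
        loc = self.local(x)
        return self.mix(sum(b(loc) for b in self.branches)) * loc


class LSKAStage(nn.Module):
    """Stage output = encoding result + region features + instrument features."""

    def __init__(self, c: int):
        super().__init__()
        self.region = StripAttention(c)      # An-block
        self.instrument = StripAttention(c)  # Ins-block, applied in parallel

    def forward(self, x):
        return x + self.region(x) + self.instrument(x)
```

The residual-style sum keeps the stage's feature map at the encoder's resolution and channel count, so the module can be dropped onto each encoder stage unchanged.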
S4: the segmentation module performs up-sampling operation and splicing operation on the feature graphs corresponding to the coding results of each stage to obtain segmentation results;
specifically, the segmentation module screens out the feature map with the largest size, and upsamples the sizes of the rest feature maps to the same size as the feature map with the largest size; and splicing the up-sampled feature images, and up-sampling the splicing result by a set multiple to obtain a segmentation result.
In this embodiment, the set multiple is 4 times.
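The fusion step of the segmentation module (upsample every stage's feature map to the largest size, splice along channels, then upsample by the set multiple of 4) can be sketched as below. The use of bilinear interpolation and of a 1×1 convolution classifier, as well as the helper name, are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fuse_and_segment(feats, classifier):
    """Sketch of the segmentation module.

    feats: list of per-stage feature maps (N, C_i, H_i, W_i).
    classifier: maps spliced channels to class logits, e.g. a 1x1 conv.
    """
    # Select the feature map with the largest spatial size.
    largest = max(feats, key=lambda f: f.shape[-2] * f.shape[-1])
    size = largest.shape[-2:]
    # Upsample the remaining maps to that size and splice them (x_maff).
    up = [f if f.shape[-2:] == size
          else F.interpolate(f, size=size, mode="bilinear", align_corners=False)
          for f in feats]
    x_maff = torch.cat(up, dim=1)
    logits = classifier(x_maff)
    # Upsample by the set multiple (4x) to obtain the segmentation result.
    return F.interpolate(logits, scale_factor=4, mode="bilinear",
                         align_corners=False)
```

With four stages at 1/4, 1/8, 1/16 and 1/32 of the input resolution, the final 4× upsampling returns the prediction to the input resolution.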
S5: convolving the feature map with the largest size to obtain a boundary map; setting a boundary guiding dividing head, and obtaining a target boundary diagram by the truth value label through the boundary guiding dividing head;
Specifically, the calculation formula of the boundary map is:

bm_x = UP_{×4}(W_{1×1}(W_{bg}(x_max)))

wherein bm_x represents the boundary map (the boundary map of the feature map with the largest size, i.e. of the highest-resolution branch); UP_{×4} represents a 4-fold upsampling operation; W_{1×1} represents a 1×1 convolution; W_{bg} represents a convolution operation combined with a ReLU activation function and a batch normalization; x_max represents the feature map with the largest size;
The boundary-guided segmentation head comprises three branches; each branch extracts boundary features of the truth value label by applying a convolution with a Laplacian kernel, and the strides of the branches differ; the boundary features extracted by the branches are spliced to obtain boundary information, and a fixed contour threshold is set to refine the boundary information into the target boundary map, recorded as bm_gt.
In this embodiment, the boundary-guided segmentation head is designed with three branches and different strides to obtain multi-scale information; the boundary features of different sizes are upsampled to the same resolution (the same size) and dynamically re-weighted through the splicing operation to obtain richer boundary information.
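A minimal sketch of the boundary-guided segmentation head follows. Only the fixed Laplacian kernel and the three-branch multi-stride design come from the text; the stride values (1, 2, 4), the threshold value, and taking a per-pixel maximum over the spliced branch responses in place of learned dynamic re-weighting are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Fixed 3x3 Laplacian kernel used as a non-learned edge detector.
LAPLACIAN = torch.tensor([[-1., -1., -1.],
                          [-1.,  8., -1.],
                          [-1., -1., -1.]]).view(1, 1, 3, 3)


def target_boundary_map(gt, strides=(1, 2, 4), threshold=0.1):
    """Derive the target boundary map bm_gt from an integer label map gt.

    Each branch convolves the label map with the Laplacian kernel at a
    different stride; responses are upsampled back to the label's
    resolution, merged, and binarized with a fixed contour threshold.
    """
    gt = gt.float().unsqueeze(1)                     # (N, 1, H, W)
    size = gt.shape[-2:]
    maps = []
    for s in strides:
        edge = F.conv2d(gt, LAPLACIAN, stride=s, padding=1).abs()
        maps.append(F.interpolate(edge, size=size, mode="bilinear",
                                  align_corners=False))
    # Simplified merge of the spliced branch responses.
    merged = torch.cat(maps, dim=1).max(dim=1, keepdim=True).values
    return (merged > threshold).float()              # bm_gt
```

Because the Laplacian response is zero on constant regions, only pixels near class transitions in the label map survive the threshold.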
S6: calculating a boundary loss according to the boundary map and the target boundary map; calculating a segmentation loss according to the segmentation result and the truth value label; combining the boundary loss and the segmentation loss to construct a mixed loss function; optimizing the surgical whole-scene semantic segmentation model by the mixed loss function;
Specifically, the calculation formula of the boundary loss is:

L_bg = α·L_dice(bm_x, bm_gt) + β·L_ce(bm_x, bm_gt)

The calculation formula of the segmentation result is:

x_seg = UP_{×4}(x_maff)

The calculation formula of the segmentation loss is:

L_seg = L_ce(x_seg, gt)

wherein L_bg represents the boundary loss; L_dice(·) represents the dice loss function; bm_x represents the boundary map; bm_gt represents the target boundary map; L_ce(·) represents the cross-entropy loss function; α and β are constants, which default to 1 in this embodiment; L_seg represents the segmentation loss; x_seg represents the segmentation result; gt represents the truth value label; UP_{×4} represents a 4-fold upsampling operation; x_maff represents the splicing result.
The calculation formula of the mixed loss function is:

L_joint = γ·L_seg + ε·L_bg

wherein L_joint represents the mixed loss function; L_seg represents the segmentation loss; L_bg represents the boundary loss; γ and ε are constants, which default to 1 in this embodiment.
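The mixed loss can be sketched as below, with all four weights defaulting to 1 as in this embodiment. Treating the boundary term's cross entropy as binary cross entropy on a one-channel boundary logit map is an assumption, as are the function names.

```python
import torch
import torch.nn.functional as F


def dice_loss(pred, target, eps=1e-6):
    """Soft dice loss on sigmoid probabilities of a binary boundary map."""
    p = torch.sigmoid(pred)
    inter = (p * target).sum()
    return 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)


def hybrid_loss(bm_x, bm_gt, x_seg, gt,
                alpha=1.0, beta=1.0, gamma=1.0, eps_w=1.0):
    """L_bg  = alpha * dice + beta * (binary) cross entropy on boundary maps;
    L_seg   = cross entropy between segmentation result and label;
    L_joint = gamma * L_seg + eps_w * L_bg."""
    l_bg = (alpha * dice_loss(bm_x, bm_gt)
            + beta * F.binary_cross_entropy_with_logits(bm_x, bm_gt))
    l_seg = F.cross_entropy(x_seg, gt)
    return gamma * l_seg + eps_w * l_bg
```

The boundary term acts as deep supervision on the highest-resolution branch, while the cross-entropy term drives the main segmentation output.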
S7: inputting the image data to be segmented into the optimized surgical whole-scene semantic segmentation model, and outputting the final segmentation result.
As shown in fig. 4, fig. 4 is a schematic diagram of the segmentation effect, in which (a) is the 151st picture of the 1st test sequence in the Endovis2018 dataset, (a1) is the truth label of (a), and (a2) is the segmentation result of (a) by the method provided in this embodiment; (b) is the 150th picture of the 2nd test sequence in the Endovis2018 dataset, (b1) is the truth label of (b), and (b2) is the segmentation result of (b) by the method provided in this embodiment; (c) is the 153rd picture of the 3rd test sequence in the Endovis2018 dataset, (c1) is the truth label of (c), and (c2) is the segmentation result of (c) by the method provided in this embodiment. As can be seen from the figure, the segmentation results of the method provided by this embodiment are very close to the truth labels, demonstrating that the method achieves a good segmentation effect.
The method provided by this embodiment adaptively learns region features and instrument features through the long-strip convolution attention module. The module extracts the block-shaped features of the operation area and the strip-shaped features of the surgical instruments in parallel, using cascaded and summed depth strip convolutions, and establishes long-range pixel dependencies through convolution kernels with a maximum size of 31×31, which enlarges the receptive field of the network and reduces misrecognition caused by similar region features. A boundary segmentation head is further designed as deep supervision, guiding the model to learn boundary features and improving its ability to distinguish surgical boundaries.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the embodiments, and are intended to be included within the scope of the claims and description.
Claims (10)
1. The surgical whole-scene semantic segmentation method based on the long-strip convolution attention is characterized by comprising the following steps of:
s1: acquiring image data of an endoscopic surgery video and a truth value label corresponding to the image data; constructing a surgical whole-scene semantic segmentation model; the surgical whole-scene semantic segmentation model comprises an encoder, a strip convolution attention module and a segmentation module;
s2: the encoder encodes the image data and outputs an encoding result; the coding result comprises coding results of different stages;
s3: the coding result passes through the strip convolution attention module to output a feature map; the feature map comprises feature maps corresponding to the coding results of all stages;
s4: the segmentation module performs up-sampling operation and splicing operation on the feature graphs corresponding to the coding results of each stage to obtain segmentation results;
s5: convolving the feature map with the largest size to obtain a boundary map; setting a boundary-guided segmentation head, and passing the truth value label through the boundary-guided segmentation head to obtain a target boundary map;
s6: calculating a boundary loss according to the boundary map and the target boundary map; calculating a segmentation loss according to the segmentation result and the truth value label; combining the boundary loss and the segmentation loss to construct a mixed loss function; optimizing the surgical whole-scene semantic segmentation model by the mixed loss function;
s7: inputting the image data to be segmented into the optimized surgical whole-scene semantic segmentation model, and outputting a final segmentation result.
2. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 1, wherein in S2, the encoder adopts HRNetV2; the sizes of the encoding results of the stages outputted are different.
3. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 2, wherein in S3, the long-strip convolution attention module comprises a region feature extraction block and an instrument feature extraction block; the region feature extraction block and the instrument feature extraction block extract region features and instrument features in parallel from the encoding result of each stage; the encoding result of each stage is added to its corresponding region features and instrument features to obtain the feature map corresponding to the encoding result of that stage.
4. The long-strip convolution attention-based surgical whole-scene semantic segmentation method according to claim 3, wherein the region feature extraction block comprises a depth convolution, a first multi-branch depth strip convolution and a 1×1 convolution; the region features are extracted as follows:

The depth convolution aggregates local information of the encoding result of each stage; the local information is recorded as x_loc,i^an and calculated as:

x_loc,i^an = BN(DW_{5×5}(x_i))

The first multi-branch depth strip convolution and the 1×1 convolution obtain a first attention map from the local information; the first attention map is recorded as A_i^an and calculated as:

A_i^an = W_{1×1}(Σ_j DWS_{k_j}(x_loc,i^an))

The first attention map is multiplied with the local information to obtain the region features:

f_i^an = A_i^an ⊗ x_loc,i^an

wherein DW_{5×5} represents a depth convolution with a convolution kernel size of 5×5; BN represents a batch normalization operation; x_i represents the encoding result of the i-th stage; W_{1×1} represents a 1×1 convolution; DWS_{k_j} represents the j-th depth strip-convolution branch; f_i^an represents the region features of the i-th stage; k_j represents the convolution kernel size; ⊗ represents element-by-element multiplication.
5. The long-strip convolution attention-based surgical whole-scene semantic segmentation method according to claim 4, wherein the instrument feature extraction block comprises a depth convolution, a second multi-branch depth banded convolution, and a 1×1 convolution; the extraction of the instrument features comprises the following steps:
the depth convolution aggregates local information of the coding result of each stage, the local information being recorded as $F_i$; the calculation formula is:
$$F_i = \mathrm{BN}\left(\mathrm{DWConv}_{5\times 5}(x_i)\right)$$
the second multi-branch depth banded convolution and the 1×1 convolution obtain a second attention map based on the local information, the second attention map being recorded as $A_i'$; the calculation formula is:
$$A_i' = W_{1\times 1}\left(\sum_{j} \mathrm{DWConv}_{1\times k_j}\left(\mathrm{DWConv}_{k_j\times 1}(F_i)\right)\right)$$
the second attention map is multiplied by the local information to obtain the instrument feature; the calculation formula is:
$$T_i = A_i' \otimes F_i$$
wherein $\mathrm{DWConv}_{5\times 5}$ represents a depth convolution with a convolution kernel size of 5×5; $\mathrm{BN}$ represents a batch normalization operation; $x_i$ represents the encoding result of the $i$-th stage; $W_{1\times 1}$ represents a 1×1 convolution; $j$ indexes the branches; $T_i$ represents the instrument feature of the $i$-th stage; $k_j$ represents the convolution kernel size of the $j$-th branch; $\otimes$ represents element-wise multiplication.
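Claims 4 and 5 describe the same multi-branch strip-convolution attention pattern, differing only in the kernel sizes of the banded branches. A minimal single-channel sketch in plain Python follows; the helper names (`conv2d`, `strip_conv`, `strip_attention`), the averaging kernel weights, and the kernel sizes 7 and 11 are illustrative assumptions, not taken from the patent:

```python
def conv2d(img, kernel):
    """'Same'-padded single-channel 2-D convolution on a list-of-lists image."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    yy, xx = y + dy - ph, x + dx - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out

def strip_conv(img, k, horizontal):
    """Depthwise banded (1×k or k×1) convolution, here with averaging weights."""
    kern = [[1.0 / k] * k] if horizontal else [[1.0 / k] for _ in range(k)]
    return conv2d(img, kern)

def strip_attention(x, kernel_sizes=(7, 11)):
    """Local 5×5 aggregation, multi-branch strip convolutions summed into an
    attention map, then element-wise reweighting of the local features."""
    # stands in for DW-Conv 5×5 (batch normalization omitted in this sketch)
    local = conv2d(x, [[1.0 / 25] * 5 for _ in range(5)])
    h, w = len(x), len(x[0])
    att = [[0.0] * w for _ in range(h)]
    for k in kernel_sizes:            # branch j: k_j×1 followed by 1×k_j
        branch = strip_conv(strip_conv(local, k, False), k, True)
        for y in range(h):
            for c in range(w):
                att[y][c] += branch[y][c]
    # the 1×1 convolution degenerates to a scalar for a single channel; omitted
    return [[att[y][c] * local[y][c] for c in range(w)] for y in range(h)]
```

With real multi-channel feature maps, the 1×1 convolution mixes channels of the summed branches before the element-wise product; this single-channel toy keeps only the spatial logic.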
6. The method for semantic segmentation of a surgical whole scene based on long-strip convolution attention according to claim 5, wherein in S4, the segmentation module screens out the feature map with the largest size and upsamples the remaining feature maps to the same size as that feature map; the upsampled feature maps are spliced, and the splicing result is upsampled by a set multiple to obtain the segmentation result.
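The multi-scale aggregation of claim 6 can be sketched in plain Python as follows; nearest-neighbour interpolation, the helper names, and treating "splicing" as keeping each map as one channel are assumptions for illustration (the set multiple of 4 comes from claim 10):

```python
def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling of a single-channel list-of-lists map."""
    return [[img[y // factor][x // factor]
             for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]

def aggregate(feature_maps, final_factor=4):
    """Upsample every map to the largest spatial size, splice (concatenate as
    channels), then upsample the splice by the set multiple."""
    target = max(len(fm) for fm in feature_maps)   # side length of largest map
    spliced = [upsample_nearest(fm, target // len(fm)) if len(fm) < target else fm
               for fm in feature_maps]
    # final ×4 upsampling of the splicing result yields the segmentation logits
    return [upsample_nearest(ch, final_factor) for ch in spliced]
```

For dyadic encoder strides, `target // len(fm)` is an integer; a real implementation would use bilinear interpolation and a convolutional head over the concatenated channels.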
7. The surgical whole-scene semantic segmentation method based on long-strip convolution attention as set forth in claim 6, wherein in S5, the calculation formula of the boundary map is:
$$bm_x = UP_{\times 4}\left(W_{1\times 1}\left(W_{bg}(x_{\max})\right)\right)$$
wherein $bm_x$ represents the boundary map; $UP_{\times 4}$ represents a 4-fold upsampling operation; $W_{1\times 1}$ represents a 1×1 convolution; $W_{bg}$ represents a convolution operation combined with a ReLU activation function and batch normalization; $x_{\max}$ represents the feature map having the maximum size;
the boundary guiding segmentation head comprises three branches; each branch extracts boundary features of the truth-value label by applying a convolution network with a Laplacian kernel, the branches having different step sizes; the boundary features extracted by the branches are spliced to obtain boundary information, and a fixed contour threshold is set to refine the boundary information into a target boundary map, recorded as $bm_{gt}$.
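The Laplacian-kernel boundary extraction from the ground-truth mask can be sketched as below; the three multi-stride branches are reduced to a single stride-1 branch for brevity, and the threshold value 0.1 is an assumed placeholder for the patent's fixed contour threshold:

```python
# 3×3 Laplacian kernel: responds only where the mask value changes
LAPLACE = [[-1, -1, -1],
           [-1,  8, -1],
           [-1, -1, -1]]

def boundary_from_mask(mask, thr=0.1):
    """Binary boundary map of a binary mask via a Laplacian convolution."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += mask[yy][xx] * LAPLACE[dy + 1][dx + 1]
            # the fixed contour threshold refines the response to a crisp edge
            out[y][x] = 1 if abs(acc) > thr else 0
    return out
```

In homogeneous regions the kernel sums to zero, so only pixels adjacent to a label change survive the threshold; the full head would run this at several strides and splice the results.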
8. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 7, wherein in S6, the calculation formula of the boundary loss is:
$$L_{bg} = \alpha L_{dice}(bm_x, bm_{gt}) + \beta L_{ce}(bm_x, bm_{gt})$$
the calculation formula of the segmentation result is:
$$x_{seg} = UP_{\times 4}(x_{maff})$$
the calculation formula of the segmentation loss is:
$$L_{seg} = L_{ce}(x_{seg}, gt) + L_{dice}(x_{seg}, gt)$$
wherein $L_{bg}$ represents the boundary loss; $L_{dice}(\cdot)$ represents the Dice loss function; $bm_x$ represents the boundary map; $bm_{gt}$ represents the target boundary map; $L_{ce}(\cdot)$ represents the cross-entropy loss function; $\alpha$ and $\beta$ are constants; $L_{seg}$ represents the segmentation loss; $x_{seg}$ represents the segmentation result; $gt$ represents the truth-value label; $UP_{\times 4}$ represents a 4-fold upsampling operation; $x_{maff}$ represents the splicing result.
9. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 8, wherein in S6, the calculation formula of the mixed loss function is:
$$L_{joint} = \gamma L_{seg} + \varepsilon L_{bg}$$
wherein $L_{joint}$ represents the mixed loss function; $L_{seg}$ represents the segmentation loss; $L_{bg}$ represents the boundary loss; $\gamma$ and $\varepsilon$ are constants.
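A toy sketch of the Dice / cross-entropy mix in claims 8 and 9, on flattened binary predictions. The loss weights and the placement of the $\alpha$/$\beta$ constants inside the boundary term are assumptions; the patent's formula images are not available:

```python
import math

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on flat probability / binary lists."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def ce_loss(pred, target, eps=1e-6):
    """Binary cross-entropy on flat probability / binary lists."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def joint_loss(seg_pred, gt, bm_pred, bm_gt,
               alpha=0.5, beta=0.5, gamma=1.0, epsilon=1.0):
    """Mixed objective: segmentation loss plus weighted boundary loss."""
    l_bg = alpha * dice_loss(bm_pred, bm_gt) + beta * ce_loss(bm_pred, bm_gt)
    l_seg = ce_loss(seg_pred, gt) + dice_loss(seg_pred, gt)
    return gamma * l_seg + epsilon * l_bg   # L_joint = gamma*L_seg + eps*L_bg
```

A perfect prediction drives both terms toward zero, while flipping the predictions inflates both the Dice and cross-entropy components.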
10. The method for semantic segmentation of surgical whole-scene based on long-strip convolution attention according to claim 6, wherein the set multiple is 4 times.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310304276.8A CN116030260B (en) | 2023-03-27 | 2023-03-27 | Surgical whole-scene semantic segmentation method based on long-strip convolution attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116030260A (en) | 2023-04-28
CN116030260B (en) | 2023-08-01
Family
ID=86077847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310304276.8A Active CN116030260B (en) | 2023-03-27 | 2023-03-27 | Surgical whole-scene semantic segmentation method based on long-strip convolution attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116030260B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3171297A1 (en) * | 2015-11-18 | 2017-05-24 | CentraleSupélec | Joint boundary detection image segmentation and object recognition using deep learning |
WO2018111940A1 (en) * | 2016-12-12 | 2018-06-21 | Danny Ziyi Chen | Segmenting ultrasound images |
CN111833273A (en) * | 2020-07-17 | 2020-10-27 | 华东师范大学 | Semantic boundary enhancement method based on long-distance dependence |
WO2021030629A1 (en) * | 2019-08-14 | 2021-02-18 | Genentech, Inc. | Three dimensional object segmentation of medical images localized with object detection |
CN112634279A (en) * | 2020-12-02 | 2021-04-09 | 四川大学华西医院 | Medical image semantic segmentation method based on attention Unet model |
WO2021104056A1 (en) * | 2019-11-27 | 2021-06-03 | 中国科学院深圳先进技术研究院 | Automatic tumor segmentation system and method, and electronic device |
CN113807355A (en) * | 2021-07-29 | 2021-12-17 | 北京工商大学 | Image semantic segmentation method based on coding and decoding structure |
CN114359102A (en) * | 2022-01-10 | 2022-04-15 | 天津大学 | Image depth restoration evidence obtaining method based on attention mechanism and edge guide |
CN114565628A (en) * | 2022-03-23 | 2022-05-31 | 中南大学 | Image segmentation method and system based on boundary perception attention |
CN114723669A (en) * | 2022-03-08 | 2022-07-08 | 同济大学 | Liver tumor two-point five-dimensional deep learning segmentation algorithm based on context information perception |
CN114972756A (en) * | 2022-05-30 | 2022-08-30 | 湖南大学 | Semantic segmentation method and device for medical image |
CN114998373A (en) * | 2022-06-15 | 2022-09-02 | 南京信息工程大学 | Improved U-Net cloud picture segmentation method based on multi-scale loss function |
CN115035295A (en) * | 2022-06-15 | 2022-09-09 | 湖北工业大学 | Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function |
US20220309674A1 (en) * | 2021-03-26 | 2022-09-29 | Nanjing University Of Posts And Telecommunications | Medical image segmentation method based on u-net |
CN115601549A (en) * | 2022-12-07 | 2023-01-13 | 山东锋士信息技术有限公司(Cn) | River and lake remote sensing image segmentation method based on deformable convolution and self-attention model |
CN115661462A (en) * | 2022-11-14 | 2023-01-31 | 郑州大学 | Medical image segmentation method based on convolution and deformable self-attention mechanism |
Non-Patent Citations (5)
Title |
---|
HAO DU: "SwinPA-Net: Swin Transformer-Based Multiscale Feature Pyramid Aggregation Network for Medical Image Segmentation", IEEE * |
MENG-HAO GUO: "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation", arXiv * |
严广宇; 刘正熙: "Real-time semantic segmentation algorithm based on hybrid attention", Modern Computer, no. 10 * |
完美屁桃: "Paper reading - SegNeXt: Rethinking convolutional-attention-based semantic segmentation", Retrieved from the Internet <URL:https://blog.csdn.net/qq_43687860/article/details/129122842> * |
狗熊会: "A survey of medical image segmentation", Retrieved from the Internet <URL:https://roll.sohu.com/a/533482881_455817> * |
Also Published As
Publication number | Publication date |
---|---|
CN116030260B (en) | 2023-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shvets et al. | Automatic instrument segmentation in robot-assisted surgery using deep learning | |
Münzer et al. | Content-based processing and analysis of endoscopic images and videos: A survey | |
US11699236B2 (en) | Systems and methods for the segmentation of multi-modal image data | |
Dorent et al. | CrossMoDA 2021 challenge: Benchmark of cross-modality domain adaptation techniques for vestibular schwannoma and cochlea segmentation | |
Reiter et al. | Appearance learning for 3D tracking of robotic surgical tools | |
US10366488B2 (en) | Image processing used to estimate abnormalities | |
Xu et al. | Class-incremental domain adaptation with smoothing and calibration for surgical report generation | |
CN112184579B (en) | Tissue lesion area image auxiliary restoration system and method | |
CN116563252A (en) | Esophageal early cancer lesion segmentation method based on attention double-branch feature fusion | |
JP2021533451A (en) | Systems and methods for automatic detection of visual objects in medical images | |
WO2013016113A1 (en) | Tool tracking during surgical procedures | |
Song et al. | An efficient deep learning based coarse-to-fine cephalometric landmark detection method | |
Oliva Maza et al. | An ORB-SLAM3-based approach for surgical navigation in ureteroscopy | |
Shi et al. | Attention gate based dual-pathway network for vertebra segmentation of X-ray spine images | |
CN116030260B (en) | Surgical whole-scene semantic segmentation method based on long-strip convolution attention | |
CN113813053A (en) | Operation process analysis method based on laparoscope endoscopic image | |
Lin et al. | CSwinDoubleU-Net: A double U-shaped network combined with convolution and Swin Transformer for colorectal polyp segmentation | |
Ali et al. | Towards robotic knee arthroscopy: multi-scale network for tissue-tool segmentation | |
CN116188486A (en) | Video segmentation method and system for laparoscopic liver operation | |
Hofman et al. | First‐in‐human real‐time AI‐assisted instrument deocclusion during augmented reality robotic surgery | |
CN115311317A (en) | Laparoscope image segmentation method and system based on ScaleFormer algorithm | |
Rueckert et al. | Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art | |
CN115049709A (en) | Deep learning point cloud lumbar registration method for spinal minimally invasive surgery navigation | |
Liu et al. | LGI Net: Enhancing local-global information interaction for medical image segmentation | |
US10299864B1 (en) | Co-localization of multiple internal organs based on images obtained during surgery |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||