CN116030260A - Surgical whole-scene semantic segmentation method based on long-strip convolution attention - Google Patents


Publication number
CN116030260A
CN116030260A (application CN202310304276.8A)
Authority
CN
China
Prior art keywords
convolution
representing
boundary
segmentation
surgical
Prior art date
Legal status
Granted
Application number
CN202310304276.8A
Other languages
Chinese (zh)
Other versions
CN116030260B (en)
Inventor
刘敏
朱悦豪
汪嘉正
张哲
王耀南
Current Assignee
Hunan University
Original Assignee
Hunan University
Priority date
Filing date
Publication date
Application filed by Hunan University
Priority to CN202310304276.8A
Publication of CN116030260A
Application granted
Publication of CN116030260B
Legal status: Active


Abstract

The invention discloses a surgical whole-scene semantic segmentation method based on long-strip convolution attention, which comprises the following steps: acquiring image data of an endoscopic surgery video and the ground-truth labels corresponding to the image data; constructing a surgical whole-scene semantic segmentation model; encoding the image data; passing the encoding results through a long-strip convolution attention module to output feature maps; performing up-sampling and concatenation on the feature maps corresponding to the encoding results of each stage to obtain a segmentation result; convolving the feature map with the largest size to obtain a boundary map; setting a boundary-guided segmentation head and passing the ground-truth label through it to obtain a target boundary map; calculating a boundary loss from the boundary map and the target boundary map; calculating a segmentation loss from the segmentation result and the ground-truth label; combining the boundary loss and the segmentation loss to construct a hybrid loss function; and optimizing the surgical whole-scene semantic segmentation model with the hybrid loss function. The method meets the precision requirement of surgical scene segmentation on region boundaries.

Description

Surgical whole-scene semantic segmentation method based on long-strip convolution attention
Technical Field
The invention relates to the technical field of surgical scene segmentation, in particular to a surgical whole-scene semantic segmentation method based on long-strip convolution attention.
Background
The intelligent endoscopic surgical robot is a typical application of robot-assisted minimally invasive surgery, and can effectively improve the success rate of surgery, shorten the recovery period of surgery and improve the safety of patients. Automated surgical scene segmentation is a key technology for computer-assisted surgery and intelligent surgical robots. Its task is to segment the anatomical region and the medical device objects in the surgical scene and assign a class label to each pixel. The segmentation results may be used for a number of clinical tasks such as lesion tissue localization, surgical decision making, surgical navigation, and surgical skill assessment, among others.
Accurate interpretation of the entire scene in an endoscopic surgery video is a very challenging task. Compared with conventional natural scenes, the local features of segmentation targets in a surgical scene have lower contrast, and different biological tissues or instruments show higher feature similarity within local regions. Most existing work adopts attention mechanisms to combine the local semantic features of a target with its global features and to capture long-range dependencies. DANet adaptively combines local features with their global dependencies by using position attention and channel attention in parallel, but the computational complexity of position attention is high, and its feature modeling relies only on the information of the current feature map, which remains limited for surgical scene segmentation. Transformer-based models associate richer cues with each pixel through self-attention; however, attending over more views with self-attention incurs computational complexity that grows quadratically with the number of pixel embeddings, and the self-attention mechanism considers only adaptability in the spatial dimension while ignoring adaptability in the channel dimension, which is also important for visual tasks.
Another non-negligible challenge is the precision requirement of surgical scene segmentation on region boundaries. The clinician must pay constant attention to the cutting boundary and control errors while performing the procedure, which requires the network model to resolve tissue boundaries accurately. GSCNN proposes a dual-stream CNN architecture for semantic segmentation that routes shape information through a dedicated processing branch and can produce sharper predictions around object boundaries. The model improves the performance of the baseline, but the multi-path branching operation increases the computational complexity of the network.
Disclosure of Invention
In view of the above problems, it is necessary to provide a surgical whole-scene semantic segmentation method based on long-strip convolution attention.
The invention provides a surgical whole-scene semantic segmentation method based on long-strip convolution attention, which comprises the following steps:
S1: acquiring image data of an endoscopic surgery video and the ground-truth label corresponding to the image data; constructing a surgical whole-scene semantic segmentation model; the surgical whole-scene semantic segmentation model comprises an encoder, a long-strip convolution attention module and a segmentation module;
S2: the encoder encodes the image data and outputs an encoding result; the encoding result comprises encoding results of different stages;
S3: the encoding result passes through the long-strip convolution attention module to output a feature map; the feature map comprises feature maps corresponding to the encoding results of all stages;
S4: the segmentation module performs up-sampling and concatenation on the feature maps corresponding to the encoding results of each stage to obtain a segmentation result;
S5: convolving the feature map with the largest size to obtain a boundary map; setting a boundary-guided segmentation head, and passing the ground-truth label through the boundary-guided segmentation head to obtain a target boundary map;
S6: calculating a boundary loss from the boundary map and the target boundary map; calculating a segmentation loss from the segmentation result and the ground-truth label; combining the boundary loss and the segmentation loss to construct a hybrid loss function; optimizing the surgical whole-scene semantic segmentation model with the hybrid loss function;
S7: inputting the image data to be segmented into the optimized surgical whole-scene semantic segmentation model, and outputting the final segmentation result.
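For orientation only, the following PyTorch-style sketch shows how steps S2 to S6 could be wired together in one training iteration. Every name in it (model.encode, model.attend, model.decode, boundary_head, boundary_gt_head, joint_loss_fn) is a hypothetical placeholder for the components detailed in the description, not an interface defined by the invention.

```python
import torch

def training_step(model, boundary_head, boundary_gt_head, joint_loss_fn,
                  optimizer, images, labels):
    """One hypothetical optimization step covering steps S2-S6."""
    feats = model.encode(images)                 # S2: multi-stage encoding results
    feats = model.attend(feats)                  # S3: long-strip convolution attention
    seg_logits, largest = model.decode(feats)    # S4: up-sample + concatenate
    bm_x = boundary_head(largest)                # S5: predicted boundary map
    with torch.no_grad():
        bm_gt = boundary_gt_head(labels)         # S5: target boundary map from labels
    loss = joint_loss_fn(seg_logits, labels, bm_x, bm_gt)  # S6: hybrid loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```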
Preferably, in S2, HRNetV2 is used as the encoder; the encoding results output by the stages have different sizes.
Preferably, in S3, the long-strip convolution attention module includes a region feature extraction block and an instrument feature extraction block; the region feature extraction block and the instrument feature extraction block operate in parallel on the encoding result of each stage to obtain region features and instrument features; the encoding result of each stage is then added to the corresponding region features and instrument features to obtain the feature map corresponding to that stage's encoding result.
Preferably, the region feature extraction block includes a depthwise convolution, a first multi-branch depthwise strip convolution, and a 1×1 convolution; the region features are extracted as follows:
the depthwise convolution aggregates local information of the encoding result of each stage; the local information is denoted $x_i^{l}$ and computed as
$x_i^{l} = BN\big(DW_{5\times5}(x_i)\big)$;
the first multi-branch depthwise strip convolution and the 1×1 convolution obtain a first attention map from the local information; the first attention map is denoted $Att_i^{an}$ and computed as
$Att_i^{an} = W_{1\times1}\Big(\textstyle\sum_{j} DW_{1\times k_j}\big(DW_{k_j\times1}(x_i^{l})\big)\Big)$;
the first attention map is multiplied element-wise with the local information to obtain the region features:
$x_i^{an} = Att_i^{an} \otimes x_i^{l}$;
where $DW_{5\times5}$ denotes a depthwise convolution with a 5×5 kernel; $BN$ denotes a batch normalization operation; $x_i$ denotes the encoding result of the $i$-th stage; $W_{1\times1}$ denotes a 1×1 convolution; $j$ indexes the branches; $x_i^{an}$ denotes the region features of the $i$-th stage; $k_j$ denotes the strip convolution kernel size; $\otimes$ denotes element-wise multiplication.
Preferably, the instrument feature extraction block comprises a depthwise convolution, a second multi-branch depthwise strip convolution, and a 1×1 convolution; the instrument features are extracted as follows:
the depthwise convolution aggregates local information of the encoding result of each stage; the local information is denoted $x_i^{l}$ and computed as
$x_i^{l} = BN\big(DW_{5\times5}(x_i)\big)$;
the second multi-branch depthwise strip convolution and the 1×1 convolution obtain a second attention map from the local information; the second attention map is denoted $Att_i^{ins}$ and computed as
$Att_i^{ins} = W_{1\times1}\Big(\textstyle\sum_{j} \big(DW_{1\times k_j}(x_i^{l}) + DW_{k_j\times1}(x_i^{l})\big)\Big)$;
the second attention map is multiplied element-wise with the local information to obtain the instrument features:
$x_i^{ins} = Att_i^{ins} \otimes x_i^{l}$;
where $DW_{5\times5}$ denotes a depthwise convolution with a 5×5 kernel; $BN$ denotes a batch normalization operation; $x_i$ denotes the encoding result of the $i$-th stage; $W_{1\times1}$ denotes a 1×1 convolution; $j$ indexes the branches; $x_i^{ins}$ denotes the instrument features of the $i$-th stage; $k_j$ denotes the strip convolution kernel size; $\otimes$ denotes element-wise multiplication.
Preferably, in S4, the segmentation module selects the feature map with the largest size and up-samples the remaining feature maps to that size; the up-sampled feature maps are concatenated, and the concatenation result is up-sampled by a set multiple to obtain the segmentation result.
Preferably, in S5, the calculation formula of the boundary map is:
Figure SMS_17
wherein ,bm x representing a boundary map;UP ×4 representing a 4-fold upsampling operation;W 1×1 representing a1 x 1 convolution;W bg a convolution operation representing a combination of a Relu activation function and a batch normalization;
Figure SMS_18
a feature map having a maximum size;
the boundary guiding dividing head comprises three branches; each branch extracts the boundary feature of the truth label by applying the convolution network with the Laplace kernel, and each branchStep length is inconsistent; splicing boundary features extracted by each branch to obtain boundary information, and setting a fixed contour threshold to refine the boundary information to obtain a target boundary map; the target boundary map is recorded as:bm gt
Preferably, in S6, the boundary loss is calculated as
$L_{bg} = \alpha L_{dice}(bm_x, bm_{gt}) + \beta L_{ce}(bm_x, bm_{gt})$;
the segmentation result is calculated as
$x_{seg} = UP_{\times4}(x_{maff})$;
the segmentation loss is calculated as
$L_{seg} = L_{ce}(x_{seg}, gt)$;
where $L_{bg}$ denotes the boundary loss; $L_{dice}(\cdot)$ denotes the Dice loss function; $bm_x$ denotes the boundary map; $bm_{gt}$ denotes the target boundary map; $L_{ce}(\cdot)$ denotes the cross-entropy loss function; $\alpha$ and $\beta$ are constants; $L_{seg}$ denotes the segmentation loss; $x_{seg}$ denotes the segmentation result; $gt$ denotes the ground-truth label; $UP_{\times4}$ denotes a 4× up-sampling operation; $x_{maff}$ denotes the concatenation result.
Preferably, in S6, the calculation formula of the mixing loss function is:
Figure SMS_22
wherein ,L joint representing a mixing loss function;L seg representing segmentation loss;L bg representing boundary loss;γandεis constant.
Preferably, the set multiple is 4 times.
Beneficial effects: through the constructed surgical whole-scene semantic segmentation model, the method provided by the invention overcomes the low contrast of local features of segmentation targets in the surgical scene, and also meets the precision requirement of surgical scene segmentation on region boundaries.
Drawings
Exemplary embodiments of the present invention may be more fully understood by reference to the following drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification; together with the embodiments of the application they serve to explain the invention and do not constitute a limitation of the invention. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a flow chart of a method provided according to an exemplary embodiment of the present application.
Fig. 2 is a schematic structural diagram of the surgical whole-scene semantic segmentation model provided according to an exemplary embodiment of the present application.
Fig. 3 is a schematic diagram of a long strip convolution attention module provided according to an exemplary embodiment of the present application.
Fig. 4 is a schematic diagram of a segmentation effect provided according to an exemplary embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
In addition, the terms "first" and "second" etc. are used to distinguish different objects and are not used to describe a particular order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides a surgical whole-scene semantic segmentation method based on long-strip convolution attention, and the method is described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, which illustrate the flow and the structure of a surgical whole-scene semantic segmentation method based on long-strip convolution attention according to some embodiments of the present application, the method may include the following steps:
S1: acquiring image data of an endoscopic surgery video and the ground-truth label corresponding to the image data; constructing a surgical whole-scene semantic segmentation model; the surgical whole-scene semantic segmentation model comprises an encoder, a long-strip convolution attention module (LSKA) and a segmentation module;
S2: the encoder encodes the image data and outputs an encoding result; the encoding result comprises encoding results of different stages;
In this embodiment, the encoder employs HRNetV2; the encoding results output by the stages have different sizes.
S3: the encoding result passes through the long-strip convolution attention module to output a feature map; the feature map comprises feature maps corresponding to the encoding results of all stages;
Specifically, as shown in fig. 3, the long-strip convolution attention module includes a region feature extraction block (An-block) and an instrument feature extraction block (Ins-block); the two blocks operate in parallel on the encoding result of each stage to obtain region features and instrument features; the encoding result of each stage is then added to the corresponding region features and instrument features to obtain the feature map corresponding to that stage's encoding result.
The region feature extraction block comprises a depthwise convolution, a first multi-branch depthwise strip convolution and a 1×1 convolution; the region features are extracted as follows:
the depthwise convolution aggregates local information of the encoding result of each stage; the local information is denoted $x_i^{l}$ and computed as
$x_i^{l} = BN\big(DW_{5\times5}(x_i)\big)$;
the first multi-branch depthwise strip convolution and the 1×1 convolution obtain a first attention map from the local information; the first attention map is denoted $Att_i^{an}$ and computed as
$Att_i^{an} = W_{1\times1}\Big(\textstyle\sum_{j} DW_{1\times k_j}\big(DW_{k_j\times1}(x_i^{l})\big)\Big)$;
the first attention map is multiplied element-wise with the local information to obtain the region features:
$x_i^{an} = Att_i^{an} \otimes x_i^{l}$;
where $DW_{5\times5}$ denotes a depthwise convolution with a 5×5 kernel; $BN$ denotes a batch normalization operation; $x_i$ denotes the encoding result of the $i$-th stage, $i\in\{0,1,2,3\}$; $W_{1\times1}$ denotes a 1×1 convolution; $j$ indexes the branches, $j\in\{0,1,2\}$; $x_i^{an}$ denotes the region features of the $i$-th stage; $k_j$ denotes the strip convolution kernel size, which may be set to 11, 21 or 31; $\otimes$ denotes element-wise multiplication.
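As a concrete illustration of the region feature extraction block described above, the following PyTorch sketch implements one plausible reading of the formulas: a 5×5 depthwise convolution with batch normalization, three branches of cascaded k×1 and 1×k depthwise strip convolutions (k = 11, 21, 31), a 1×1 convolution producing the attention map, and an element-wise multiplication with the local information. PyTorch itself, the class name RegionAttention, and the exact branch composition are assumptions for illustration, not a definitive statement of the claimed structure.

```python
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    """Sketch of the region feature extraction block (An-block)."""

    def __init__(self, channels, kernel_sizes=(11, 21, 31)):
        super().__init__()
        # 5x5 depthwise convolution + BN aggregates local information (x_i^l).
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
            nn.BatchNorm2d(channels),
        )
        # Each branch cascades a k x 1 and a 1 x k depthwise strip convolution,
        # approximating a block-shaped k x k receptive field.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels),
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
            )
            for k in kernel_sizes
        ])
        self.proj = nn.Conv2d(channels, channels, 1)  # W_{1x1}

    def forward(self, x):
        local = self.local(x)                          # x_i^l
        att = sum(branch(local) for branch in self.branches)
        att = self.proj(att)                           # Att_i^an
        return att * local                             # x_i^an
```

For example, `RegionAttention(64)(torch.randn(1, 64, 128, 160))` returns a tensor of the same shape as its input.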
The instrument feature extraction block comprises a depthwise convolution, a second multi-branch depthwise strip convolution and a 1×1 convolution; the instrument features are extracted as follows:
the depthwise convolution aggregates local information of the encoding result of each stage; the local information is denoted $x_i^{l}$ and computed as
$x_i^{l} = BN\big(DW_{5\times5}(x_i)\big)$;
the second multi-branch depthwise strip convolution and the 1×1 convolution obtain a second attention map from the local information; the second attention map is denoted $Att_i^{ins}$ and computed as
$Att_i^{ins} = W_{1\times1}\Big(\textstyle\sum_{j} \big(DW_{1\times k_j}(x_i^{l}) + DW_{k_j\times1}(x_i^{l})\big)\Big)$;
the second attention map is multiplied element-wise with the local information to obtain the instrument features:
$x_i^{ins} = Att_i^{ins} \otimes x_i^{l}$;
where $DW_{5\times5}$ denotes a depthwise convolution with a 5×5 kernel; $BN$ denotes a batch normalization operation; $x_i$ denotes the encoding result of the $i$-th stage; $W_{1\times1}$ denotes a 1×1 convolution; $j$ indexes the branches; $x_i^{ins}$ denotes the instrument features of the $i$-th stage; $k_j$ denotes the strip convolution kernel size; $\otimes$ denotes element-wise multiplication.
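In the same spirit, the sketch below adds an instrument-feature block and a wrapper that runs both blocks in parallel and adds their outputs back to the stage input, as described above. It reuses the RegionAttention class from the previous sketch; InstrumentAttention and LSKABlock are hypothetical names, and summing (rather than cascading) the strip convolutions in the instrument branch follows the later description of extracting strip-shaped instrument features by addition of depthwise strip convolutions.

```python
import torch
import torch.nn as nn

class InstrumentAttention(nn.Module):
    """Sketch of the instrument feature extraction block (Ins-block)."""

    def __init__(self, channels, kernel_sizes=(11, 21, 31)):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
            nn.BatchNorm2d(channels),
        )
        # Parallel k x 1 and 1 x k depthwise strip convolutions whose outputs
        # are added, giving a cross/strip-shaped receptive field.
        self.vert = nn.ModuleList([
            nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels)
            for k in kernel_sizes])
        self.horz = nn.ModuleList([
            nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels)
            for k in kernel_sizes])
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        local = self.local(x)
        att = sum(v(local) + h(local) for v, h in zip(self.vert, self.horz))
        return self.proj(att) * local                  # x_i^ins


class LSKABlock(nn.Module):
    """Runs the An-block and Ins-block in parallel and adds both to the input."""

    def __init__(self, channels):
        super().__init__()
        self.an = RegionAttention(channels)            # from the previous sketch
        self.ins = InstrumentAttention(channels)

    def forward(self, x):
        return x + self.an(x) + self.ins(x)
```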
S4: the segmentation module performs up-sampling and concatenation on the feature maps corresponding to the encoding results of each stage to obtain the segmentation result;
Specifically, the segmentation module selects the feature map with the largest size and up-samples the remaining feature maps to that size; the up-sampled feature maps are concatenated, and the concatenation result is up-sampled by a set multiple to obtain the segmentation result.
In this embodiment, the set multiple is 4 times.
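A minimal sketch of this fusion step is shown below, assuming four stage feature maps and a 1×1 classification convolution before the final 4× up-sampling; the classifier, the class name SegmentationHead, and the stage channel widths are assumptions added for illustration, since the text above folds the prediction step into the concatenation result.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationHead(nn.Module):
    """Up-samples all stage features to the largest size, concatenates them,
    and produces the segmentation result at input resolution (4x up-sampling)."""

    def __init__(self, stage_channels, num_classes):
        super().__init__()
        self.classifier = nn.Conv2d(sum(stage_channels), num_classes, 1)

    def forward(self, feats):
        # feats[0] is assumed to be the largest (highest-resolution) feature map.
        size = feats[0].shape[-2:]
        upsampled = [feats[0]] + [
            F.interpolate(f, size=size, mode="bilinear", align_corners=False)
            for f in feats[1:]
        ]
        x_maff = torch.cat(upsampled, dim=1)           # concatenation result
        logits = self.classifier(x_maff)               # 1x1 classifier (added assumption)
        x_seg = F.interpolate(logits, scale_factor=4, mode="bilinear",
                              align_corners=False)     # UP_x4
        return x_seg, x_maff
```

For example, `SegmentationHead((32, 64, 128, 256), num_classes=12)` would accept four feature maps whose channel widths follow that (hypothetical) order.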
S5: convolving the feature map with the largest size to obtain a boundary map; setting a boundary-guided segmentation head, and passing the ground-truth label through the boundary-guided segmentation head to obtain a target boundary map;
Specifically, the boundary map is calculated as
$bm_x = UP_{\times4}\big(W_{1\times1}(W_{bg}(x_{max}))\big)$;
where $bm_x$ denotes the boundary map (the boundary map of the feature map with the largest size, i.e. of the highest-resolution branch); $UP_{\times4}$ denotes a 4× up-sampling operation; $W_{1\times1}$ denotes a 1×1 convolution; $W_{bg}$ denotes a convolution operation combined with batch normalization and a ReLU activation; $x_{max}$ denotes the feature map with the largest size.
The boundary-guided segmentation head comprises three branches; each branch applies a convolution with a Laplacian kernel to extract boundary features from the ground-truth label, and the strides of the branches differ; the boundary features extracted by the branches are concatenated to obtain boundary information, and a fixed contour threshold is set to refine the boundary information into the target boundary map, denoted $bm_{gt}$.
In this embodiment, the boundary-guided segmentation head is designed with three branches using different strides to obtain multi-scale information; the boundary feature maps of different sizes are up-sampled to the same resolution and dynamically re-weighted through the concatenation operation to obtain richer boundary information.
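The sketch below illustrates both halves of this step under stated assumptions: a predicted-boundary branch built from a 3×3 Conv-BN-ReLU (standing in for W_bg), a 1×1 convolution and 4× up-sampling, and a target-boundary generator that convolves the ground-truth label with a fixed 3×3 Laplacian kernel at three different strides, up-samples, concatenates and thresholds the result. The kernel values, the strides (1, 2, 4), the threshold, and the per-branch maximum used in place of the learned re-weighting are illustrative choices, not values stated in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryBranch(nn.Module):
    """Predicts the boundary map bm_x from the largest feature map."""

    def __init__(self, channels):
        super().__init__()
        self.w_bg = nn.Sequential(                     # W_bg: conv + BN + ReLU
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.w_1x1 = nn.Conv2d(channels, 1, 1)

    def forward(self, x_max):
        bm = self.w_1x1(self.w_bg(x_max))
        return F.interpolate(bm, scale_factor=4, mode="bilinear", align_corners=False)


def boundary_gt_head(labels, strides=(1, 2, 4), threshold=0.1):
    """Builds the target boundary map bm_gt from integer ground-truth labels."""
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                       device=labels.device).view(1, 1, 3, 3)
    x = labels.float().unsqueeze(1)                    # (N, 1, H, W)
    feats = []
    for s in strides:
        edge = F.conv2d(x, lap, stride=s, padding=1).abs()
        feats.append(F.interpolate(edge, size=x.shape[-2:], mode="bilinear",
                                   align_corners=False))
    boundary = torch.cat(feats, dim=1).max(dim=1, keepdim=True).values
    return (boundary > threshold).float()              # fixed contour threshold
```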
S6: calculating a boundary loss from the boundary map and the target boundary map; calculating a segmentation loss from the segmentation result and the ground-truth label; combining the boundary loss and the segmentation loss to construct a hybrid loss function; optimizing the surgical whole-scene semantic segmentation model with the hybrid loss function;
Specifically, the boundary loss is calculated as
$L_{bg} = \alpha L_{dice}(bm_x, bm_{gt}) + \beta L_{ce}(bm_x, bm_{gt})$;
the segmentation result is calculated as
$x_{seg} = UP_{\times4}(x_{maff})$;
the segmentation loss is calculated as
$L_{seg} = L_{ce}(x_{seg}, gt)$;
where $L_{bg}$ denotes the boundary loss; $L_{dice}(\cdot)$ denotes the Dice loss function; $bm_x$ denotes the boundary map; $bm_{gt}$ denotes the target boundary map; $L_{ce}(\cdot)$ denotes the cross-entropy loss function; $\alpha$ and $\beta$ are constants, and in this embodiment both default to 1; $L_{seg}$ denotes the segmentation loss; $x_{seg}$ denotes the segmentation result; $gt$ denotes the ground-truth label; $UP_{\times4}$ denotes a 4× up-sampling operation; $x_{maff}$ denotes the concatenation result.
The hybrid loss function is calculated as
$L_{joint} = \gamma L_{seg} + \varepsilon L_{bg}$;
where $L_{joint}$ denotes the hybrid loss function; $L_{seg}$ denotes the segmentation loss; $L_{bg}$ denotes the boundary loss; $\gamma$ and $\varepsilon$ are constants, and in this embodiment both default to 1.
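The loss terms can be written compactly as below; the soft Dice formulation for the binary boundary map and the use of binary cross-entropy for that branch are assumptions consistent with the formulas above, and all weights default to 1 as stated.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for the binary boundary map (pred: logits, target: {0,1})."""
    prob = torch.sigmoid(pred)
    inter = (prob * target).sum()
    return 1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps)

def boundary_loss(bm_x, bm_gt, alpha=1.0, beta=1.0):
    """L_bg = alpha * L_dice + beta * L_ce, with cross-entropy taken as BCE here."""
    return (alpha * dice_loss(bm_x, bm_gt)
            + beta * F.binary_cross_entropy_with_logits(bm_x, bm_gt))

def segmentation_loss(x_seg, gt):
    """L_seg = L_ce(x_seg, gt); x_seg are class logits, gt integer labels."""
    return F.cross_entropy(x_seg, gt)

def joint_loss(x_seg, gt, bm_x, bm_gt, gamma=1.0, epsilon=1.0):
    """L_joint = gamma * L_seg + epsilon * L_bg."""
    return gamma * segmentation_loss(x_seg, gt) + epsilon * boundary_loss(bm_x, bm_gt)
```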
S7: inputting the image data to be segmented into the optimized surgical whole-scene semantic segmentation model, and outputting the final segmentation result.
Fig. 4 is a schematic diagram of the segmentation effect, in which (a) is the 151st frame of the 1st test sequence in the EndoVis2018 dataset, (a1) is the ground-truth label of (a), and (a2) is the segmentation result of (a) obtained by the method provided in this embodiment; (b) is the 150th frame of the 2nd test sequence in the EndoVis2018 dataset, (b1) is the ground-truth label of (b), and (b2) is the segmentation result of (b) obtained by this method; (c) is the 153rd frame of the 3rd test sequence in the EndoVis2018 dataset, (c1) is the ground-truth label of (c), and (c2) is the segmentation result of (c) obtained by this method. As can be seen from the figure, the segmentation results of the method provided in this embodiment are very close to the ground-truth labels, which demonstrates that the method achieves a good segmentation effect.
The method provided by this embodiment adaptively learns region features and instrument features through the long-strip convolution attention module. The module extracts the block-shaped features of surgical regions and the strip-shaped features of surgical instruments in parallel, using cascaded and additive depthwise strip convolutions respectively, and establishes long-range pixel dependencies through convolution kernels with an equivalent size of up to 31×31, which enlarges the receptive field of the network and reduces misrecognition caused by the similarity of region features. In addition, a boundary segmentation head is designed as deep supervision, guiding the model to learn boundary features and improving its ability to distinguish surgical boundaries.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the embodiments, and are intended to be included within the scope of the claims and description.

Claims (10)

1. A surgical whole-scene semantic segmentation method based on long-strip convolution attention, characterized by comprising the following steps:
S1: acquiring image data of an endoscopic surgery video and the ground-truth label corresponding to the image data; constructing a surgical whole-scene semantic segmentation model; the surgical whole-scene semantic segmentation model comprises an encoder, a long-strip convolution attention module and a segmentation module;
S2: the encoder encodes the image data and outputs an encoding result; the encoding result comprises encoding results of different stages;
S3: the encoding result passes through the long-strip convolution attention module to output a feature map; the feature map comprises feature maps corresponding to the encoding results of all stages;
S4: the segmentation module performs up-sampling and concatenation on the feature maps corresponding to the encoding results of each stage to obtain a segmentation result;
S5: convolving the feature map with the largest size to obtain a boundary map; setting a boundary-guided segmentation head, and passing the ground-truth label through the boundary-guided segmentation head to obtain a target boundary map;
S6: calculating a boundary loss from the boundary map and the target boundary map; calculating a segmentation loss from the segmentation result and the ground-truth label; combining the boundary loss and the segmentation loss to construct a hybrid loss function; optimizing the surgical whole-scene semantic segmentation model with the hybrid loss function;
S7: inputting the image data to be segmented into the optimized surgical whole-scene semantic segmentation model, and outputting the final segmentation result.
2. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 1, wherein in S2, the encoder adopts HRNetV2, and the encoding results output by the stages have different sizes.
3. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 2, wherein in S3, the long-strip convolution attention module comprises a region feature extraction block and an instrument feature extraction block; the region feature extraction block and the instrument feature extraction block operate in parallel on the encoding result of each stage to obtain region features and instrument features; and the encoding result of each stage is added to the corresponding region features and instrument features to obtain the feature map corresponding to that stage's encoding result.
4. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 3, wherein the region feature extraction block comprises a depthwise convolution, a first multi-branch depthwise strip convolution and a 1×1 convolution; the region features are extracted as follows:
the depthwise convolution aggregates local information of the encoding result of each stage; the local information is denoted $x_i^{l}$ and computed as
$x_i^{l} = BN\big(DW_{5\times5}(x_i)\big)$;
the first multi-branch depthwise strip convolution and the 1×1 convolution obtain a first attention map from the local information; the first attention map is denoted $Att_i^{an}$ and computed as
$Att_i^{an} = W_{1\times1}\Big(\textstyle\sum_{j} DW_{1\times k_j}\big(DW_{k_j\times1}(x_i^{l})\big)\Big)$;
the first attention map is multiplied element-wise with the local information to obtain the region features:
$x_i^{an} = Att_i^{an} \otimes x_i^{l}$;
where $DW_{5\times5}$ denotes a depthwise convolution with a 5×5 kernel; $BN$ denotes a batch normalization operation; $x_i$ denotes the encoding result of the $i$-th stage; $W_{1\times1}$ denotes a 1×1 convolution; $j$ indexes the branches; $x_i^{an}$ denotes the region features of the $i$-th stage; $k_j$ denotes the strip convolution kernel size; $\otimes$ denotes element-wise multiplication.
5. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 4, wherein the instrument feature extraction block comprises a depthwise convolution, a second multi-branch depthwise strip convolution and a 1×1 convolution; the instrument features are extracted as follows:
the depthwise convolution aggregates local information of the encoding result of each stage; the local information is denoted $x_i^{l}$ and computed as
$x_i^{l} = BN\big(DW_{5\times5}(x_i)\big)$;
the second multi-branch depthwise strip convolution and the 1×1 convolution obtain a second attention map from the local information; the second attention map is denoted $Att_i^{ins}$ and computed as
$Att_i^{ins} = W_{1\times1}\Big(\textstyle\sum_{j} \big(DW_{1\times k_j}(x_i^{l}) + DW_{k_j\times1}(x_i^{l})\big)\Big)$;
the second attention map is multiplied element-wise with the local information to obtain the instrument features:
$x_i^{ins} = Att_i^{ins} \otimes x_i^{l}$;
where $DW_{5\times5}$ denotes a depthwise convolution with a 5×5 kernel; $BN$ denotes a batch normalization operation; $x_i$ denotes the encoding result of the $i$-th stage; $W_{1\times1}$ denotes a 1×1 convolution; $j$ indexes the branches; $x_i^{ins}$ denotes the instrument features of the $i$-th stage; $k_j$ denotes the strip convolution kernel size; $\otimes$ denotes element-wise multiplication.
6. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 5, wherein in S4, the segmentation module selects the feature map with the largest size and up-samples the remaining feature maps to that size; the up-sampled feature maps are concatenated, and the concatenation result is up-sampled by a set multiple to obtain the segmentation result.
7. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 6, wherein in S5, the boundary map is calculated as
$bm_x = UP_{\times4}\big(W_{1\times1}(W_{bg}(x_{max}))\big)$;
where $bm_x$ denotes the boundary map; $UP_{\times4}$ denotes a 4× up-sampling operation; $W_{1\times1}$ denotes a 1×1 convolution; $W_{bg}$ denotes a convolution operation combined with batch normalization and a ReLU activation; $x_{max}$ denotes the feature map with the largest size;
the boundary-guided segmentation head comprises three branches; each branch applies a convolution with a Laplacian kernel to extract boundary features from the ground-truth label, and the strides of the branches differ; the boundary features extracted by the branches are concatenated to obtain boundary information, and a fixed contour threshold is set to refine the boundary information into the target boundary map, denoted $bm_{gt}$.
8. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 7, wherein in S6, the boundary loss is calculated as
$L_{bg} = \alpha L_{dice}(bm_x, bm_{gt}) + \beta L_{ce}(bm_x, bm_{gt})$;
the segmentation result is calculated as
$x_{seg} = UP_{\times4}(x_{maff})$;
the segmentation loss is calculated as
$L_{seg} = L_{ce}(x_{seg}, gt)$;
where $L_{bg}$ denotes the boundary loss; $L_{dice}(\cdot)$ denotes the Dice loss function; $bm_x$ denotes the boundary map; $bm_{gt}$ denotes the target boundary map; $L_{ce}(\cdot)$ denotes the cross-entropy loss function; $\alpha$ and $\beta$ are constants; $L_{seg}$ denotes the segmentation loss; $x_{seg}$ denotes the segmentation result; $gt$ denotes the ground-truth label; $UP_{\times4}$ denotes a 4× up-sampling operation; $x_{maff}$ denotes the concatenation result.
9. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 8, wherein in S6, the hybrid loss function is calculated as
$L_{joint} = \gamma L_{seg} + \varepsilon L_{bg}$;
where $L_{joint}$ denotes the hybrid loss function; $L_{seg}$ denotes the segmentation loss; $L_{bg}$ denotes the boundary loss; $\gamma$ and $\varepsilon$ are constants.
10. The surgical whole-scene semantic segmentation method based on long-strip convolution attention according to claim 6, wherein the set multiple is 4.
CN202310304276.8A 2023-03-27 2023-03-27 Surgical whole-scene semantic segmentation method based on long-strip convolution attention Active CN116030260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310304276.8A CN116030260B (en) 2023-03-27 2023-03-27 Surgical whole-scene semantic segmentation method based on long-strip convolution attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310304276.8A CN116030260B (en) 2023-03-27 2023-03-27 Surgical whole-scene semantic segmentation method based on long-strip convolution attention

Publications (2)

Publication Number Publication Date
CN116030260A true CN116030260A (en) 2023-04-28
CN116030260B CN116030260B (en) 2023-08-01

Family

ID=86077847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310304276.8A Active CN116030260B (en) 2023-03-27 2023-03-27 Surgical whole-scene semantic segmentation method based on long-strip convolution attention

Country Status (1)

Country Link
CN (1) CN116030260B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3171297A1 (en) * 2015-11-18 2017-05-24 CentraleSupélec Joint boundary detection image segmentation and object recognition using deep learning
WO2018111940A1 (en) * 2016-12-12 2018-06-21 Danny Ziyi Chen Segmenting ultrasound images
WO2021030629A1 (en) * 2019-08-14 2021-02-18 Genentech, Inc. Three dimensional object segmentation of medical images localized with object detection
WO2021104056A1 (en) * 2019-11-27 2021-06-03 中国科学院深圳先进技术研究院 Automatic tumor segmentation system and method, and electronic device
CN111833273A (en) * 2020-07-17 2020-10-27 华东师范大学 Semantic boundary enhancement method based on long-distance dependence
CN112634279A (en) * 2020-12-02 2021-04-09 四川大学华西医院 Medical image semantic segmentation method based on attention Unet model
US20220309674A1 (en) * 2021-03-26 2022-09-29 Nanjing University Of Posts And Telecommunications Medical image segmentation method based on u-net
CN113807355A (en) * 2021-07-29 2021-12-17 北京工商大学 Image semantic segmentation method based on coding and decoding structure
CN114359102A (en) * 2022-01-10 2022-04-15 天津大学 Image depth restoration evidence obtaining method based on attention mechanism and edge guide
CN114723669A (en) * 2022-03-08 2022-07-08 同济大学 Liver tumor two-point five-dimensional deep learning segmentation algorithm based on context information perception
CN114565628A (en) * 2022-03-23 2022-05-31 中南大学 Image segmentation method and system based on boundary perception attention
CN114972756A (en) * 2022-05-30 2022-08-30 湖南大学 Semantic segmentation method and device for medical image
CN114998373A (en) * 2022-06-15 2022-09-02 南京信息工程大学 Improved U-Net cloud picture segmentation method based on multi-scale loss function
CN115035295A (en) * 2022-06-15 2022-09-09 湖北工业大学 Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN115661462A (en) * 2022-11-14 2023-01-31 郑州大学 Medical image segmentation method based on convolution and deformable self-attention mechanism
CN115601549A (en) * 2022-12-07 2023-01-13 山东锋士信息技术有限公司(Cn) River and lake remote sensing image segmentation method based on deformable convolution and self-attention model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HAO DU: "SwinPA-Net: Swin Transformer-Based Multiscale Feature Pyramid Aggregation Network for Medical Image Segmentation", IEEE *
MENG-HAO GUO: "SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation", ARXIV *
YAN GUANGYU; LIU ZHENGXI: "Real-time semantic segmentation algorithm based on hybrid attention", Modern Computer (现代计算机), no. 10 *
完美屁桃: "Paper reading - SegNeXt: Rethinking convolutional attention design for semantic segmentation", Retrieved from the Internet <URL:https://blog.csdn.net/qq_43687860/article/details/129122842> *
狗熊会: "A survey of medical image segmentation", Retrieved from the Internet <URL:https://roll.sohu.com/a/533482881_455817> *

Also Published As

Publication number Publication date
CN116030260B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
Shvets et al. Automatic instrument segmentation in robot-assisted surgery using deep learning
Münzer et al. Content-based processing and analysis of endoscopic images and videos: A survey
US11699236B2 (en) Systems and methods for the segmentation of multi-modal image data
Dorent et al. CrossMoDA 2021 challenge: Benchmark of cross-modality domain adaptation techniques for vestibular schwannoma and cochlea segmentation
Reiter et al. Appearance learning for 3D tracking of robotic surgical tools
US10366488B2 (en) Image processing used to estimate abnormalities
Xu et al. Class-incremental domain adaptation with smoothing and calibration for surgical report generation
CN112184579B (en) Tissue lesion area image auxiliary restoration system and method
CN116563252A (en) Esophageal early cancer lesion segmentation method based on attention double-branch feature fusion
JP2021533451A (en) Systems and methods for automatic detection of visual objects in medical images
WO2013016113A1 (en) Tool tracking during surgical procedures
Song et al. An efficient deep learning based coarse-to-fine cephalometric landmark detection method
Oliva Maza et al. An ORB-SLAM3-based approach for surgical navigation in ureteroscopy
Shi et al. Attention gate based dual-pathway network for vertebra segmentation of X-ray spine images
CN116030260B (en) Surgical whole-scene semantic segmentation method based on long-strip convolution attention
CN113813053A (en) Operation process analysis method based on laparoscope endoscopic image
Lin et al. CSwinDoubleU-Net: A double U-shaped network combined with convolution and Swin Transformer for colorectal polyp segmentation
Ali et al. Towards robotic knee arthroscopy: multi-scale network for tissue-tool segmentation
CN116188486A (en) Video segmentation method and system for laparoscopic liver operation
Hofman et al. First‐in‐human real‐time AI‐assisted instrument deocclusion during augmented reality robotic surgery
CN115311317A (en) Laparoscope image segmentation method and system based on ScaleFormer algorithm
Rueckert et al. Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art
CN115049709A (en) Deep learning point cloud lumbar registration method for spinal minimally invasive surgery navigation
Liu et al. LGI Net: Enhancing local-global information interaction for medical image segmentation
US10299864B1 (en) Co-localization of multiple internal organs based on images obtained during surgery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant