CN112699879A - Attention-guided real-time minimally invasive surgical tool detection method and system - Google Patents


Info

Publication number
CN112699879A
CN112699879A (application number CN202011622027.6A)
Authority
CN
China
Prior art keywords
attention
minimally invasive
surgical tool
detection module
fine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011622027.6A
Other languages
Chinese (zh)
Inventor
赵子健
史攀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202011622027.6A priority Critical patent/CN112699879A/en
Publication of CN112699879A publication Critical patent/CN112699879A/en
Pending legal-status Critical Current

Classifications

    • G06V 10/245 — Aligning, centring, orientation detection or correction of the image by locating a pattern; special marks for positioning
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 — Neural networks; learning methods
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners; connectivity analysis


Abstract

The invention belongs to the technical field of minimally invasive surgery video analysis and provides an attention-guided method and system for detecting minimally invasive surgical tools in real time. The method comprises: acquiring a minimally invasive surgery video and processing it frame by frame to obtain surgical images; and inputting the surgical images frame by frame into an attention-guided convolutional neural network framework that outputs accurate surgical tool bounding boxes. The framework comprises a cascaded coarse detection module and fine detection module. The coarse detection module performs coarse localization-parameter regression on a surgical image to obtain fine anchors and determines whether each fine anchor is a surgical tool or background; the fine detection module then obtains an accurate surgical tool bounding box based on an attention mechanism and the predicted fine anchor categories.

Description

Attention-guided real-time minimally invasive surgical tool detection method and system
Technical Field
The invention belongs to the technical field of minimally invasive surgery video analysis, and particularly relates to a real-time minimally invasive surgery tool detection method and system based on attention guidance.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The advantages of minimally invasive surgery are well known: reduced post-operative pain and blood loss, less trauma, and shorter recovery times, benefiting both patients and clinicians. However, the indirect observation and manipulation inherent in minimally invasive surgery impair the surgeon's hand-eye coordination and can degrade visual understanding during the operation; the surgeon needs additional information to monitor the movement of surgical tools inside the body, which has hindered the worldwide adoption of the technique. Minimally invasive surgery video analysis can reliably monitor an operation from tool-interaction video through sufficient image analysis and processing: it can automatically identify the ongoing surgical task so as to generate surgical reports and reconstruct the surgical workflow, and it can also alert clinicians to possible complications and provide accurate, real-time navigation for the surgeon.
To reduce medical accidents, medical technicians have attempted to enhance surgeons' ability to ensure patient safety through context-aware computer-assisted surgery systems. Surgical tool detection, an important component of such systems, addresses tool recognition and localization simultaneously from visual data and can provide accurate two- or three-dimensional position estimates of surgical tools. Potential applications include accurate tool localization and pose estimation, real-time intraoperative alerts, surgical workflow optimization, and objective skill assessment. Real-time, accurate detection of surgical tools can therefore provide important information for navigation in minimally invasive surgery.
Compared with object detection in natural scenes, detecting minimally invasive surgical tools is very challenging. First, natural-scene datasets generally contain far more labeled samples than surgical datasets. Second, the complexity of the surgical environment and the movement of organs in the background make targets easier to lose. In addition, the appearance of surgical tools, the surgical background, and tool poses lack consistency, and surgical video may lose information through illumination changes, specular reflection, tool shadows, motion blur, bleeding, or smoke occlusion, all of which complicate algorithm design and model training. The inventors therefore found that current surgical tool detection methods struggle to run in real time, and their detection accuracy is easily degraded by these interfering factors.
Disclosure of Invention
In order to solve at least one technical problem in the background art, the invention provides a method and a system for detecting a minimally invasive surgical tool in real time based on attention guidance.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a real-time minimally invasive surgical tool detection method based on attention guidance.
A real-time minimally invasive surgical tool detection method based on attention guidance comprises the following steps:
acquiring a minimally invasive surgery video and processing it frame by frame to obtain surgical images;
inputting the surgical images frame by frame into an attention-guided convolutional neural network framework, and outputting accurate surgical tool bounding boxes;
wherein the attention-guided convolutional neural network framework comprises a cascaded coarse detection module and fine detection module; the coarse detection module performs coarse localization-parameter regression on the surgical image to obtain fine anchors and determines whether each fine anchor is a surgical tool or background; the fine detection module obtains an accurate surgical tool bounding box based on the attention mechanism and the predicted fine anchor categories.
A second aspect of the invention provides an attention-guided real-time minimally invasive surgical tool detection system.
An attention-guided real-time minimally invasive surgical tool detection system comprising:
a surgical image acquisition module for acquiring a minimally invasive surgery video and processing it frame by frame to obtain surgical images;
a surgical tool detection module for inputting the surgical images frame by frame into an attention-guided convolutional neural network framework and outputting accurate surgical tool bounding boxes;
wherein the attention-guided convolutional neural network framework comprises a cascaded coarse detection module and fine detection module; the coarse detection module performs coarse localization-parameter regression on the surgical image to obtain fine anchors and determines whether each fine anchor is a surgical tool or background; the fine detection module obtains an accurate surgical tool bounding box based on the attention mechanism and the predicted fine anchor categories.
A third aspect of the invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the attention-guided real-time minimally invasive surgical tool detection method described above.
A fourth aspect of the invention provides a computer apparatus.
A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing, when executing the program, the steps of the attention-guided real-time minimally invasive surgical tool detection method described above.
Compared with the prior art, the invention has the beneficial effects that:
the method comprises the steps of performing rough positioning parameter regression on an operation image to obtain a fine anchor point, and determining whether the fine anchor point is an operation tool or a background; and then, based on the attention mechanism and the predicted fine anchor point category, an accurate surgical tool boundary frame is obtained, the detection speed is improved while the higher detection accuracy is ensured, and the requirement of the detection real-time performance of the minimally invasive surgical tool is met.
The invention adopts a lightweight attention-guided convolutional neural network and replaces standard convolution with a lightweight head module, greatly reducing the network's parameter count and computational complexity; detection speed is improved while high detection accuracy is maintained, meeting the real-time requirements of minimally invasive surgical tool detection.
The method extracts candidate surgical tool bounding boxes with an attention-based mechanism and uses a squeeze-and-excitation module to fuse more context information, enhancing the network's ability to focus on relevant image regions, facilitating regression and classification of minimally invasive surgical tools, and further improving detection accuracy.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain, not limit, the invention.
FIG. 1 is a flow chart of a method for real-time minimally invasive surgical tool detection based on attention-guidance according to an embodiment of the present invention;
FIG. 2 is a diagram of an overall convolutional neural network framework for an embodiment of the present invention;
FIG. 3 is a detailed block diagram of a lightweight attention-directed transition connection module of an embodiment of the present invention;
FIG. 4 is a detailed block diagram of a squeeze-and-excitation module according to an embodiment of the present invention;
FIG. 5 is a detailed block diagram of a light-weight head module according to an embodiment of the invention;
FIG. 6 is a schematic structural diagram of a real-time minimally invasive surgical tool detection system based on attention guidance according to an embodiment of the invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and the terms "comprises" and/or "comprising" specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
Interpretation of terms
CNN, an abbreviation of Convolutional Neural Network: a class of feed-forward neural networks with a deep structure that performs convolution calculations, used mainly to extract image features for further classification and detection.
VGG-16: a convolutional neural network from the Visual Geometry Group with 16 weight layers, commonly used as the backbone network of a detection model to extract the corresponding features.
Attention guidance improves the representational power of the network by modeling the dependencies between channels and adjusting features channel by channel, so that the network can learn global information to selectively enhance informative features and suppress useless ones.
Example one
As shown in fig. 1, the present embodiment provides a method for detecting a minimally invasive surgical tool in real time based on attention guidance, which includes:
s101: and acquiring a minimally invasive surgery video, and processing the minimally invasive surgery video frame by frame to obtain a surgery image.
In a specific implementation, the process of step S101 includes:
s1011: utilize the camera to obtain the video of whole operation process when minimal access surgery goes on, for example: speed 25 FPS;
s1012: utilizing related framing software to down-sample the video with the corresponding speed acquired in the step S1011, for example, the sampling speed is 5FPS, and further storing the video as an operation image; it should be noted here that in the implementation, the original video is down-sampled to the video frame rate that can be artificially marked; the down-sampling can be used for down-sampling the original video, so that the time information among video segments is enriched, and the accuracy of the detection of the surgical tool is improved.
S1013: step S1012 is repeated until all the minimally invasive surgery videos are converted into surgery images.
S102: and inputting the operation images into the convolutional neural network framework based on attention guidance frame by frame, and outputting an accurate operation tool bounding box.
Wherein the attention-directed-based convolutional neural network framework comprises a coarse detection module and a fine detection module which are cascaded; the rough detection module is used for performing rough positioning parameter regression on the operation image to obtain a fine anchor point and determining whether the fine anchor point is an operation tool or a background; the fine detection module is used for obtaining an accurate surgical tool bounding box based on the attention mechanism and the predicted fine anchor point category.
Specifically, the number of surgical tool categories appearing in the minimally invasive surgery video is C; since seven surgical tools appear in the processed surgical video, C is set to 7. The seven tools are a grasper, a bipolar instrument, a hook, scissors, a clip applier, an irrigator, and a specimen collection bag. The original resolution of each frame is 854 × 480; to improve processing efficiency, images are uniformly resized to 320 × 320 before training the attention-guided convolutional neural network framework, which is fine-tuned using stochastic gradient descent.
For example: the learning rates of all layers of the attention-guided convolutional neural network framework are initialized to 5 × 10⁻⁵, with a momentum of 0.9 and a weight decay of 5 × 10⁻⁵.
It should be noted here that, in other embodiments, the learning-related parameters of the convolutional neural network framework based on attention-guiding may also be set by a person skilled in the art according to actual situations, and will not be described here again.
Before the rough detection module carries out rough positioning parameter regression on the operation images to obtain the fine anchor points, the input minimally invasive operation images are processed in batches.
Specifically, the input image batches are preprocessed with two kinds of data augmentation: photometric transformations (random brightness and contrast adjustment) and geometric transformations (random expansion, cropping of the original training frames, and random horizontal flipping). Most of these operations are stochastic, enriching the data as much as possible and thereby increasing the effective number of training samples.
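A minimal NumPy sketch of the two augmentation families described above; the jitter ranges and probabilities are assumptions, and the random-expansion and cropping steps are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Photometric (brightness/contrast jitter) + geometric (horizontal flip)
    augmentation; all parameter ranges here are illustrative assumptions."""
    img = img.astype(np.float32)
    if rng.random() < 0.5:                 # random brightness shift
        img += rng.uniform(-32, 32)
    if rng.random() < 0.5:                 # random contrast scale
        img *= rng.uniform(0.5, 1.5)
    if rng.random() < 0.5:                 # random horizontal flip
        img = img[:, ::-1, :]
    return np.clip(img, 0, 255).astype(np.uint8)

out = augment(np.zeros((320, 320, 3), dtype=np.uint8))
print(out.shape)
```

Because each branch fires at random, repeated passes over the same frame yield different training samples.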
During training of the attention-guided convolutional neural network framework, as shown in fig. 2, the backbone network of the coarse detection module is a modified VGG-16 pre-trained on the ImageNet standard dataset: fully connected layers 6 and 7 are converted into convolutional layers 6 and 7, respectively, and two additional convolutional layers 6-1 and 6-2 are appended after VGG-16 to extract richer features.
Multi-scale prediction is performed on the features extracted by convolutional layers 4-3, 5-3, 7 and 6-2 of the modified VGG-16.
Coarse regression of the localization parameters yields fine anchors, providing better initialization for the fine detection module.
Meanwhile, the fine anchors are classified as tool or background, filtering out a large number of negative anchors and alleviating the imbalance between positive and negative samples.
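For a 320 × 320 input, the four prediction layers correspond to progressively coarser feature maps. The output strides below are an assumption based on the standard VGG-16 layout (the text does not state them explicitly):

```python
INPUT = 320
# Assumed output strides of the four multi-scale prediction layers.
strides = {"conv4-3": 8, "conv5-3": 16, "conv7": 32, "conv6-2": 64}
for name, s in strides.items():
    side = INPUT // s
    print(f"{name}: {side}x{side} feature map -> {side * side} anchor positions")
```

Each spatial position of each map contributes anchors, which the coarse module then refines and filters.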
As shown in fig. 2, the fine detection module comprises four cascaded lightweight attention-guided transfer connection modules, each of which incorporates a squeeze-and-excitation module and a lightweight head module.
In the fine detection module, the fine anchors pass through the four transfer connection modules, which adaptively fuse the low-level and high-level features derived from the coarse detection module.
The detailed structure of the transfer connection module is shown in fig. 3; it neatly combines the squeeze-and-excitation module and the lightweight head module.
As shown in fig. 4, the squeeze-and-excitation module comprises a global pooling layer, two fully connected layers, and a Sigmoid activation function; its key operations are squeeze and excitation. By analyzing the relationships among channels, the network automatically focuses on the most informative channel features and suppresses the unimportant ones.
First, the feature maps input to the squeeze-and-excitation module are globally pooled to obtain a real-valued vector of length M, giving the feature map of each channel a global receptive field; low-level feature maps with small receptive fields can thus exploit global information, improving the network's feature extraction and yielding richer image semantics. Second, the length-M vector is passed through the fully connected layers: it is first reduced to a 1 × M/r vector with a ReLU activation, then expanded back to a 1 × M vector with a Sigmoid activation to compute the per-channel weight coefficients. Finally, each weight coefficient is multiplied by its corresponding feature channel to update the feature map.
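The squeeze-and-excitation computation just described can be sketched in NumPy as follows; the weights are random placeholders rather than trained values, and M and the reduction ratio r are illustrative:

```python
import numpy as np

def squeeze_excite(fmap, w1, w2):
    """Squeeze-and-excitation over a (M, H, W) feature map.
    w1: (M, M//r) reduction weights; w2: (M//r, M) expansion weights."""
    z = fmap.mean(axis=(1, 2))               # squeeze: global pool -> length-M vector
    h = np.maximum(z @ w1, 0.0)              # FC reduce to M/r, ReLU
    s = 1.0 / (1.0 + np.exp(-(h @ w2)))      # FC expand to M, Sigmoid -> channel weights
    return fmap * s[:, None, None]           # excitation: rescale each channel

rng = np.random.default_rng(0)
M, r = 8, 4
x = rng.standard_normal((M, 20, 20))
y = squeeze_excite(x, rng.standard_normal((M, M // r)), rng.standard_normal((M // r, M)))
print(y.shape)
```

Since the Sigmoid output lies in (0, 1), each channel is attenuated in proportion to its learned importance.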
In this embodiment, candidate surgical tool bounding boxes are extracted with an attention-based method, and the squeeze-and-excitation module fuses more context information, enhancing the network's focus on relevant image regions, facilitating regression and classification of minimally invasive surgical tools, and improving the accuracy of surgical tool detection.
As shown in fig. 5, the lightweight head module is based on a depthwise separable convolution design, replacing the traditional 3 × 3 standard convolution with a fusion of two feature paths.
Depthwise separable convolution consists of two steps with different functions:
1. a depthwise convolution that processes the spatial information of each channel separately;
2. a 1 × 1 pointwise convolution that fuses the channels.
Its computation is roughly 1/9 that of a traditional 3 × 3 convolution, significantly improving detection speed.
In the lightweight head module, one path is the feature produced by a 1 × 1 standard convolution, and the other is the feature produced by a 1 × 1 standard convolution followed by a 3 × 3 depthwise separable convolution. By fusing the two paths, the network can output surgical tool coordinates and classification scores directly with only 1 × 1 convolutions, reducing the parameter count and computational complexity, greatly improving detection speed, and meeting the real-time requirements of minimally invasive surgery.
Specifically, in the fine detection module, the low-level and high-level features are fused, the surgical tool class and bounding box size are regressed using an L1 loss function, and the position of the surgical tool in the image is output.
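As an illustration of the box-regression objective, here is the smooth L1 variant commonly used for bounding-box regression (the text says only "L1 loss", so the smooth form is an assumption):

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d * d, d - 0.5).sum()

# one small and one large coordinate error
loss = smooth_l1(np.array([0.5, 2.0]), np.array([0.0, 0.0]))
print(loss)
```

The quadratic region keeps gradients small near the target while the linear region limits the influence of outlier boxes.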
This embodiment adopts a lightweight attention-guided convolutional neural network and replaces standard convolution with the lightweight head module, greatly reducing the network's parameter count and computational complexity, maintaining high detection accuracy while increasing detection speed, and meeting the real-time requirements of minimally invasive surgical tool detection.
The attention-guided real-time detection method for the minimally invasive surgical tool ensures the detection real-time performance of the minimally invasive surgical tool and improves the detection accuracy of the surgical tool.
Example two
As shown in fig. 6, the present embodiment provides an attention-guided real-time minimally invasive surgical tool detection system, which includes:
a surgical image acquisition module for acquiring a minimally invasive surgery video and processing it frame by frame to obtain surgical images;
a surgical tool detection module for inputting the surgical images frame by frame into an attention-guided convolutional neural network framework and outputting accurate surgical tool bounding boxes;
wherein the attention-guided convolutional neural network framework comprises a cascaded coarse detection module and fine detection module; the coarse detection module performs coarse localization-parameter regression on the surgical image to obtain fine anchors and determines whether each fine anchor is a surgical tool or background; the fine detection module obtains an accurate surgical tool bounding box based on the attention mechanism and the predicted fine anchor categories.
It should be noted that, each module in the attention-guided real-time minimally invasive surgical tool detection system according to the present embodiment corresponds to each step in the attention-guided real-time minimally invasive surgical tool detection method according to the first embodiment one by one, and the specific implementation process thereof is the same, and will not be described herein again.
EXAMPLE III
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the attention-guided real-time minimally invasive surgical tool detection method according to the first embodiment.
Example four
The embodiment provides a computer device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for detecting a minimally invasive surgical tool based on attention guidance according to the embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described in terms of flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An attention-guided real-time minimally invasive surgical tool detection method, characterized by comprising:
acquiring a minimally invasive surgery video and processing it frame by frame to obtain surgical images; and
feeding the surgical images frame by frame into an attention-guided convolutional neural network framework, which outputs accurate surgical tool bounding boxes;
wherein the attention-guided convolutional neural network framework comprises a cascaded coarse detection module and a fine detection module; the coarse detection module performs coarse regression of localization parameters on the surgical images to obtain refined anchors and determines whether each refined anchor is a surgical tool or background; and the fine detection module obtains accurate surgical tool bounding boxes based on an attention mechanism and the predicted refined anchor categories.

2. The attention-guided real-time minimally invasive surgical tool detection method according to claim 1, wherein the coarse detection module is a modified VGG-16 network in which the original fully connected layers 6 and 7 of VGG-16 are converted into convolutional layers 6 and 7, respectively, and two additional convolutional layers, 6-1 and 6-2, are appended after VGG-16.

3. The attention-guided real-time minimally invasive surgical tool detection method according to claim 1, wherein the fine detection module comprises four cascaded lightweight attention-guided transfer connection modules, each of which integrates a squeeze-and-excitation module and a light head module.

4. The attention-guided real-time minimally invasive surgical tool detection method according to claim 3, wherein the squeeze-and-excitation module comprises one global pooling layer, two fully connected layers, and one Sigmoid activation function.

5. The attention-guided real-time minimally invasive surgical tool detection method according to claim 3, wherein the light head module is designed on the basis of depthwise separable convolution and fuses the two feature streams output by the coarse detection module, namely the low-level features and the high-level features.

6. The attention-guided real-time minimally invasive surgical tool detection method according to claim 1, wherein the attention-guided convolutional neural network framework is further configured to preprocess the input surgical images in batches.

7. The attention-guided real-time minimally invasive surgical tool detection method according to claim 6, wherein preprocessing the input surgical images comprises two kinds of data augmentation, namely optical transformation and geometric transformation; the optical transformation comprises randomly adjusting brightness and contrast, and the geometric transformation comprises randomly expanding and cropping the original training frames and randomly flipping frames horizontally.

8. An attention-guided real-time minimally invasive surgical tool detection system, characterized by comprising:
a surgical image acquisition module configured to acquire a minimally invasive surgery video and process it frame by frame to obtain surgical images; and
a surgical tool detection module configured to feed the surgical images frame by frame into an attention-guided convolutional neural network framework and output accurate surgical tool bounding boxes;
wherein the attention-guided convolutional neural network framework comprises a cascaded coarse detection module and a fine detection module; the coarse detection module performs coarse regression of localization parameters on the surgical images to obtain refined anchors and determines whether each refined anchor is a surgical tool or background; and the fine detection module obtains accurate surgical tool bounding boxes based on an attention mechanism and the predicted refined anchor categories.

9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the attention-guided real-time minimally invasive surgical tool detection method according to any one of claims 1-7.

10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the attention-guided real-time minimally invasive surgical tool detection method according to any one of claims 1-7.
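Claim 4 fixes the internal structure of the squeeze-and-excitation module: one global pooling layer, two fully connected layers, and a Sigmoid. A minimal NumPy sketch of that channel-recalibration pattern is shown below; the toy layer sizes, the reduction ratio, and the ReLU between the two fully connected layers are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def squeeze_excite(feature_map, w1, w2):
    """Apply squeeze-and-excitation recalibration to a (C, H, W) feature map.

    w1: (C//r, C) weights of the first (reduction) fully connected layer.
    w2: (C, C//r) weights of the second (expansion) fully connected layer.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = feature_map.mean(axis=(1, 2))            # shape (C,)
    # Excitation: two fully connected layers, ReLU then Sigmoid.
    s = np.maximum(w1 @ z, 0.0)                  # ReLU (assumed nonlinearity)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))          # Sigmoid gate in (0, 1)
    # Scale: reweight each channel of the original feature map.
    return feature_map * s[:, None, None]

# Toy example: 8 channels, 4x4 spatial map, reduction ratio r = 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1
w2 = rng.standard_normal((8, 2)) * 0.1
y = squeeze_excite(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because the Sigmoid gate lies strictly in (0, 1), each output channel is a damped copy of its input channel, which is what lets the network suppress uninformative channels.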
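Claim 5 states that the light head module is built on depthwise separable convolution. The parameter arithmetic below shows why that keeps the head lightweight; the channel counts and kernel size are illustrative, not values from the patent.

```python
def conv_params(c_in, c_out, k):
    # Parameters of a standard k x k convolution (bias ignored).
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise stage: one k x k filter per input channel.
    # Pointwise stage: a 1x1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out

std = conv_params(256, 256, 3)                  # 589824 parameters
sep = depthwise_separable_params(256, 256, 3)   # 67840 parameters
print(std, sep, round(std / sep, 1))            # roughly an 8.7x reduction
```

For a 3x3 kernel the saving factor approaches k*k = 9 as the channel count grows, which is the standard motivation for depthwise separable designs in real-time detectors.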
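Claim 7 lists the two augmentation families applied during preprocessing: optical (random brightness and contrast) and geometric (random expansion, cropping, horizontal flip). A minimal sketch of one such frame transform follows; the parameter ranges and flip probability are assumptions, and random expand/crop is omitted for brevity.

```python
import numpy as np

def augment_frame(frame, rng):
    """Optical + geometric augmentation of one H x W x 3 frame in [0, 1]."""
    # Optical transform: random contrast (scale) and brightness (shift).
    contrast = rng.uniform(0.8, 1.2)     # illustrative range
    brightness = rng.uniform(-0.1, 0.1)  # illustrative range
    frame = np.clip(frame * contrast + brightness, 0.0, 1.0)
    # Geometric transform: horizontal flip with probability 0.5 (assumed).
    if rng.random() < 0.5:
        frame = frame[:, ::-1, :]
    return frame

rng = np.random.default_rng(42)
frame = rng.uniform(size=(64, 64, 3))
out = augment_frame(frame, rng)
print(out.shape)  # (64, 64, 3)
```

Clipping after the optical transform keeps pixel values in the valid [0, 1] range, so augmented frames can be fed to the network without renormalization.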
CN202011622027.6A 2020-12-30 2020-12-30 Attention-guided real-time minimally invasive surgical tool detection method and system Pending CN112699879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622027.6A CN112699879A (en) 2020-12-30 2020-12-30 Attention-guided real-time minimally invasive surgical tool detection method and system


Publications (1)

Publication Number Publication Date
CN112699879A true CN112699879A (en) 2021-04-23

Family

ID=75511203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622027.6A Pending CN112699879A (en) 2020-12-30 2020-12-30 Attention-guided real-time minimally invasive surgical tool detection method and system

Country Status (1)

Country Link
CN (1) CN112699879A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652175A (en) * 2020-06-11 2020-09-11 山东大学 Real-time surgical tool detection method for video analysis of robot-assisted surgery
CN112037263A (en) * 2020-09-14 2020-12-04 山东大学 Operation tool tracking system based on convolutional neural network and long-short term memory network

Non-Patent Citations (4)

Title
JIE HU, LI SHEN, GANG SUN: "Squeeze-and-Excitation Networks", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
PAN SHI, ZIJIAN ZHAO, et al.: "Real-Time Surgical Tool Detection in Minimally Invasive Surgery Based on Attention-Guided Convolutional Neural Network", IEEE Access *
GUAN SHIHAO, YANG GUANG, et al.: "Multi-objective optimized hyperspectral band selection based on attention mechanism", Acta Optica Sinica *
ZHAO WENQING, CHENG XINGFU, et al.: "Insulator recognition combining attention mechanism and Faster RCNN", CAAI Transactions on Intelligent Systems *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN115359873A (en) * 2022-10-17 2022-11-18 成都与睿创新科技有限公司 Control method for operation quality
CN115359873B (en) * 2022-10-17 2023-03-24 成都与睿创新科技有限公司 Control method for operation quality

Similar Documents

Publication Publication Date Title
KR101926123B1 (en) Device and method for segmenting surgical image
Liu et al. An anchor-free convolutional neural network for real-time surgical tool detection in robot-assisted surgery
CN114220035A (en) Rapid pest detection method based on improved YOLO V4
Kamble et al. Applications of artificial intelligence in human life
WO2020133636A1 (en) Method and system for intelligent envelope detection and warning in prostate surgery
KR20190100011A (en) Method and apparatus for providing surgical information using surgical video
CN112037263B (en) Surgical tool tracking system based on convolutional neural network and long-term and short-term memory network
CN112668492A (en) Behavior identification method for self-supervised learning and skeletal information
Rezaei et al. Whole heart and great vessel segmentation with context-aware of generative adversarial networks
CN111783520A (en) Method and device for automatic identification of stages of laparoscopic surgery based on dual-stream network
JP2022527007A (en) Auxiliary imaging device, control method and device for analysis of movement disorder disease
CN113034495A (en) Spine image segmentation method, medium and electronic device
CN111652175A (en) Real-time surgical tool detection method for video analysis of robot-assisted surgery
Mamdouh et al. A New Model for Image Segmentation Based on Deep Learning.
Le et al. Robust surgical tool detection in laparoscopic surgery using yolov8 model
CN117197836A (en) Traditional Chinese medicine physique identification method based on multi-modal feature depth fusion
CN112699879A (en) Attention-guided real-time minimally invasive surgical tool detection method and system
KR102628324B1 (en) Device and method for analysing results of surgical through user interface based on artificial interlligence
CN111274854B (en) Human body action recognition method and vision enhancement processing system
Lan A novel deep learning architecture by integrating visual simultaneous localization and mapping (VSLAM) into CNN for real-time surgical video analysis
CN117274985A (en) Method and system for detecting tubercle bacillus real-time target based on deep learning
CN115546491A (en) Fall alarm method, system, electronic equipment and storage medium
CN114972881A (en) Image segmentation data labeling method and device
CN113344911B (en) Method and device for measuring size of calculus
Ismail et al. Acne lesion and wrinkle detection using faster R-CNN with ResNet-50

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210423