TWI802510B

TWI802510B - Interpretation assistance system and method for pulmonary nodule

Info

Publication number: TWI802510B
Application number: TW111135810A
Authority: TW
Inventors: 王永成; 林森炳; 莫元亨; 范君凱; 侯貴圓; 翁啓閎
Original assignee: 國泰醫療財團法人國泰綜合醫院; 雲象科技股份有限公司
Priority date: 2022-09-21
Filing date: 2022-09-21
Publication date: 2023-05-11

Abstract

The present invention provides an interpretation assistance system for pulmonary nodules, which comprises: an inference server, an image server and a reporting auxiliary information subsystem. The inference server receives a set of lung computed tomography (CT) images and respectively inputs them into a plurality of inference models to obtain a plurality of nodule data, deletes the nodule data corresponding to the overlapped bounding box information, and outputs an inferred nodule data and the set of Lung CT images. The image server marks the nodule detection position on the set of lung CT images according to the inferred nodule data. The reporting auxiliary information subsystem displays the marked lung CT images and the inferred nodule data, receives a confirmed nodule data edited by a user for the inferred nodule data, and finally copies the confirmed nodule data to a report system.

Description

Pulmonary nodule auxiliary interpretation system and method

本發明關於一種肺部結節輔助判讀系統及其方法，特別是關於一種低劑量電腦斷層影像的肺部結節輔助判讀系統及其方法。The present invention relates to a pulmonary nodule auxiliary interpretation system and method thereof, in particular to a pulmonary nodule auxiliary interpretation system and method of low-dose computed tomography images.

依據衛生福利部最新資料顯示，肺癌仍持續高居國人癌症死因之首。近年相關的大型臨床試驗都顯示針對高危險族群進行低劑量電腦斷層篩檢，能早期發現小於1公分的肺部結節，進而降低死亡率。According to the latest data from the Ministry of Health and Welfare, lung cancer continues to be the top cause of cancer death in the country. Related large-scale clinical trials in recent years have shown that low-dose computed tomography screening for high-risk groups can detect pulmonary nodules smaller than 1 cm early, thereby reducing mortality.

然而，單一個案的肺部電腦斷層影像達數十張，放射科醫師若逐一檢查需高度專注，平均個案判讀時間達10-15分鐘。特別是小血管與小結節影像表現十分接近，另外毛玻璃狀結節(ground glass opacity, GGO)的邊緣或內部呈現不規則雲霧狀薄影，不僅難以辨識，甚至還可能需要調整窗框條件才能發現。However, there are dozens of computed tomography images of the lungs in a single case, and radiologists need to be highly focused to examine them one by one, and the average case interpretation time is 10-15 minutes. In particular, small blood vessels and small nodules appear very similar in images. In addition, ground glass opacity (GGO) has irregular cloud-like thin shadows on the edge or inside, which is not only difficult to identify, but may even require adjustment of the window frame conditions to find it.

對於放射科醫師來說，低劑量肺部電腦斷層的判讀不僅耗時還傷眼，加上日常的臨床報告工作量繁重，亟需高準確性的偵測模型，以及能簡化工作流程與縮短打報告時間的輔助工具，增進診斷效率。For radiologists, the interpretation of low-dose lung computed tomography is not only time-consuming but also harmful to the eyes. In addition to the heavy workload of daily clinical reports, there is an urgent need for high-accuracy detection models, as well as simplification of workflow and shortening of printing time. An auxiliary tool for reporting time to improve diagnostic efficiency.

本發明提供一種肺部結節輔助判讀系統，包含：一推論伺服器，其係接收一組肺部電腦斷層影像並分別輸入至複數個推論模型以得到複數個結節資料，每一該結節資料包含一預測框資訊及相對應之一預測機率資訊，刪除重疊之該些預測框資訊所對應的結節資料以輸出一推論結節資料與該組肺部電腦斷層影像；一影像伺服器，其係依照該推論結節資料標記結節偵測位置於該組肺部電腦斷層影像；以及一報告輔助資訊子系統，係運作於一電腦主機內並與該影像伺服器連接，包含下列模組：一顯示模組，顯示經標記之該組肺部電腦斷層影像及該推論結節資料；一輸入模組，接收一使用者對該推論結節資料編輯而得之一確認結節資料；以及一輸出模組，將該確認結節資料複製至一報告系統。The present invention provides an auxiliary interpretation system for pulmonary nodules, comprising: an inference server, which receives a group of computerized tomographic images of the lungs and inputs them to a plurality of inference models to obtain a plurality of nodule data, each of which includes a Prediction frame information and a corresponding prediction probability information, delete the nodule data corresponding to the overlapping prediction frame information to output a deduced nodule data and the group of lung computed tomography images; an image server, which is based on the inference The nodule data marks the nodule detection position in the group of lung computed tomography images; and a reporting auxiliary information subsystem is operated in a computer host and connected to the image server, including the following modules: a display module, displaying The set of marked lung computed tomography images and the deduced nodule data; an input module receiving confirmed nodule data edited by a user on the deduced nodule data; and an output module receiving the confirmed nodule data Copy to a reporting system.

本發明提供一種肺部結節輔助判讀方法，包含：接收一組肺部電腦斷層影像並分別輸入至複數個推論模型以得到複數個結節資料，每一該結節資料包含一預測框資訊及相對應之一預測機率資訊；刪除重複之該些預測框資訊所對應的結節資料以輸出一推論結節資料；依照該推論結節資料標記結節偵測位置於該組肺部電腦斷層影像；以及顯示經標記之該組肺部電腦斷層影像及該推論結節資料，接收一使用者對該推論結節資料編輯而得之一確認結節資料，並將該確認結節資料複製至一報告系統。The present invention provides an auxiliary interpretation method for pulmonary nodules, which includes: receiving a set of computed tomographic images of the lungs and inputting them into multiple inference models to obtain multiple nodule data, each of which contains a prediction frame information and a corresponding A prediction probability information; delete the nodule data corresponding to the repeated prediction frame information to output a deduced nodule data; mark the nodule detection position in the group of lung computer tomography images according to the deduced nodule data; and display the marked nodule data A set of computed tomography images of the lungs and the deduced nodule data, receiving confirmed nodule data edited by a user on the deduced nodule data, and copying the confirmed nodule data to a reporting system.

於某些具體實施例中，該些推論模型包含：一第一推論模型為具有預設錨框之區域提議網路模型，並使用經標記實質肺結節之電腦斷層影像訓練；以及一第二推論模型為不具預設錨框之區域提議網路模型，並使用經標記毛玻璃肺結節以及半實心肺結節之電腦斷層影像訓練。In some embodiments, the inference models include: a first inference model is a region proposal network model with preset anchor boxes, trained using computed tomography images of labeled solid pulmonary nodules; and a second inference model The model is a region proposal network model without preset anchor boxes and trained using computed tomography images of labeled ground-glass pulmonary nodules and semi-solid pulmonary nodules.

於某些具體實施例中，該些推論模型採用UNet骨幹結構，並在每個上採樣及下採樣結構中包含一殘差模塊與一壓縮激發模塊。In some embodiments, the inference models adopt a UNet backbone structure, and include a residual module and a compressed excitation module in each upsampling and downsampling structure.

於某些具體實施例中，該殘差模塊包含兩個標準化層。In some embodiments, the residual module includes two normalization layers.

於某些具體實施例中，該組肺部電腦斷層影像在輸入該些推論模型前可以進一步執行一窗框設定、一重新採樣處理及一裁切處理。In some embodiments, the group of lung computed tomography images may further perform a window frame setting, a resampling process and a cropping process before being input into the inference models.

於某些具體實施例中，該些推論模型進一步包含一肺葉切割模型，且每一該結節資料進一步包含一肺葉資訊。In some embodiments, the inference models further include a lobe cut model, and each of the nodule data further includes lobe information.

於某些具體實施例中，該殘差模塊進一步包含一激活層及一卷積層。In some embodiments, the residual module further includes an activation layer and a convolutional layer.

於某些具體實施例中，該壓縮激發模塊包含一全局平均池化層及兩個線性層子模塊。In some embodiments, the compressed excitation module includes a global average pooling layer and two linear layer sub-modules.

於某些具體實施例中，每個該上採樣結構中包含一變維暨上採樣層。In some embodiments, each of the upsampling structures includes a variable dimension and upsampling layer.

於某些具體實施例中，每個該下採樣結構中包含一可變維度層及至少一固定維度層。In some embodiments, each of the downsampling structures includes a variable dimension layer and at least one fixed dimension layer.

本發明所提供之肺部結節輔助判讀系統及其方法結合多種推論模型進行肺部電腦斷層影像之結節偵測，提高肺部結節之診斷敏感度，並將偵測結果表列給放射科醫師參考，使放射科醫師縮短診斷時間，並能進一步確認偵測到的結節影像，並勾選報告所要記載的結節資訊並匯出，大幅降低診斷時間及報告輸入時間，提高診斷量能及準確性。The pulmonary nodule auxiliary interpretation system and method provided by the present invention combine multiple inference models to detect nodules in pulmonary computerized tomography images, improve the diagnostic sensitivity of pulmonary nodules, and list the detection results to radiologists for reference , so that radiologists can shorten the diagnosis time, and can further confirm the detected nodule images, and check and export the nodule information to be recorded in the report, which greatly reduces the diagnosis time and report input time, and improves the diagnostic capacity and accuracy.

有關於本發明其他技術內容、特點與功效，在以下配合參考圖式之較佳實施例的詳細說明中，將可清楚的呈現。Other technical contents, features and effects of the present invention will be clearly presented in the following detailed description of preferred embodiments with reference to the drawings.

除非另有定義，本文使用的所有技術和科學術語具有與本發明所屬領域中的技術人員所通常理解相同的含義。Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

如本文所用，冠詞「一」、「一個」以及「任何」是指一個或多於一個(即至少一個)的物品的文法物品。例如，「一個元件」意指一個元件或多於一個元件。As used herein, the articles "a", "an" and "any" refer to one or more than one (ie, at least one) of the grammatical items of the item. For example, "an element" means one element or more than one element.

如本文所用，「約」、「大約」或「近乎」一詞實質上代表所述之數值或數值範圍為基準在20%以內，較佳為於10%以內，以及更佳者為於5%以內浮動。於文中所提供之數字化的量為近似值，意旨若術語「約」、「大約」或「近乎」沒有被使用時亦可被推得。As used herein, the term "about", "approximately" or "approximately" means that the stated value or range of values is within 20%, preferably within 10%, and more preferably within 5% float within. Numerical quantities provided herein are approximations and are intended to be inferred if the terms "about", "approximately" or "approximately" are not used.

如本文所用，「連接」、「連結」或「相連」意指使用電線、電路板、連接線、網路線、轉接裝置、轉換裝置、藍芽或無線網路任一方式進行通電或網路通訊的結合。As used herein, "connect", "connect" or "connect" means to use any means of wires, circuit boards, cables, network cables, adapters, converters, Bluetooth or wireless networks to connect to power or network Combination of communications.

如本文所用，「肺部電腦斷層影像」意指使用電腦斷層掃瞄儀(Computed Tomography, CT)掃描胸腔而得的影像，通常從肺尖上方掃描至橫隔膜，不限於使用Philips或Toshiba電腦斷層掃瞄儀。「肺部電腦斷層影像」包含低劑量肺部電腦斷層影像(Low-Dose CT, LDCT)，其係採用劑量約0.8~1.2mSv的X光螺旋掃描，原始影像矩陣大小包含但不限於512 x512像素，原始影像厚度包含但不限於3毫米、2毫米、1毫米。As used herein, "computed tomography of the lung" means an image obtained by scanning the chest cavity using a computerized tomography scanner (Computed Tomography, CT), usually from above the lung apex to the diaphragm, not limited to using Philips or Toshiba CT scanner. "Lung computed tomography" includes low-dose lung computed tomography (Low-Dose CT, LDCT), which uses X-ray helical scanning with a dose of about 0.8~1.2mSv, and the original image matrix size includes but is not limited to 512 x 512 pixels , the thickness of the original image includes but is not limited to 3mm, 2mm, and 1mm.

如圖1所示，本發明實施例提供一種肺部結節輔助判讀系統1，包含：一推論伺服器10、一影像伺服器20以及一報告輔助資訊子系統30。推論伺服器10與影像伺服器20可為網路伺服器或是應用程式伺服器，並運作於同一伺服器主機中或是分別運作於各自的伺服器主機中。As shown in FIG. 1 , an embodiment of the present invention provides an auxiliary pulmonary nodule interpretation system 1 , including: an inference server 10 , an image server 20 and a reporting auxiliary information subsystem 30 . The inference server 10 and the image server 20 can be web servers or application program servers, and run in the same server host or separately in separate server hosts.

報告輔助資訊子系統30連接推論伺服器10、影像伺服器20及放射科資訊系統4(Radiological Information System, RIS)，以作為肺部結節輔助判讀系統1中與放射科資訊系統4及電腦斷層原始影像儲存裝置2存取資料之應用程式介面，同時採用醫院內網連接來確保資料安全性。The report auxiliary information subsystem 30 is connected to the inference server 10, the image server 20, and the Radiological Information System 4 (Radiological Information System, RIS) to serve as the pulmonary nodule auxiliary interpretation system 1 with the Radiological Information System 4 and the original computer tomography system. The image storage device 2 is an application programming interface for accessing data, and at the same time uses a hospital intranet connection to ensure data security.

放射科資訊系統4接收醫療資訊系統5(Healthcare Information System, HIS)的患者資訊以建立CT檢查單，電腦斷層掃描儀3依照CT檢查單的檢查資訊將掃描得到的三維肺部電腦斷層原始影像儲存到儲存裝置2，其中患者資訊至少包含病歷號、姓名，檢查資訊至少包含檢查單號、檢查項目。當放射科醫生要打肺部電腦斷層診斷報告時，可以從報告輔助資訊子系統30選取CT檢查單號，以將儲存裝置2中對應的肺部電腦斷層原始影像輸入至推論伺服器10。The radiology department information system 4 receives the patient information from the medical information system 5 (Healthcare Information System, HIS) to create a CT checklist, and the computerized tomography scanner 3 stores the scanned 3D lung computer tomography original image according to the checklist information to the storage device 2, wherein the patient information includes at least the medical record number and name, and the examination information includes at least the examination order number and examination items. When the radiologist wants to print a lung computer tomography diagnosis report, he can select the CT examination order number from the report auxiliary information subsystem 30 to input the corresponding original lung computer tomography image in the storage device 2 to the inference server 10 .

請同時參考圖2，推論伺服器10接收來自儲存裝置2的一組肺部電腦斷層影像(步驟S10)並利用深度學習技術自動辨識肺部結節(約20秒)，以提供給使用者(例如放射科醫生)作為影像判讀參考。在輸入複數個推論模型前至少可執行下列影像前處理步驟(步驟S20)：一窗框設定(步驟S21)，例如設定為肺窗框(lung window)，其預設窗位(window level, WL)為-600，窗寬(window width, WW)為1500。一重新採樣處理(步驟S22)，將三維的肺部電腦斷層影像之三個維度都重新採樣(resampling)至1mm，使不同像素間距(pixel spacing)的原始影像都可以輸入至各個推論模型。一裁切處理(步驟S23)，將三維肺部電腦斷層影像分割成例如約128毫米立方的影像區塊輸入推論模型。Please refer to FIG. 2 at the same time. The inference server 10 receives a group of lung computed tomography images from the storage device 2 (step S10) and uses deep learning technology to automatically identify lung nodules (about 20 seconds) to provide to the user (for example Radiologist) as a reference for image interpretation. At least the following image preprocessing steps (step S20) can be performed before inputting a plurality of inference models: a window frame setting (step S21), for example, a lung window frame (lung window), and its preset window level (window level, WL ) is -600, and the window width (window width, WW) is 1500. A resampling process (step S22 ), resampling the three dimensions of the three-dimensional lung computed tomography image to 1mm, so that the original images with different pixel spacings can be input into each inference model. A cropping process (step S23 ), the three-dimensional lung computed tomography image is divided into, for example, approximately 128 mm cubic image blocks for input into the inference model.

推論伺服器10包含第一推論模型11及第二推論模型12，此外還可進一步包含肺葉分割模型13。推論伺服器10將上述肺部電腦斷層影像分別輸入至第一推論模型11及第二推論模型12進行推論 (步驟S30)，共可得到複數個結節資料。每一該結節資料至少包含

五個數值，其中預測框資訊

代表一球形預測框之球心( X, Y, Z)及直徑d， p為相對應之球形預測框內含肺部結節的預測機率資訊。理想上，單一球形預測框應盡其所能的將單一結節緊密包含，

與 d的單位為mm。若進一步輸入肺葉分割模型，該結節資料進一步包含一肺葉資訊

，肺葉資訊代表預測框資訊所在位置的肺葉，例如右上葉、右中葉、右下葉、左上葉、左下葉及非肺葉，並刪除

為非肺葉所對應的結節資料。 The inference server 10 includes a first inference model 11 and a second inference model 12 , and may further include a lung lobe segmentation model 13 . The inference server 10 inputs the above computed tomography images of the lungs to the first inference model 11 and the second inference model 12 for inference (step S30 ), and a plurality of nodule data can be obtained. Each nodule data contains at least

Five values, among which the prediction frame information

Represents the center ( X, Y, Z ) and diameter d of a spherical prediction frame, and p is the prediction probability information of pulmonary nodules contained in the corresponding spherical prediction frame. Ideally, a single spherical prediction box should try its best to tightly contain a single nodule,

The unit of d and d is mm. If the lung lobe segmentation model is further input, the nodule data further includes a lung lobe information

, the lung lobe information represents the lung lobe where the prediction box information is located, such as the upper right lobe, the middle right lobe, the lower right lobe, the upper left lobe, the lower left lobe and the non-lobe, and delete

It is the nodule data corresponding to the non-lobe.

接著，使用非最大值抑制(Non-Maximum Suppression, NMS)刪除重複結節資料(步驟S40)。非最大抑制用於計算複數個預測框之間的三維重疊度(intersection over union, IoU)，其原理是將每個預測框分別與其他預測框做重疊度計算。可設定重疊度大於0.5即為預測框重疊，若有預測框和其他預測框重疊，則刪除重疊之該些預測框資訊所對應的結節資料，僅保留帶有最高預測機率的結節資料。如此得以輸出含有預測機率較高且不重複之一組推論結節資料與該組肺部電腦斷層影像。於傳送推論結節資料至影像伺服器20之前，推論伺服器10亦可進一步執行一單位轉換步驟以將

的單位刻度轉換為像素。 Next, use non-maximum suppression (Non-Maximum Suppression, NMS) to delete duplicate nodule data (step S40). Non-maximum suppression is used to calculate the three-dimensional overlap (intersection over union, IoU) between multiple prediction frames. The principle is to calculate the overlap between each prediction frame and other prediction frames. It can be set that the overlap degree is greater than 0.5, which means that the prediction frame overlaps. If a prediction frame overlaps with other prediction frames, the nodule data corresponding to the overlapped prediction frame information will be deleted, and only the nodule data with the highest prediction probability will be kept. In this way, a set of inferred nodule data and a set of lung computed tomography images that contain a high probability of prediction and are not repeated can be output. Before sending the inference nodule data to the image server 20, the inference server 10 may further perform a unit conversion step to convert

The unit scale is converted to pixels.

影像伺服器20接收該組肺部電腦斷層影像及該推論結節資料，並依照該推論結節資料標記結節偵測位置於該組肺部電腦斷層影像。對於每個輸入的影像區塊，其產生的推論結節資料至少包含集合

及其相對應之預測機率資訊。其中，

為該輸入影像區塊內所推論出的結節個數，

為第

個結節的預測中心位置，

為第

個結節的預測直徑，

為第

個結節所在的肺葉資訊。 The image server 20 receives the set of computed tomography images of the lungs and the deduced nodule data, and marks nodule detection positions in the set of computed tomography images of the lungs according to the deduced nodule data. For each input image block, the resulting inferred nodule data contains at least the set

And its corresponding predicted probability information. in,

is the number of nodules deduced in the input image block,

for the first

The predicted center position of a nodule,

for the first

predicted diameter of nodules,

for the first

Information about the lung lobe where a nodule is located.

報告輔助資訊子系統30係運作於一電腦主機內並與該影像伺服器20連接，報告輔助資訊子系統30包含下列模組：輸入模組31、顯示模組32、輸出模組33。The reporting auxiliary information subsystem 30 operates in a computer host and is connected to the image server 20 . The reporting auxiliary information subsystem 30 includes the following modules: an input module 31 , a display module 32 , and an output module 33 .

如圖3A所示，顯示模組32接收推論結節資料並於使用者操作的顯示介面上顯示經標記之該組肺部電腦斷層影像，像素中心(

)且像素邊長為

的一個二維非實心方框A將可在第

張軸切影像上框選出第

個結節之結節偵測位置。 As shown in FIG. 3A , the display module 32 receives the deduced nodule data and displays the group of marked CT images of the lungs on the display interface operated by the user. The center of the pixel (

) and the pixel side length is

A two-dimensional non-solid box A will be available at the

Select the first frame on the axis-cut image

The nodule detection position of nodules.

同時參照圖3B，顯示模組32亦列出該推論結節資料，推論結節資料可以依照結節預測機率、結節直徑(尺寸)、結節所在張數進行排序。使用者(放射科醫師)能藉由報告輔助資訊子系統30的協助快速找到可能的肺部結節位置並予以確認。在推論結節資料列表中點選其中一個結節資料B時，影像視窗中可以跳轉到相對應的電腦斷層影像上的結節偵測位置A，醫師能快速的進行確認，減少誤診機率。Referring to FIG. 3B at the same time, the display module 32 also lists the inferred nodule data, and the inferred nodule data can be sorted according to nodule prediction probability, nodule diameter (size), and number of nodule sheets. The user (radiologist) can quickly find and confirm the possible location of the pulmonary nodule with the assistance of the reporting auxiliary information subsystem 30 . When one of the nodule data B is selected in the inferred nodule data list, the image window can jump to the nodule detection position A on the corresponding computed tomography image, and the doctor can quickly confirm and reduce the probability of misdiagnosis.

接著，醫師能夠在推論結節資料旁的輸入框中勾選何者確認為肺部結節，甚至可以進一步添加其他的確認結節位置。輸入模組31可以接收上述使用者對該推論結節資料編輯(例如勾選、添加或刪除)而得之一確認結節資料。輸出模組33，將該確認結節資料利用網頁應用程式介面(web API)複製及/或回傳至在放射科資訊系統之一報告系統，確認結節資料至少包含結節位置、結節直徑、結節所在肺葉，使醫師在報告中不需要人力輸入相關訊息，節省大量打報告時間。Then, the doctor can check which one is confirmed as pulmonary nodule in the input box next to the inferred nodule data, and can even further add other confirmed nodule locations. The input module 31 may receive one of the confirmed nodule data obtained by editing (for example, checking, adding or deleting) the inferred nodule data by the above-mentioned user. The output module 33 uses the web API to copy and/or return the confirmed nodule data to a reporting system in the radiology department information system, and confirms that the nodule data at least includes nodule position, nodule diameter, and lung lobe where the nodule is located , so that the doctor does not need to manually input relevant information in the report, saving a lot of time for typing the report.

最後介紹第一推論模型與第二推論模型的訓練方法及模組架構。第一訓練資料集與第二訓練資料集係由國泰醫院提供，採用3,000多名肺結節患者之肺部電腦斷層影像，由放射科專科醫生在肺部斷層影像軸切切面上圈選結節位置進行標註，多張切面之圈選結節位置可組成一個三維肺結節標註影像，其中肺結節包含實質肺結節、半實心肺結節(Pulmonary subsolid nodules, SSNs)與毛玻璃肺結節。肺部電腦斷層影像亦經過步驟S20之影像前處理，並且裁切成約128毫米立方的三維影像區塊

。 Finally, the training method and module structure of the first inference model and the second inference model are introduced. The first training data set and the second training data set were provided by Cathay Pacific Hospital, using the computed tomography images of the lungs of more than 3,000 patients with pulmonary nodules, and the radiologists circled the nodule locations on the axial section of the lung tomographic images Annotation, circled nodule positions in multiple slices can form a three-dimensional pulmonary nodule annotation image, in which pulmonary nodules include solid pulmonary nodules, semi-solid pulmonary nodules (Pulmonary subsolid nodules, SSNs) and ground glass pulmonary nodules. The computed tomography image of the lung is also subjected to image pre-processing in step S20, and cut into three-dimensional image blocks of about 128 mm cubic

.

第一訓練資料集包含複數個三維影像區塊，其包含標註有實質肺結節、毛玻璃肺結節、半實心肺結節或是並無任何標註(代表為正常肺組織)的三維影像區塊。第二訓練資料集包含複數個三維影像區塊，其包含標註有毛玻璃肺結節、半實心肺結節 (具有毛玻璃成分)或是並無標註(正常肺組織)的三維影像區塊。第二訓練資料集為第一訓練資料集的子集，其中的結節訓練資料因皆帶有毛玻璃成分，結節直徑或中心位置較不容易做精確定義，因此第二訓練資料集應為較難學習的資料集。收納各種型態的結節訓練資料有助於提高推論模型的敏感度，避免結節漏抓。The first training data set includes a plurality of 3D image blocks, including solid pulmonary nodules, ground glass pulmonary nodules, semi-solid pulmonary nodules or 3D image blocks without any annotation (representing normal lung tissue). The second training data set includes a plurality of 3D image blocks, including 3D image blocks marked with ground glass pulmonary nodules, semi-solid pulmonary nodules (with ground glass components) or unlabeled (normal lung tissue). The second training data set is a subset of the first training data set. Since the nodule training data contains ground glass components, it is difficult to define the diameter or center position of the nodule accurately. Therefore, the second training data set should be more difficult to learn data set. Accommodating various types of nodule training data helps to improve the sensitivity of the inference model and avoid missed nodule grasping.

如圖4A及圖4B所示，圖4A之第一推論模型與圖4B之第二推論模型皆採用3D-UNet骨幹結構，其係由數個下採樣及數個上採樣構成。下採樣可濃縮影像特徵，增加感受野(receptive field)，每次下採樣包含一個可變維度層B及至少一個固定維度層A。每次上採樣則包含一變維暨上採樣層C。並在每個上採樣及下採樣結構中包含殘差模塊與壓縮激發模塊，用以加強影像特徵的擷取。As shown in FIG. 4A and FIG. 4B , both the first inference model in FIG. 4A and the second inference model in FIG. 4B adopt a 3D-UNet backbone structure, which is composed of several down-sampling and several up-sampling. Downsampling can condense image features and increase the receptive field. Each downsampling includes a variable dimension layer B and at least one fixed dimension layer A. Each upsampling includes a variable dimension and upsampling layer C. Each up-sampling and down-sampling structure includes a residual module and a compressed excitation module to enhance the extraction of image features.

N張影像且長H、寬W、高D的灰階三維影像區塊(N, 1, D, H, W)各自輸入第一推論模型或第二推論模型後都會先進行下採樣四次。下採樣階段，特徵會被不斷濃縮，用以得到較抽象精簡的影像特徵，因此每次下採樣會陸續產生32、64、96、128個3D特徵圖，且資料大小會隨之減半。接著進行兩次上採樣，同時每次上採樣都會進行三線性上採樣操作(trilinear upsampling operation)。上採樣階段，會將濃縮後的特徵逐漸放大，最後在一個最終的目標的尺度下輸出模型預測的結節資料。於上採樣和下採樣階段之間，有兩條連線。此兩條連線會將同尺度的上採樣輸入與同尺度的下採樣輸出進行拼接(concatenate，圖中以[]表示)合併，此做法可避免可能有助於預測的資訊流失，增加模型學習的容易性。The gray-scale 3D image blocks (N, 1, D, H, W) of N images with length H, width W, and height D will be down-sampled four times after inputting the first inference model or the second inference model respectively. In the downsampling stage, the features will be continuously condensed to obtain more abstract and streamlined image features. Therefore, each downsampling will successively generate 32, 64, 96, and 128 3D feature maps, and the data size will be halved accordingly. Then two upsamplings are performed, and a trilinear upsampling operation (trilinear upsampling operation) is performed for each upsampling. In the upsampling stage, the condensed features will be gradually enlarged, and finally the nodule data predicted by the model will be output at the final target scale. Between the upsampling and downsampling stages, there are two wires. These two connections will concatenate (concatenate, represented by [] in the figure) the upsampling input of the same scale and the downsampling output of the same scale. This method can avoid the loss of information that may be helpful for prediction and increase model learning. the ease of.

接著，將上採樣三次後的資料輸入三維區域提議網路模型(3D Region Proposal Network, 3D RPN) 的後端模塊(後續簡稱為Head)，Head中只包含n個可變維度層(

)，目的是學習將採樣到的影像特徵經過分類和回歸以轉換成預測到的結節資訊。 Next, input the data upsampled three times into the back-end module of the 3D Region Proposal Network (3D RPN) (hereafter referred to as Head), and the Head only contains n variable-dimensional layers (

), the purpose is to learn to convert the sampled image features into predicted nodule information through classification and regression.

如圖4A所示，第一推論模型為具有預設錨框之三維物體區域提議網路模型 (3D RPN)，並使用第一訓練資料集進行訓練。此第一推論模型的預設錨框乃是人為預先在影像空間上灑落的，數個不同大小且等間隔分佈的球形框。模型將嘗試在散落的預設球形框內去檢測出是否有涵蓋物體。As shown in FIG. 4A, the first inference model is a 3D object region proposal network model (3D RPN) with a preset anchor frame, and is trained using the first training data set. The default anchor frame of the first inference model is artificially sprinkled in advance on the image space, several spherical frames of different sizes and equally spaced distribution. The model will try to detect if there are covered objects within the scattered preset spherical boxes.

第一推論模型的3D RPN中使用5種不同大小的預設錨框(Na=5)。錨框直徑分別為 5, 10, 15, 20, 25

。將第二次上採樣後的特徵圖(N, 128, D/4, H/4, W/4)輸入第一推論模型的Head之後得到形狀為

，意即N張影像區塊中，一共設有錨框數量

個，每個(第i個)錨框對應之五個預測目標包含：每個錨框中心位置

與直徑

的校正資訊

以及該錨框包含節結的機率有多少

。這些校正資訊包含可以用來把第

個預設錨框中心

微調至預測結節中心之偏移校正資訊，還包含可以用來將錨框直徑

縮放至預測結節直徑之縮放校正資訊。需要注意的是，由於是在長寬高皆縮小四倍的影像尺度上去作出預測，因此在輸出預測的結節資料之前，我們的座標預測需要在各座標軸方向乘以四倍，以還原至原來的影像尺度。 5 different sizes of preset anchor boxes (Na=5) are used in the 3D RPN of the first inference model. Anchor frame diameters are 5, 10, 15, 20, 25

. After inputting the feature map (N, 128, D/4, H/4, W/4) after the second upsampling into the head of the first inference model, the shape is

, which means that in N image blocks, there are a total of anchor frames

, each (i-th) anchor box corresponds to five prediction targets including: the center position of each anchor box

with diameter

Calibration information for

and what is the probability that the anchor box contains a nodule

. These calibration information contains information that can be used to convert the

default anchor box centers

Deskew information fine-tuned to the center of the predicted nodule, also contains the anchor box diameter that can be used to

Scale correction information for scaling to predicted nodule diameter. It should be noted that since the prediction is made on an image scale that is reduced by four times in length, width and height, before outputting the predicted nodule data, our coordinate prediction needs to be multiplied by four times in the direction of each coordinate axis to restore to the original Image scale.

經訓練完成之第一推論模型可以進一步包含一第一推論後處理。輸入第一推論模型進行推論的三維影像區塊，在得到錨框中心位置與直徑的校正資訊後，第一推論後處理計算回推校正後錨框資訊，並進行非最大抑制計算複數個校正後錨框之間的三維重疊度(intersection over union, IoU)，當重疊度＞0.5時可視作校正後錨框資訊重複。依照預測機率資訊大小依序刪除重複之該些校正後錨框資訊，以在N張影像中輸出較高機率且不重複之一組結節資料

一共預測出num_pred_nodules個結節。 The trained first inference model may further include a first inference post-processing. Input the 3D image block for inference by the first inference model. After obtaining the correction information of the center position and diameter of the anchor frame, the first inference post-processing calculates the information of the post-correction anchor frame, and performs non-maximum suppression to calculate multiple corrections. The three-dimensional overlap (intersection over union, IoU) between the anchor boxes, when the overlap is greater than 0.5, it can be regarded as the repetition of the corrected anchor box information. Delete the repeated corrected anchor frame information in order according to the size of the predicted probability information, so as to output a set of nodule data with a higher probability and no repetition in N images

A total of num_pred_nodules nodules are predicted.

於第一訓練模式中，不同的影像區塊將被輸入至模型多次以得到偏移校正資訊、縮放校正資訊以及錨框包含節結的機率，供學習各錨框是否有配對到物體，以及有配對到物體的錨框應如何：(1)校正至單一結節位置；(2)校正其大小使其剛好可以足夠包含一整顆結節。每個輸入的影像區塊都有相對應的醫師標記做為基準事實(ground truth)，基準事實資訊包含N張影像中的num_nodule個結節數目及相對應的結節中心(X, Y, Z)和結節直徑d。In the first training mode, different image blocks will be input to the model multiple times to obtain offset correction information, scale correction information, and the probability of anchor boxes containing nodules for learning whether each anchor box is matched to an object, and How should the anchor box paired to the object be: (1) correct to a single nodule position; (2) correct its size so that it is just big enough to contain a whole nodule. Each input image block has a corresponding physician mark as the ground truth. The ground truth information includes the number of num_nodule nodules in N images and the corresponding nodule centers (X, Y, Z) and Nodule diameter d.

預設錨框與基準事實資訊經過框匹配處理(box matching)得出各預設錨框是否有和結節匹配。若有匹配，此類有匹配錨框將透過RPN目標產生器(RPN target generator)以得到真實的錨框偏移與縮放校正量。此真實校正量結合模型於推論階段所產出的校正資訊，將可評估出模型預測的校正回歸誤差(regression loss)。另，模型將針對每個錨框預測出包含結節的機率，結合預設錨框是否確實包含結節，以及模型於該錨框所預測出的結節存在機率，可評估出模型的錨框分類誤差(classification loss)。模型的參數優化過程最終可將錨框校正量回歸誤差以及錨框分類誤差分別下降並逐步收斂，得出一個經訓練過的成熟推論模型。The preset anchor boxes and the reference fact information are subjected to box matching processing (box matching) to determine whether each preset anchor box matches a nodule. If there is a match, such matching anchor boxes will pass through the RPN target generator (RPN target generator) to obtain the actual anchor box offset and scaling correction. This real correction amount combined with the correction information produced by the model in the inference stage will evaluate the correction regression error (regression loss) predicted by the model. In addition, the model will predict the probability of containing nodules for each anchor frame, combined with whether the preset anchor frame does contain nodules, and the probability of the existence of nodules predicted by the model in the anchor frame, the anchor frame classification error of the model can be evaluated ( classification loss). The parameter optimization process of the model can finally reduce the regression error of the anchor box correction amount and the anchor box classification error respectively and gradually converge to obtain a trained and mature inference model.

如圖4B所示，第二推論模型為不具預設錨框之區域提議網路模型(Object as points)，並使用第二訓練資料集進行訓練。第二推論模型的RPN中不採用預設錨框。它的原理是以三維機率密度圖 (3D probability density map)的方式來估計節結的可能位置。找尋出節結的最大可能位置後，將繼續在該位置評估結節的可能大小。此第二推論模型因無錨框，故較不會被人為縮限了可能的結節檢出位置和範圍。因毛玻璃類結節的中心位置和大小存在有較高的標註不確定性，因此本發明實施例搭配採用不受限制的檢出範圍之第二推論模型有機會增加毛玻璃類肺結節的偵測機率。As shown in FIG. 4B , the second inference model is a region proposal network model (Object as points) without default anchor boxes, and is trained using the second training data set. The RPN of the second inference model does not use preset anchor boxes. Its principle is to estimate the possible location of nodules in the form of a three-dimensional probability density map (3D probability density map). Once the largest possible location of the nodule has been found, the assessment of the likely size of the nodule at that location will continue. Since the second inference model has no anchor frame, it is less likely to artificially limit the possible nodule detection locations and ranges. Since there is a high labeling uncertainty in the center position and size of the ground glass-like nodules, the embodiment of the present invention combined with the second inference model with an unlimited detection range may increase the detection probability of the ground-glass-like pulmonary nodules.

將第二次上採樣後的N張特徵影像

輸入第二推論模型的Head之後得到預測目標資訊

，其輸出可拆分成以下五張形狀為

張量：(1) N張三維機率密度圖，用以預測各影像的像素位置有多少機率為結節中心；(2)~(4)N張三個X, Y, Z位置偏移量預測張量，用以預測各像素若為結節中心，是否位置還需要微調；(5) N張結節直徑預測張量，用以預測各像素若為結節中心，其結節直徑應當為何。需要注意的是，由於是在長寬高皆縮小四倍的影像尺度上去作出預測，因此在預測輸出之前，我們的座標預測需要在各座標軸方向乘以四倍，以還原至原來的影像尺度。 The N feature images after the second upsampling

After inputting the Head of the second inference model, the predicted target information is obtained

, its output can be split into the following five shapes:

Tensor: (1) N three-dimensional probability density maps, used to predict the probability of the pixel position of each image being the nodule center; (2)~(4) N three X, Y, Z position offset prediction sheets (5) N nodule diameter prediction tensors, used to predict what the nodule diameter should be if each pixel is the nodule center. It should be noted that since the prediction is made on an image scale in which the length, width, and height are reduced by four times, our coordinate prediction needs to be multiplied by four times in the direction of each coordinate axis before the prediction output, so as to return to the original image scale.

經訓練完成之第二推論模型可以進一步包含一第二推論後處理，輸入第二推論模型進行推論的三維影像區塊，在得到結節中心位置偏移資訊後，第二推論後處理計算回推正確結節中心位置資訊以輸出一組結節資料。The trained second inference model may further include a second inference post-processing, input the 3D image block of the second inference model for inference, after obtaining the offset information of the nodule center position, the second inference post-processing calculates the correct inference Nodule center location information to output a set of nodule data.

於第二訓練模式中，基準事實資料經RPN訓練目標產生器處理後可得：(1)訓練目標直徑、(2)訓練目標其中心位置的偏移校正量、(3)訓練目標的機率密度圖，用以學習訓練目標的可能位置。計算此(1)、(2)、(3)的數值與模型預測目標資訊的差異，可評估出模型預測的整體誤差(Total Loss)。模型的參數優化過程最終可將整體誤差下降並逐步收斂，得出一個經訓練過的成熟推論模型。如圖5A至圖5C所示，可變維度層、固定維度層以及變維暨上採樣層皆包含殘差模塊(residual block)與壓縮激發模塊(squeeze-excitation block)。殘差模塊包含了兩個標準化層、二或三個卷積層 (取決於是否確切要降採樣)，以及兩個ReLU激活層。壓縮激發模塊藉由加權方式過濾不重要特徵以協助整理影像，其包含全局平均池化層(Global Average Pooling, GAP)、兩個線性層子模塊，其中線性層子模塊可以進一步包含線性層及激活層，激活層可以為ReLU激活層或Sigmoid激活層。圖5A為固定維度層，經過此處理的資料不會改變維度。圖5B之可變維度層則藉由調整卷積層數量及步長來變動維度。此可變維度層輸出的張量形狀為

。其中，

為部分卷積層的步長(可為1 或2)。當

時，表示其輸出的特徵圖之長、寬、高皆會減半。圖5C則是在最後包含三線性上採樣層進行升維。 In the second training mode, the base fact data can be obtained after being processed by the RPN training target generator: (1) the diameter of the training target, (2) the offset correction amount of the center position of the training target, (3) the probability density of the training target graph to learn the possible positions of the training objects. By calculating the difference between the values of (1), (2), and (3) and the target information predicted by the model, the overall error (Total Loss) of the model prediction can be evaluated. The parameter optimization process of the model can finally reduce the overall error and gradually converge, and a trained and mature inference model can be obtained. As shown in FIGS. 5A to 5C , the variable dimension layer, the fixed dimension layer, and the variable dimension and upsampling layer all include a residual block and a squeeze-excitation block. The residual module consists of two normalization layers, two or three convolutional layers (depending on whether downsampling is actually required), and two ReLU activation layers. The compressed excitation module helps to organize images by filtering unimportant features in a weighted manner. It includes a global average pooling layer (Global Average Pooling, GAP), two linear layer sub-modules, and the linear layer sub-module can further include linear layers and activations. layer, the activation layer can be a ReLU activation layer or a Sigmoid activation layer. Figure 5A is a fixed-dimensional layer, and the data after this processing will not change the dimension. The variable dimension layer in Figure 5B changes the dimension by adjusting the number of convolutional layers and the step size. The shape of the tensor output by this variable dimensionality layer is

. in,

is the stride of the partial convolution layer (can be 1 or 2). when

When , the length, width, and height of the output feature map will be halved. Figure 5C includes a trilinear upsampling layer at the end for upscaling.

卷積層(3D Conv)以及激活層的存在是用以學習並產生高階影像特徵。標準化層是用以重整特徵分布。如此，影像特徵才不至於經過多次卷積或激活後，擁有過於極端的數值或過廣的分佈，造成訓練過程不穩定。本發明實施例採用之標準化層為GroupNorm或SyncBatchNorm。此兩種標準化層皆能緩解模型過於肥大時，有限的顯卡記憶體資源上訓練不穩定的問題，提升訓練結果。壓縮激發模塊的輸出會等同於輸入的影像 (

張

的特徵影像)。壓縮激發模塊是用來學習並給予

張三維特徵影像不同的權重，以便於保留較有助於預測的三維特徵影像。 The convolution layer (3D Conv) and the activation layer are used to learn and generate high-level image features. The normalization layer is used to reshape the feature distribution. In this way, the image features will not have too extreme values or too wide distribution after multiple convolutions or activations, causing the training process to be unstable. The normalization layer used in the embodiment of the present invention is GroupNorm or SyncBatchNorm. These two normalization layers can alleviate the problem of unstable training on limited graphics card memory resources when the model is too hypertrophy, and improve the training results. The output of the compressed excitation module will be equal to the input image (

open

feature image). Compression excitation modules are used to learn and give

Different weights of the three-dimensional feature images are used to preserve the three-dimensional feature images that are more helpful for prediction.

本發明已透過上述之實施例揭露如上，僅是本發明部分較佳的實施例選擇，然其並非用以限定本發明，任何熟悉此一技術領域具有通常知識者，在瞭解本發明前述的技術特徵及實施例，並在不脫離本發明之精神和範圍內所做的均等變化或潤飾，仍屬本發明涵蓋之範圍，而本發明之專利保護範圍須視本說明書所附之請求項所界定者為準。The present invention has been disclosed through the above-mentioned embodiments, which are only part of the preferred embodiments of the present invention, but they are not intended to limit the present invention. Any person familiar with this technical field who has ordinary knowledge can understand the aforementioned technology of the present invention Features and embodiments, and equivalent changes or modifications made without departing from the spirit and scope of the present invention still fall within the scope of the present invention, and the scope of patent protection of the present invention must be defined by the appended claims of this specification Whichever prevails.

1:肺部結節輔助判讀系統 10:推論伺服器 11:第一推論模型 12:第二推論模型 13:肺葉分割模型 20:影像伺服器 30:報告輔助資訊子系統 31:輸入模組 32:顯示模組 33:輸出模組 2:儲存裝置 3:電腦斷層掃描儀 4:放射科資訊系統 5:醫療資訊系統 A:固定維度層 B:可變維度層 C:變維暨上採樣層 1: Pulmonary nodule auxiliary interpretation system 10: Inference Server 11: First inference model 12: The second inference model 13: Lung Lobe Segmentation Model 20: Image server 30: Report auxiliary information subsystem 31: Input module 32: Display module 33: Output module 2: storage device 3:Computed tomography scanner 4: Radiology Information System 5: Medical Information System A: Fixed dimension layer B: variable dimension layer C: variable dimension and upsampling layer

圖1為本發明實施例之肺部結節輔助判讀系統架構圖。Fig. 1 is a structure diagram of an auxiliary interpretation system for pulmonary nodules according to an embodiment of the present invention.

圖2為本發明實施例之推論伺服器中之影像推論處理流程圖。FIG. 2 is a flow chart of image inference processing in an inference server according to an embodiment of the present invention.

圖3A為本發明實施例之報告輔助資訊子系統之結節輔助偵測介面示意圖。圖3B為本發明實施例之推論結節資料列表示意圖。3A is a schematic diagram of the nodule auxiliary detection interface of the report auxiliary information subsystem of the embodiment of the present invention. FIG. 3B is a schematic diagram of a list of inferred nodule data according to an embodiment of the present invention.

圖4A為本發明實施例之第一推論模型之資料結構流程示意圖。圖4B為本發明實施例之第二推論模型之資料結構流程示意圖。FIG. 4A is a schematic flowchart of the data structure of the first inference model of the embodiment of the present invention. FIG. 4B is a schematic flowchart of the data structure of the second inference model of the embodiment of the present invention.

圖5A為本發明實施例之固定維度層流程示意圖。圖5B為本發明實施例之可變維度層流程示意圖。圖5C為本發明實施例之變維暨上採樣層流程示意圖。FIG. 5A is a schematic flow diagram of a fixed dimension layer according to an embodiment of the present invention. FIG. 5B is a schematic flow diagram of a variable dimension layer according to an embodiment of the present invention. FIG. 5C is a schematic flow diagram of the variable dimension and upsampling layer according to the embodiment of the present invention.

1:肺部結節輔助判讀系統 1: Pulmonary nodule auxiliary interpretation system

10:推論伺服器 10: Inference Server

11:第一推論模型 11: First inference model

12:第二推論模型 12: The second inference model

13:肺葉分割模型 13: Lung Lobe Segmentation Model

20:影像伺服器 20: Image server

30:報告輔助資訊子系統 30: Report auxiliary information subsystem

31:輸入模組 31: Input module

32:顯示模組 32: Display module

33:輸出模組 33: Output module

2:儲存裝置 2: storage device

3:電腦斷層掃描儀 3:Computed tomography scanner

4:放射科資訊系統 4: Radiology Information System

5:醫療資訊系統 5: Medical Information System

Claims

A pulmonary nodule aided interpretation system, comprising: an inference server, which receives a group of lung computed tomography images and inputs them to a plurality of inference models to obtain a plurality of nodule data, each of which contains a prediction frame information And a corresponding prediction probability information, delete the nodule data corresponding to the overlapping prediction frame information to output a deduced nodule data and the group of lung computed tomography images, wherein the deduced models include: a first deduced model is A region proposal network model with default anchor boxes and trained using computed tomography images of labeled solid lung nodules; and a second inference model is a region proposal network model without default anchor boxes and using labeled ground glass lung CT imaging training of nodules and semi-solid pulmonary nodules; an image server that marks nodule detection locations in the set of lung CT images according to the inferred nodule data; and a reporting auxiliary information subsystem that operates on a The host computer is connected to the image server, and includes the following modules: a display module, which displays the marked group of lung computed tomography images and the inferred nodule data; an input module, which receives a user's inference A confirmed nodule data obtained by editing the nodule data; and an output module for copying the confirmed nodule data to a reporting system.

The auxiliary interpretation system as described in Claim 1, wherein the inference models adopt a UNet backbone structure, and each up-sampling and down-sampling structure includes a residual module and a compressed excitation module.

The auxiliary interpretation system as claimed in claim 2, wherein the residual module includes two normalization layers.

The auxiliary interpretation system as described in Claim 1, wherein the group of lung computed tomography images can further perform a window frame setting, a resampling process and a cropping process before inputting the inference models.

The auxiliary interpretation system as described in Claim 1, wherein the inference models further include a lung lobe cut model, and each of the nodule data further includes a lung lobe information.

A method for assisting pulmonary nodule interpretation, comprising: a server host receives a group of lung computed tomography images and inputs them to a plurality of inference models to obtain a plurality of nodule data, each of which includes a prediction frame information and a corresponding One prediction probability information; delete the nodule data corresponding to the repeated prediction frame information to output a deduced nodule data; and mark the nodule detection position in the group of lung computed tomography images according to the deduced nodule data; and a computer host Displaying the group of marked CT images of the lungs and the deduced nodule data, receiving confirmed nodule data edited by a user on the deduced nodule data, and copying the confirmed nodule data to a reporting system; The inference model includes: a first inference model is a region proposal network model with default anchor boxes, and is trained using computed tomography images of labeled solid pulmonary nodules; and a second inference model is a region proposal without default anchor boxes network model and trained using computed tomography images of labeled ground-glass pulmonary nodules and semi-solid pulmonary nodules.

The auxiliary interpretation method as described in Claim 6, wherein the inference models adopt a UNet backbone structure, and a residual module and a compression excitation module are added to each upsampling and downsampling structure.

The auxiliary interpretation method as claimed in claim 7, wherein the residual module includes two normalization layers.

The auxiliary interpretation method as described in Claim 6, further comprising a window frame setting, a re-sampling process and a cropping process before the group of lung computed tomography images are input into the inference models.

The auxiliary interpretation method as described in Claim 6, wherein the inference models further include a lung lobe cut model, and each of the nodule data further includes a lung lobe information.