TWI832032B - System of generating training data by questions and answers and method thereof - Google Patents

System of generating training data by questions and answers and method thereof

Info

Publication number
TWI832032B
TWI832032B (application TW110103801A)
Authority
TW
Taiwan
Prior art keywords
view
question
answer
image
display field
Prior art date
Application number
TW110103801A
Other languages
Chinese (zh)
Other versions
TW202131352A (en)
Inventor
阮柏翰
鄒奇軒
Original Assignee
台達電子工業股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 台達電子工業股份有限公司 (Delta Electronics, Inc.)
Publication of TW202131352A
Application granted granted Critical
Publication of TWI832032B

Classifications

    • G06F 16/3329 — Natural language query formulation or dialogue systems (G06F 16/00 Information retrieval; G06F 16/30 unstructured textual data; G06F 16/33 Querying; G06F 16/332 Query formulation)
    • G06F 16/34 — Browsing; visualisation of unstructured textual data
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F 18/00 Pattern recognition; G06F 18/21 design or setup of recognition systems)
    • G16H 15/00 — ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G16H 70/60 — ICT specially adapted for the handling or processing of medical references relating to pathologies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A system and method of generating training data through questions and answers are provided by the disclosure. A platform provides browsing of a target image, receives operations that adjust a display view of the target image, presents a question used to analyze the display view, obtains a response answer, and associates the display view with the question and the response answer to generate training data used to automatically analyze the target image. The disclosure can automatically prompt regions of interest and significantly reduces the user's burden of collecting training data.

Description

System and method for generating training data through question and answer

The present invention relates to a system and method for generating training data, and in particular to a system and method that generate training data through question and answer.

Pathological interpretation is currently performed manually by experts (such as pathologists). During manual interpretation, the expert must view physical slides under a microscope while manually recording the observations with tools such as a voice recorder or pen and paper, and finally write a pathology report based on those observations.

However, this approach has the following problems:

1. Inconsistent interpretation among experts: different experts usually view the same physical slide from different reading angles (that is, they identify different regions of interest). This leads different experts to form different opinions about the same slide, and ultimately results in inconsistent pathology reports.

2. Difficulty of joint reading: because a physical slide under a microscope can be viewed by only one person at a time, manual interpretation cannot support simultaneous viewing by multiple people, nor can it ensure that people viewing in turn all focus on the same region of interest.

3. Diversity of pathological features: as medical technology advances, more and more pathological features are being discovered. To correctly focus on the regions of interest where these rapidly multiplying features appear, experts must set aside considerable time, on top of an already heavy workload, to learn them, which imposes a serious burden.

The existing manual interpretation of physical slides therefore suffers from the problems above, and a more effective solution is urgently needed.

The present invention provides a system and method for generating training data through question and answer, which can prompt the user with regions of interest while the user browses an image and collect training data for those regions through question and answer.

The present invention proposes a method for generating training data through question and answer, comprising the following steps: providing browsing of a target image on a platform; accepting operations that adjust a display view of the target image; presenting a question used to analyze the display view and obtaining a response answer; and associating the display view with a corresponding question-and-answer set, which includes the question and the response answer, to generate training data used to automatically analyze the target image.

The present invention also proposes a system for generating training data through question and answer, comprising a database and a platform. The database stores a target image. The platform is connected to the database and, via a network, to a client. The platform is configured to provide the client with a browsing interface for viewing the target image, accept operations from the client that adjust a display view of the target image, present a question used to analyze the display view, obtain a response answer, and associate the display view with a corresponding question-and-answer set, which includes the question and the response answer, to generate training data used to automatically analyze the target image.

The invention can automatically prompt regions of interest and greatly reduces the burden that collecting training data places on the user.

10: Platform
100: Processing module
101: Storage module
102: Communication module
103, 110: Human-machine interface
104: Web module
105: Management module
11: Client
12: Database
120: Image library
121: Knowledge base
122: Browsing history
123: User database
13: AI module
130: Learning model
131: Data conversion module
132: Training module
20: Image browsing and recording module
200: Image viewing module
201: Image operation module
202: Image information processing module
21: Knowledge acquisition module
210: Question-and-answer module
211: Action module
212: Target module
213: Knowledge processing and provision module
22: Knowledge recording and processing module
220: Image capture module
221: Training data generation module
30-32: Target image
40-42: Display reference point
50-51: Range
60-62, 70-72, 80-81: Interface
800-802: Region
S10-S13: First training data generation steps
S20-S27: Second training data generation steps
S30-S34: Display steps
S40-S42: Browsing steps
S50-S52: Question-and-answer steps
S60, S70-S73: Training steps
S61, S80-S81: Automatic analysis steps

FIG. 1 is an architecture diagram of a system according to an embodiment of the present invention; FIG. 2 is a partial architecture diagram of the platform according to an embodiment; FIGS. 3, 4, and 5 are schematic diagrams of the input and output of a learning model according to embodiments; FIG. 6 is a flowchart of a method according to an embodiment; FIG. 7A is the first part and FIG. 7B the second part of a flowchart of a method according to an embodiment; FIG. 8 is a flowchart of the training and automatic analysis of the method; FIG. 9 is a schematic diagram of a display view of a target image according to an embodiment; FIGS. 10 and 11 are schematic diagrams of other display views of FIG. 9; FIG. 12 is a schematic diagram of an action history according to an embodiment; and FIG. 13 is a schematic diagram of a heat map according to an embodiment.

The technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments, to further clarify the purpose, solutions, and effects of the invention; the description is not intended to limit the scope of the appended claims.

Automatic image interpretation by artificial intelligence (AI) requires a large amount of training data to train a learning model.

While browsing images, an expert can manually interpret and label features in a large number of images one by one, and training data can then be generated from these labels; however, this forces the expert to interrupt browsing and spend considerable time on labeling and annotation.

Take the interpretation of medical images (such as digital pathology images) as an example. While interpreting a medical image (for instance, to determine whether cancer cells are present), a physician who finds cancer cells must first interrupt the interpretation to mark and annotate them; if the physician is simultaneously explaining to a patient or teaching, this causes great inconvenience.

In this regard, the present invention mainly provides a system and method for generating training data through question and answer. Its main principle is to set in advance (manually or with AI techniques), for one or more regions of interest (i.e., the display views described later) of the image to be analyzed (i.e., the target image described later), questions related to those regions.

Then, while a user (for example, a physician, expert, or inspector) interprets the target image, whenever the user browses to a region of interest for which a question has been set, the invention automatically presents the question. The user thereby knows that the current display view is a region of interest and can concentrate on observing it. Moreover, by simply answering questions related to the current region, the user completes the input of training parameters for that region of interest. Because the interpretation is not interrupted and little time is required, users are more willing to give feedback, which improves the accuracy of the training data.
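As a rough illustration of how such automatic prompting could work (a sketch only; the names `overlap_ratio`, `questions_for_view`, and the rectangle representation are hypothetical and not prescribed by the disclosure), the platform might store each preset region of interest as a rectangle paired with its question, and present a question whenever the current display view sufficiently covers that rectangle:

```python
def overlap_ratio(view, roi):
    # Fraction of the ROI rectangle covered by the view rectangle.
    # Rectangles are (x0, y0, x1, y1) in target-image coordinates.
    vx0, vy0, vx1, vy1 = view
    rx0, ry0, rx1, ry1 = roi
    w = min(vx1, rx1) - max(vx0, rx0)
    h = min(vy1, ry1) - max(vy0, ry0)
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((rx1 - rx0) * (ry1 - ry0))

def questions_for_view(view, rois, threshold=0.5):
    # Return the questions of every preset ROI that the current display
    # view covers by at least `threshold`; these are the questions the
    # platform would present to the user automatically.
    return [question for rect, question in rois
            if overlap_ratio(view, rect) >= threshold]

rois = [((100, 100, 200, 200), "Is the cell morphology in this region abnormal?")]
print(questions_for_view((50, 50, 250, 250), rois))
# → ['Is the cell morphology in this region abnormal?']
```

Rotation of the display view is ignored here for brevity; a real viewer would transform the view rectangle into image coordinates first.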

Furthermore, because the invention eliminates manual marking and annotation, it greatly reduces the user's burden, allowing the user to concentrate on interpretation and improving its accuracy.

In addition, by raising questions related to the region currently being browsed, the invention lets the user notice, while answering, the observation points relevant to the question, further improving the accuracy of interpretation.

Please refer to FIG. 1, an architecture diagram of a system according to an embodiment of the present invention. The system for generating training data through question and answer may include a platform 10 (for example, a general-purpose computer such as a server, a cloud service platform, a desktop, or a laptop, or any combination thereof), a database 12 (for example, a network database, a local database, a relational database, or a combination thereof), and an AI module 13.

The platform 10 may include a storage module 101, a communication module 102, a human-machine interface 103, and a processing module 100.

The storage module 101 (for example, a storage device such as RAM, EEPROM, a solid-state drive, a hard disk, or flash memory, or any combination thereof) stores data. The communication module 102 (for example, a network interface card, NIC) connects to a network (such as the Internet) and communicates through the network with external devices (such as the database 12, the AI module 13, and/or the client 11). The human-machine interface 103 (including input and output interfaces such as a mouse, keyboard, buttons, touchpad, display, touchscreen, or projection module) interacts with the user. The processing module 100 (which may be a processor such as a CPU, GPU, TPU, or MCU, or any combination thereof) controls the platform and implements the functions proposed by the present invention.

The AI module 13 may be built on a server or a cloud service platform (for example, Amazon Web Services, Google Cloud Platform, or Microsoft Azure), or on the platform 10. The AI module 13 includes a learning model 130. The training data generated by the present invention is used to build and train the learning model 130 and thereby improve its accuracy.

The learning model 130 is built and trained using machine learning techniques and can automatically analyze an input image (such as the target image described later) to produce predicted knowledge information for a specific display view of that image. This predicted knowledge information serves as reference information when the user analyzes the input image.

In one embodiment, the learning model 130 includes a VQA (Visual Question Answering) model, which can be trained on text features (question-and-answer sets, target information, and so on) conditioned on the image features (display views) of the input image.

In another embodiment, the learning model 130 includes a DQN (Deep Q-Learning) model, which can follow the user's image-browsing process, adjust the quantified value of the target (the target information) after each change of display view, and ultimately produce predicted target information from AI-driven automatic browsing of the image. Specifically, the DQN model first produces a score (Q-value) for each browsing action on the input image (either a browsing action entered by the user or an automatically generated predicted action). The highest-scoring action is applied to simulate browsing and change the display view, the image of the new display view is fed back into the DQN model to obtain the next browsing action and display view, and so on until browsing stops. Predicted target information can then be derived by analyzing the image of a display view (any intermediate view or the final one).
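The browsing loop described here can be sketched as follows. `q_network` and `apply_action` are hypothetical stand-ins for the DQN scoring function and the view transformation; the disclosure does not specify them in code:

```python
def auto_browse(initial_view, q_network, apply_action, max_steps=20):
    # Greedy DQN-style browsing: score the candidate browsing actions for
    # the current view, apply the highest-scoring one, and feed the new
    # view back in, until the "stop" action wins or max_steps is reached.
    view = initial_view
    history = []
    for _ in range(max_steps):
        q_values = q_network(view)               # maps action -> Q-value
        action = max(q_values, key=q_values.get)  # highest-scoring action
        if action == "stop":
            break
        history.append(action)
        view = apply_action(view, action)
    return view, history

# Toy stand-in network for illustration: zoom in once, then stop.
def toy_q(view):
    if view == "full":
        return {"zoom_in": 1.0, "pan_left": 0.3, "stop": 0.5}
    return {"zoom_in": 0.1, "pan_left": 0.2, "stop": 0.9}

final_view, history = auto_browse("full", toy_q, lambda v, a: "zoomed")
# → final_view == "zoomed", history == ["zoom_in"]
```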

For example, please refer to FIGS. 3 to 5, which are schematic diagrams of the input and output of the learning model according to embodiments of the present invention.

In one embodiment, as shown in FIG. 3, when the image of a display view of the target image (that is, a specific sub-image of the target image) is input to the learning model 130, the model can automatically analyze it and produce predicted target information (for example, through the DQN model). The target information is an analysis result generated from the image features of the display view and describes their characteristics, such as the type of feature (for example, the kind of cell lesion), its extent (for example, the proportion it occupies), or its degree (for example, severe, mild, or negligible).

In one embodiment, as shown in FIG. 4, when the image of a display view of the target image and the question corresponding to that view are input to the learning model 130, the model can automatically analyze them and produce a predicted answer (for example, through the VQA model). Furthermore, the learning model 130 can also generate new questions about the display view, so that the next time the user browses the same view, the user can answer these questions (including the new ones) to strengthen the completeness of the model's training for that view.
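The interface of this embodiment can be summarized as a function from a view image and a question to a predicted answer, optionally with follow-up questions for later browsing sessions. The `model` callable below is a hypothetical stand-in, not the disclosure's actual VQA network:

```python
def vqa_predict(model, view_image, question):
    # A VQA-style model takes an image region plus a text question and
    # returns a predicted answer; as described above, it may also return
    # new questions about the same display view for later rounds.
    answer, new_questions = model(view_image, question)
    return answer, new_questions

# Toy stand-in model for illustration.
def toy_model(view_image, question):
    return "yes", ["What proportion of the region do the abnormal cells occupy?"]

answer, follow_ups = vqa_predict(toy_model, "view.png", "Are tumor cells present?")
# → answer == "yes", with one follow-up question for the next session
```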

In one embodiment, as shown in FIG. 5, when the image of a display view of the target image is input to the learning model 130, the model can automatically analyze it and produce a predicted action history (for example, through the DQN model). Specifically, from the action histories of one or more users on the target image added during training (that is, combinations of multiple browsing actions on the target image), the learning model 130 can infer a browsing approach suited to the target image and offer it to inexperienced users as a reference.

Please refer to FIGS. 1, 2, and 6 together. FIG. 2 is a partial architecture diagram of the platform according to an embodiment of the present invention, and FIG. 6 is a flowchart of a method according to an embodiment.

In one embodiment, the processing module 100 may include modules 104-105, 20-22, 200-202, 210-213, and 220-221, and the AI module 13 may include modules 131-132. Each of these modules is configured to perform a different function.

The modules are interconnected (electrically and informationally) and may be hardware modules (for example, electronic circuit modules, integrated circuit modules, or SoCs), software modules (for example, firmware, an operating system, or an application), or a mixture of hardware and software modules, without limitation.

It is worth mentioning that when the modules are software modules (for example, firmware, an operating system, or an application), the storage module 101 and the AI module 13 may include a non-transitory computer-readable recording medium storing a computer program with computer-executable code; when the processing module 100 and the AI module 13 execute this code, the functions of the corresponding modules are realized.

The method of generating training data through question and answer in this embodiment includes the following steps.

Step S10: the platform 10 provides browsing of a target image through the image browsing and recording module 20. The target image may be stored in the database 12 or uploaded by the user.

Specifically, the platform 10 may provide a browsing interface (such as a GUI) and display the target image in it.

In one embodiment, the user can operate the human-machine interface 110 of the client 11 (for example, a general-purpose computer such as a desktop, laptop, tablet, or smartphone; the interface is similar to the human-machine interface 103 and is not described again here) to connect to the platform 10 and use its browsing service to view the target image.

In one embodiment, the user can operate the human-machine interface 103 of the platform 10 directly to use the browsing service.

Step S11: the platform 10 accepts the user's browsing operations (local or remote) through the image browsing and recording module 20 to adjust the display view of the target image, such as zooming in, zooming out, panning, or rotating.

Step S12: after the display view changes, the platform 10 obtains the question corresponding to the current display view through the knowledge acquisition module 21 and presents it. The question is used to analyze the current display view.

After reading the question, the user can refer to the current display view and past experience to quickly enter a response answer, so that the platform 10 obtains the answer to the question. The invention thereby obtains a question-and-answer set (the question and the response answer) for the current display view, that is, the training parameters for that view.

Step S13: the platform associates the display view with its corresponding question-and-answer set through the knowledge recording and processing module 22 to generate training data. The training data is then provided to the learning model 130 to train it to automatically analyze the target image.
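A minimal sketch of the record that step S13 could produce (the field names are hypothetical; the disclosure only requires that the display view be associated with its question-and-answer set):

```python
from dataclasses import dataclass

@dataclass
class FieldOfView:
    x: int            # display reference point, image coordinates
    y: int
    zoom: float       # zoom level
    rotation: float   # rotation angle in degrees

@dataclass
class TrainingRecord:
    # One training sample: a display view associated with its Q&A set.
    image_id: str
    view: FieldOfView
    question: str
    answer: str

record = TrainingRecord(
    image_id="slide-001",
    view=FieldOfView(x=1200, y=800, zoom=20.0, rotation=0.0),
    question="Are tumor cells present in this region?",
    answer="yes",
)
```

A collection of such records, accumulated across browsing sessions, is what would be handed to the learning model 130 for training.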

In this way, the user only needs to answer simple questions while browsing the target image, and the invention automatically builds training data for that image.

Please refer to FIGS. 1-2, 7A-7B, and 9-11 together. FIG. 7A is the first part and FIG. 7B the second part of a flowchart of a method according to an embodiment of the present invention; FIG. 9 is a schematic diagram of a display view of a target image according to an embodiment; and FIGS. 10 and 11 are schematic diagrams of other display views of FIG. 9.

In this embodiment, the platform 10 provides a web browsing service through a web module 104 (such as a web server module) to interact with the client 11.

The method of generating training data through question and answer in this embodiment includes the following steps.

Step S20: the platform 10 provides a browsing interface to the client 11 through the web module 104, offering a web service for browsing the target image.

In one embodiment, step S20 may include the following steps S30-S34.

Step S30: the client 11 logs into the platform 10 through a web page.

In one embodiment, the database 12 may include a user database 123 that stores the registration data of different users (or clients 11), such as account names and passwords, network addresses, hardware addresses, device identifiers, digital signatures, dongle keys, or other identifiable information. The user operates the client 11 to provide login data of the same kinds to the platform 10.

The platform 10 then compares the login data with each set of registration data through the management module 105, and when the login data matches any registration record, it determines that verification has passed and allows the login.

Step S31: after a successful login, the platform 10 provides a browsing interface (such as a web application) to the client 11 through the web module 104 to present the browsing interface.

In one embodiment, the database 12 includes an image library 120 storing multiple images, such as electronic component images, medical images, surveillance images, or other images that need to be interpreted.

Step S32: through the web page, the client 11 may select one of the multiple images in the image library 120 as the target image (i.e., the image to be interpreted next), or upload an image to the platform 10 as the target image.

Then, the platform 10 imports this target image through the image viewing module 200.

Step S33: the platform 10, through the image information processing module 202 and according to preset field-of-view parameters (such as a preset display reference point, zoom level, and/or rotation angle), determines the display field of view of the target image to be presented, i.e., the range of the target image to be presented.

The display reference point of the display field of view may be an image coordinate (such as a pixel position) that indicates the position, within the entire target image, of a reference point of this display field of view (such as the image center point or any image boundary point). The zoom level of the display field of view indicates its current level of magnification or reduction (this value maps proportionally to the actual magnification or reduction ratio for images of different resolutions). The rotation angle of the display field of view indicates the angular difference between this display field of view and the target image.

For example, the platform 10 may be configured so that a zoom level of 1 corresponds to an actual magnification of 5x, and a zoom level of 5 corresponds to an actual magnification of 40x.

In another example, the platform 10 may be configured so that a zoom level of 1 corresponds to an actual magnification of 0.25x (e.g., zoomed out so that the entire target image fits within the display field of view), and a zoom level of 5 corresponds to an actual magnification of 1x (e.g., 1:1 browsing).

The relationship (such as the ratio) between the zoom level and the actual magnification in the present invention may be set as required and is not limited.
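A minimal sketch of such a configurable mapping, using the two example configurations above; linear interpolation between configured levels is an assumption for illustration, since the patent leaves the relationship open.

```python
# Example level -> actual-magnification tables from the two embodiments above.
HIGH_POWER = {1: 5.0, 5: 40.0}   # level 1 -> 5x, level 5 -> 40x
OVERVIEW = {1: 0.25, 5: 1.0}     # level 1 -> 0.25x, level 5 -> 1x

def magnification(table, level):
    """Return the actual magnification for a zoom level.

    Levels between two configured entries are linearly interpolated;
    this interpolation rule is an illustrative assumption.
    """
    if level in table:
        return table[level]
    lo = max(k for k in table if k < level)
    hi = min(k for k in table if k > level)
    t = (level - lo) / (hi - lo)
    return table[lo] + t * (table[hi] - table[lo])
```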

Thereby, based on the display reference point and the zoom level, or on the display reference point, the zoom level, and the rotation angle, the image information processing module 202 can determine the field-of-view range, for example by computing it as a bounding box.
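One way to compute such a bounding box from the three field-of-view parameters; the viewport size and the level-to-scale conversion are assumed inputs, and the reference point is taken as the view center (one of the options the patent names).

```python
import math

def view_bounding_box(ref_point, scale, rotation_deg, viewport_px):
    """Axis-aligned bounding box, in image coordinates, of a display view.

    ref_point    -- (x, y) display reference point, taken here as the view center
    scale        -- actual magnification for the current zoom level
    rotation_deg -- rotation angle of the view relative to the image
    viewport_px  -- (width, height) of the client viewport in pixels
    """
    w = viewport_px[0] / scale   # view width in image units
    h = viewport_px[1] / scale   # view height in image units
    theta = math.radians(rotation_deg)
    # Expand the box so it covers the rotated view rectangle.
    bw = abs(w * math.cos(theta)) + abs(h * math.sin(theta))
    bh = abs(w * math.sin(theta)) + abs(h * math.cos(theta))
    x, y = ref_point
    return (x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2)
```

With an 8x6-unit viewport, a reference point of (4, 3) at scale 1 covers the whole 8x6 range, mirroring the full-range view of Figure 9.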

Step S34: the platform 10 transmits the image of the determined display field of view to the client 11 over the network, so as to present this display field of view on the browsing interface.

Step S21: the client 11 adjusts the display field of view of the target image through the browsing interface.

In one embodiment, step S21 may include the following steps S40-S42.

Step S40: the client 11 inputs browsing operations through the browsing interface to adjust the display field of view; the platform 10 collects the user's browsing operations through the image operation module 201 and converts them into adjustment amounts for each field-of-view parameter (i.e., determines the adjusted field-of-view parameters).

In one embodiment, the browsing interface may display various operation buttons (such as move left, move right, move up, move down, zoom in, zoom out, rotate clockwise, and rotate counterclockwise), and the image operation module 201 monitors these operation buttons to collect the user's browsing operations.

In one embodiment, the image operation module 201 may monitor keyboard or mouse actions at the client 11 to collect the user's browsing operations (e.g., monitoring mouse or keyboard signals to correspondingly trigger adjustment of the magnification level by the zoom ratio, panning of the display reference point, and rotation of the image).

Step S41: the platform 10, through the image information processing module 202, determines a new display field of view based on the adjusted field-of-view parameters, i.e., determines the new range of the target image to be presented.

Step S42: the platform 10 outputs the image of the adjusted display field of view to the client 11 over the network, so as to present the adjusted display field of view on the browsing interface.

In one embodiment, before transmitting the image of the display field of view to the client 11, the platform 10 may first perform image compression (in particular lossy image compression) to appropriately reduce image detail and resolution, thereby moderately reducing the image size in favor of network transmission.

It is worth mentioning that when the target image is too large (e.g., its resolution is too high), transmitting the entire target image to the client 11 takes a great deal of transmission time, and the client 11 may be unable to browse the entire target image smoothly due to insufficient hardware performance.

In this regard, the present invention transmits only the image of the display field of view (optionally moderately compressed) to the client 11, which ensures that the client 11 can browse the entire target image smoothly. Moreover, when the client 11 wishes to view image details, a zoom-in browsing operation can be performed so that the platform 10 returns a magnified image of the region for smooth browsing, thereby improving the user experience.

In one embodiment, the target image may be divided into multiple tiles, and the platform 10 uses a tile as the smallest transmission unit.

For example, as shown in Figure 9, the display reference point is the image center, the target image 30 is divided into 48 tiles, and each tile has a corresponding identification code (such as coordinates).

Moreover, the display field of view given by the preset field-of-view parameters may be the full range, i.e., the display reference point 40 is (4, 3), the magnification level is 1, and the rotation angle is 0 degrees. The platform 10 transmits all the tiles (48 in total) within the display field of view (the full range) to the client 11, so that the complete target image 30 is displayed on the browsing interface.

Then, when the user wishes to adjust the display field of view to the range 50, a browsing operation can be input. As shown in Figure 10, the adjusted display reference point 41 is (2.5, 3.5), the magnification level is 2, and the rotation angle is 0 degrees. The platform 10 transmits all the tiles (15 in total) within the adjusted display field of view to the client 11, so that the adjusted display field of view is displayed on the browsing interface.

Then, when the user wishes to adjust the display field of view to the range 51, a browsing operation can be input. As shown in Figure 11, the adjusted display reference point 42 is (1.5, 4.5), the magnification level is 3, and the rotation angle is 0 degrees. The platform 10 transmits all the tiles (6 in total) within the adjusted display field of view to the client 11, so that the adjusted display field of view is displayed on the browsing interface.
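Selecting which tiles to transmit for a given view can be sketched as follows, using the 8x6 tile grid of Figures 9-11; the "include every tile the box overlaps" rule is an illustrative assumption, since the patent only specifies the tile as the smallest transmission unit.

```python
import math

def tiles_in_view(bbox, grid_cols=8, grid_rows=6, tile_size=1.0):
    """Return identification codes (col, row) of every tile the view bbox overlaps.

    bbox -- (x0, y0, x1, y1) field-of-view range in image coordinates.
    """
    x0, y0, x1, y1 = bbox
    c0 = max(0, int(x0 // tile_size))
    r0 = max(0, int(y0 // tile_size))
    c1 = min(grid_cols - 1, int(math.ceil(x1 / tile_size)) - 1)
    r1 = min(grid_rows - 1, int(math.ceil(y1 / tile_size)) - 1)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

The full-range view of Figure 9 covers all 48 tiles, while a box spanning five columns and three rows, like the range 50 of Figure 10, covers 15 tiles.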

Through the above approach, the present invention can effectively provide a browsing service for the target image.

Step S22 is then executed: the platform 10 determines, through the knowledge acquisition module 21, whether there is a corresponding question for the user to answer for the current display field of view. If so, the question-and-answer function is executed; if there is no corresponding question, no question and answer is performed.

In one embodiment, the database 12 includes a knowledge base 121. The knowledge base 121 may store knowledge information corresponding to different display fields of view of the target image. The knowledge information may include multiple questions to be answered, multiple question-and-answer sets (answered questions), multiple entries of target information, and/or action information.

Each of the questions is preset for an image feature of the corresponding display field of view (such as the shape, type, color, or change of an object within the display field of view), or is automatically generated through machine learning (as in Figure 4), and is used to analyze the image feature.

In one embodiment, each display field of view is associated with a set of identification codes (which may be determined based on the identification codes of the tiles it covers, determined based on the corresponding field-of-view parameters, or numbered sequentially, among other options, without limitation). Moreover, all knowledge information related to this display field of view is mapped to the same identification code.
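One of the options just mentioned, deriving the identification code from the covered tiles, could look like the sketch below; the string format and the dictionary-backed knowledge base are illustrative assumptions.

```python
def view_id(tile_codes):
    """Build a stable identification code from the tiles a view covers."""
    return ";".join(f"{c},{r}" for c, r in sorted(tile_codes))

# Knowledge base 121 sketched as a mapping from view ID to knowledge information.
knowledge_base = {
    view_id([(2, 4), (3, 4)]): {
        "pending_questions": ["Is this a cerebral embolism area?"],
    },
}

def knowledge_for_view(tile_codes):
    """Look up the knowledge information mapped to the same identification code."""
    return knowledge_base.get(view_id(tile_codes))
```

Sorting the tile codes makes the ID independent of traversal order, so any view covering the same tiles maps to the same knowledge information.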

In one embodiment, step S22 may include the following steps S40-S42.

Step S40: the platform 10, through the question-and-answer module 210, obtains the corresponding question from the knowledge base 121 based on the identification code of the current display field of view.

Step S41: the platform 10 displays the obtained question on the browsing interface of the client 11 through the question-and-answer module 210, and may provide an answer interface for the user to respond.

Step S42: the platform 10 receives, through the question-and-answer module 210 and via the answer interface, the response answer input by the user.

For example, referring to Figure 10, when the user browses to the display field of view of Figure 10, the question-and-answer module 210 may display the question-and-answer interface 60 or 61.

Taking the question-and-answer interface 60 as an example, the question is an open-ended question (such as a free-response question); the question-and-answer interface 60 includes a text input area, and the user enters text content in the text input area as the response answer.

Other open-ended questions and answers may be, for example: "Q: Which organ system in the image is abnormal? A: Cardiovascular"; "Q: What is shown here in the bile duct cells and canals of Hering stained with cytokeratin 7? A: Immunohistochemical staining"; "Q: How are the electronic components in the figure connected? A: Soldering"; and so on, without limitation.

Taking the question-and-answer interface 61 as an example, the question is a closed question (such as a multiple-choice question); the question-and-answer interface 61 includes multiple answer options, and the user selects one or more of the answer options as the response answer.

Other closed questions and answers may be, for example: "Q: Are histone subunits positively charged, making the negatively charged DNA more compact and stable? Options: Yes; No"; "Q: Is this a cerebral embolism area? Options: Yes; No"; "Q: How many electronic components are in the figure? Options: 1; 2; 3"; and so on, without limitation.

In one embodiment, the present invention further provides a target-setting function. Specifically, in any display field of view, step S23: the platform 10, through the target module 212, may provide an input interface to accept target information set by the client 11 after viewing the current display field of view. The target information serves as the user's manual analysis result for a quantitative feature of the whole image, and serves as a training parameter for this target image.

Taking the estimation of the proportion of cancer cells as an example: when an expert views a pathology image at low magnification (a wide field of view), the expert may consider the proportion of cancer cells to be high (e.g., estimated at 80%) and set the target information of this pathology image to "cancer cell proportion 80%". After adjusting the display field of view, e.g., when viewing the pathology image at medium magnification (a narrower field of view), the expert may find that some cells are not cancer cells (e.g., now estimated at 60%), and may then correct the target information of this pathology image to "cancer cell proportion 60%".

In one embodiment, the present invention further provides a knowledge information prompting function. Specifically, in any display field of view, step S24: the knowledge processing and providing module 213 determines whether corresponding knowledge information for the current display field of view exists in the knowledge base 121. If so, this knowledge information is displayed on the browsing interface to explain the display field of view.

In one embodiment, each time the display field of view is determined, the image information processing module 202 may generate image browsing information (such as identification parameters or an identification code) for this display field of view, and the knowledge acquisition module 21 may search the knowledge base 121 based on the image browsing information to detect whether knowledge information related to this display field of view exists (such as a previously entered question-and-answer set, the target information of the entire target image, or content automatically interpreted by AI).

For example, referring to Figure 11, when the user browses to the display field of view of Figure 11, the knowledge acquisition module 21 finds that the current display field of view has corresponding knowledge information in the knowledge base (an abnormality proportion of 70%), and may display this knowledge information on the target interface 62. In addition, if the user judges the knowledge information to be incorrect, it can also be modified through the input interface (e.g., modified to 90%), so that the target module 212 sets new target information for learning.

In one embodiment, the present invention further provides a browsing action determination function. Specifically, each time the display field of view is switched, the action module 211 may obtain the previous display reference point, previous zoom level, or previous rotation angle of the previous display field of view and compare it with the display reference point, zoom level, or rotation angle of the current display field of view, so as to determine the browsing action of switching from the previous display field of view to the current display field of view (for example, the pan direction, pan amount, magnification level, rotation direction, and rotation angle), and add the browsing action to the action information. In other words, the content of the browsing action is determined from the change between the two sets of field-of-view parameters.
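Deriving the browsing action from two consecutive sets of field-of-view parameters can be sketched as follows; the dictionary keys and sign conventions are illustrative assumptions.

```python
def browsing_action(prev, curr):
    """Describe the action that switched the previous view to the current one.

    prev, curr -- dicts with 'ref' (x, y), 'zoom', and 'rotation' (degrees).
    """
    action = {}
    dx = curr["ref"][0] - prev["ref"][0]
    dy = curr["ref"][1] - prev["ref"][1]
    if (dx, dy) != (0, 0):
        action["pan"] = (dx, dy)                       # pan direction and amount
    if curr["zoom"] != prev["zoom"]:
        action["zoom"] = curr["zoom"] - prev["zoom"]   # positive = zoom in
    if curr["rotation"] != prev["rotation"]:
        d = (curr["rotation"] - prev["rotation"]) % 360
        action["rotate"] = d if d <= 180 else d - 360  # positive = clockwise
    return action
```

Applied to the switch from Figure 9 to Figure 10, the action is a pan of (-1.5, 0.5) together with a zoom-in by one level.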

Next, step S25 is executed: the platform 10, through the training data generation module 221, associates the display field of view with the corresponding knowledge information (for example, the question-and-answer set completed by the user, the target information, and/or the action information) to generate training data.
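The association in step S25 amounts to bundling one display view with its knowledge information into a single record; a minimal sketch, with assumed field names:

```python
def make_training_record(view_params, view_image, qa_set, target_info=None, actions=None):
    """Associate one display field of view with its knowledge information.

    view_params -- reference point, zoom level, rotation angle of the view
    view_image  -- captured image (or tile references) for the view
    qa_set      -- list of (question, response answer) pairs for this view
    target_info -- e.g. {"cancer_cell_ratio": 0.6} entered by the expert
    actions     -- browsing actions recorded by the action module
    """
    return {
        "view": view_params,
        "image": view_image,
        "qa_set": qa_set,
        "target": target_info,
        "actions": actions or [],
    }
```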

In one embodiment, the present invention further provides a recording function (steps S26-S27).

Step S26: during the browsing process, the platform 10 determines, through the knowledge recording and processing module 22, whether a preset recording condition is satisfied. The recording condition may be each change of the display field of view, after each operation, or at fixed intervals (e.g., every 10 seconds), without limitation.

If the recording condition is not satisfied, monitoring continues.

If the recording condition is satisfied, step S27 is executed: the platform 10 captures the image of the display field of view through the image capture module 220 based on the aforementioned image browsing information, and the knowledge recording and processing module 22 records the image of the display field of view together with the knowledge information (such as the question-and-answer set completed by the user, the target information, and/or the action information) and associates them.

In one embodiment, the database 12 includes a browsing record 122. The knowledge recording and processing module 22 associates the field-of-view parameters of the display field of view with the knowledge information, and records them in the browsing record 122 as a browsing history (which may include an action history as well as the question-and-answer sets and target information of all display fields of view).

Please also refer to Figure 8, a flowchart of the training and automatic analysis of the method according to an embodiment of the present invention. The present invention further provides a training function (step S60) and an automatic analysis function (step S61).

Step S60: the AI module 13 trains the learning model 130 according to the training data. In one embodiment, step S60 includes steps S70-S73.

Step S70: the AI module 13 loads the learning model 130.

Step S71: the AI module 13 analyzes the training data through the training module 132.

In one embodiment, the training data includes images of display fields of view and their corresponding knowledge information (for example, question-and-answer sets, action information, target information, etc.). The training module 132 analyzes the image features of the display fields of view and generates text features based on the knowledge information (such as the questions of the question-and-answer sets and the target information input by the user), i.e., converts the knowledge information into a semantic network.

In one embodiment, the AI module 13 may first convert the training data, through the data conversion module 131, into a format acceptable to the training module 132 and/or the learning model 130 before performing analysis and training.

Step S72: the training module 132 inputs the image features and the text features to the learning model 130 to generate predicted knowledge information (for example, a predicted answer to a question, predicted target information, or predicted action information).

Step S73: the training module 132 trains the learning model by comparing the predicted knowledge information with the knowledge information of the training data, for example by comparing the predicted answer with the user's response answer, comparing the predicted target information with the target information input by the user, or comparing the predicted action information with the user's actual browsing action, without limitation.
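Steps S71-S73 can be sketched as a framework-neutral loop; the encoder, model, loss, and update callables are placeholders for whatever the learning model 130 actually uses, and are not specified by the patent.

```python
def train_pass(model, records, encode_image, encode_text, loss_fn, update):
    """One pass over the training data: extract features, predict, compare, update."""
    total = 0.0
    for rec in records:
        img_feat = encode_image(rec["image"])        # step S71: image features
        for question, answer in rec["qa_set"]:
            txt_feat = encode_text(question)         # step S71: text features
            predicted = model(img_feat, txt_feat)    # step S72: predicted answer
            loss = loss_fn(predicted, answer)        # step S73: compare with expert answer
            update(model, loss)                      # step S73: adjust the model
            total += loss
    return total
```

The same loop shape extends to predicted target information and predicted browsing actions by swapping the supervision signal.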

Step S61: the AI module 13 receives an image input to generate corresponding predicted knowledge information. In one embodiment, step S61 includes steps S80-S81.

Step S80: the AI module 13 accepts an operation (for example, from the client 11 or the platform 10) to select an image (another target image).

Step S81: the AI module 13 inputs this target image to the learning model 130 to obtain the predicted knowledge information of this target image (which may include the predicted action history, predicted questions, and predicted target information of one or more display fields of view of this target image), which may be stored in the knowledge base 121.

Please refer to Figure 12, a schematic diagram of an action history according to an embodiment of the present invention. After the history recording of the target image is completed, the platform 10 of the present invention may provide a playback function that displays, on the browsing interface, the action history 70 of the target image, the display field of view 71, and the thumbnail interface 72. The client may afterwards select any step of the action history 70 for viewing, or play multiple steps of the action history 70 in sequence, thereby reproducing the browsing path taken during analysis.

Please refer to Figure 13, a schematic diagram of a heat map according to an embodiment of the present invention.

After the target image is analyzed, the platform 10 of the present invention may provide a heat map function that shows which parts of the target image are the regions the model focused on during the algorithmic analysis.

As shown in the figure, the platform 10 displays the display field-of-view interface 80 and the thumbnail interface 81 of the target image on the browsing interface. In the display field-of-view interface 80, regions with higher confidence in the algorithmic analysis are represented by lighter hot zones 801, regions with lower confidence are represented by darker hot zones 802, and the white region 800 refers to the background region.

Thereby, the user or other users can directly understand the importance of each position of the target image, and can skip browsing the white region 800 and the dark hot zones 802 (unimportant regions) to save time, while concentrating on browsing the light hot zones 801 (important regions) to improve accuracy.

By prompting questions or knowledge information in a designated display field of view, the present invention lets experts know that the current display field of view is a region of interest, so that they can concentrate more on analyzing the current display field of view.

By providing the target image browsing service through web pages (web module 104), the present invention allows experts around the world to view the same target image through a web browser and perform browsing operations on the same target image (including answering questions and setting target information), so that different knowledge information (i.e., training data) about the same target image can be collected from different experts.

Moreover, by subsequently consolidating this knowledge information and providing it to the AI module 13 for training as training data for the learning model 130, an expert-verified and highly reliable learning model 130 can be generated. When other experts (such as pathologists) perform image interpretation in the future, they can first refer to the interpretation results of the AI module 13 (such as the predicted knowledge information) as a basis for writing subsequent pathology reports; since the reference source is consistent, the consistency of the experts' final interpretation results can be improved.

Providing the target image browsing service through web pages also reduces the time and risk of damage involved in shipping physical slides. In addition, by prompting the knowledge information of the currently displayed field of view and accepting the input of target information, an expert's thinking during the slide-reading process can be fully recorded and shared with other experts, helping different experts understand each other's analysis ideas and strategies.

Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art to which the present invention pertains may make various corresponding changes and modifications according to the present invention, and all such corresponding changes and modifications shall fall within the scope of the claims appended to the present invention.

S10-S13: first training data generation steps

Claims (20)

一種透過問答生成訓練資料的方法,包括:a)一平台端提供一目標影像的瀏覽;b)該平台端接受操作來調整該目標影像的一顯示視野;c)該平台端呈現用以分析該顯示視野的一問題,並取得一回應答案;及d)該平台端關聯一問答集與一瀏覽動作來產生用於自動分析該目標影像的一訓練資料;其中,該問答集對應於該顯示視野,該問答集包括該問題與該回應答案;其中,該瀏覽動作的決定是基於取得前一個顯示視野的前一個顯示基準點、前一個縮放等級或前一個旋轉角度,將該前一個顯示基準點、該前一個縮放等級或該前一個旋轉角度與目前的該顯示視野的一顯示基準點、一縮放等級或一旋轉角度進行比較,來決定從該前一個顯示視野切換至目前的該顯示視野的該瀏覽動作。 A method for generating training data through question and answer, including: a) a platform providing browsing of a target image; b) the platform accepting operations to adjust a display field of view of the target image; c) the platform presenting to analyze the Display a question of the field of view and obtain a response answer; and d) the platform associates a question and answer set with a browsing action to generate training data for automatically analyzing the target image; wherein the question and answer set corresponds to the display field of view , the question and answer set includes the question and the response answer; wherein the decision of the browsing action is based on obtaining the previous display reference point, the previous zoom level or the previous rotation angle of the previous display field of view, and converting the previous display reference point , the previous zoom level or the previous rotation angle is compared with a display reference point, a zoom level or a rotation angle of the current display field of view to determine the time to switch from the previous display field of view to the current display field of view. The browsing action. 
如請求項1所述的透過問答生成訓練資料的方法,其中該步驟a)包括:a1)於登入該平台端後,該平台端接收操作來自一影像庫中選擇多個影像的其中之一作為該目標影像,或者上傳影像至該平台端作為該目標影像;a2)該平台端基於該顯示基準點、該縮放等級與該旋轉角度的至少其中之一決定該目標影像的該顯示視野;及a3)該平台端輸出該顯示視野的影像,以呈現該顯示視野。 The method for generating training data through question and answer as described in claim 1, wherein step a) includes: a1) after logging in to the platform, the platform receives an operation to select one of multiple images from an image library as the target image, or upload the image to the platform as the target image; a2) the platform determines the display field of view of the target image based on at least one of the display reference point, the zoom level and the rotation angle; and a3 ) The platform terminal outputs an image of the display field of view to present the display field of view. 如請求項1所述的透過問答生成訓練資料的方法,其中該目標影像被分割為多個圖磚;該步驟b)包括: b1)該平台端接受操作來調整一視野參數,其中該視野參數包括該顯示基準點、該縮放等級與該旋轉角度的至少其中之一;b2)該平台端基於調整後的該視野參數決定進入該顯示視野的所有該圖磚作為調整後的該顯示視野的影像;及b3)該平台端輸出調整後的該顯示視野的影像,以呈現調整後的該顯示視野。 The method of generating training data through question and answer as described in claim 1, wherein the target image is divided into a plurality of tiles; the step b) includes: b1) The platform accepts an operation to adjust a field of view parameter, wherein the field of view parameter includes at least one of the display reference point, the zoom level and the rotation angle; b2) the platform decides to enter based on the adjusted field of view parameter All the tiles of the display field of view are used as images of the adjusted display field of view; and b3) the platform outputs the adjusted image of the display field of view to present the adjusted display field of view. 
4. The method of claim 1, wherein step c) comprises: c1) obtaining, at the platform, the corresponding question from a database based on an identification code of the display view, wherein the question is preset for an image feature of the display view or automatically generated through machine learning, and is used to analyze the image feature; c2) displaying, at the platform, the question; and c3) receiving, at the platform, the response answer.

5. The method of claim 4, wherein the question is a closed-ended question, and step c3) displays a plurality of answer options and takes the selected answer option as the response answer.

6. The method of claim 4, wherein the question is an open-ended question, and step c3) displays a text input area and takes the text entered into the text input area as the response answer.

7. The method of claim 1, further comprising: e1) accepting, through the platform, an operation to select an input image; and e2) inputting, through the platform, the input image into a learning model to obtain a predicted action history of the input image.
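Steps c1) through c3) amount to a lookup keyed by the view's identification code, with answer handling branching on the question type. The sketch below is hypothetical; the `QUESTION_DB` contents and function names are invented for illustration.

```python
# Toy question store keyed by view identification code.
QUESTION_DB = {
    "view-001": {"type": "closed",
                 "text": "Is a lesion visible in this view?",
                 "options": ["yes", "no", "uncertain"]},
    "view-002": {"type": "open",
                 "text": "Describe the tissue pattern in this view."},
}

def collect_answer(view_id: str, user_input: str) -> dict:
    """Validate and record the response answer for a view's question."""
    question = QUESTION_DB[view_id]
    if question["type"] == "closed":
        # closed-ended: the answer must be one of the offered options
        if user_input not in question["options"]:
            raise ValueError("answer must be one of the offered options")
    # open-ended questions accept free text as-is
    return {"view_id": view_id,
            "question": question["text"],
            "answer": user_input}
```

The returned record is exactly the question-and-answer set that claim 1 associates with a browsing action.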
8. The method of claim 1, further comprising at least one of the following steps: f) accepting, at the platform, an operation to set target information, wherein the target information is an analysis result of an image feature of the target image; and g) when the platform determines that corresponding knowledge information for the current display view exists in a database, displaying the knowledge information, wherein the knowledge information is used to describe the display view.

9. The method of claim 1, further comprising: h1) obtaining a learning model through an AI module; h2) analyzing, through the AI module, an image feature of the display view of the training data, and generating a text feature based on the question-and-answer set; h3) inputting, through the AI module, the image feature and the text feature into the learning model to generate a predicted answer; and h4) training, through the AI module, the learning model by comparing the predicted answer with the response answer.

10. The method of claim 1, further comprising: i1) accepting, through the platform, an operation to select an input image; and i2) inputting, through the platform, the input image into a learning model to obtain predicted knowledge information of the input image.
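The training loop of steps h1) through h4) — fuse an image feature with a text feature from the question-and-answer set, predict an answer, and adjust the model when it disagrees with the recorded response answer — can be shown with a deliberately tiny stand-in. A real system would use a VQA network; here toy encoders and a perceptron keep the sketch self-contained, and all names are illustrative assumptions.

```python
def extract_image_feature(view_image):  # stand-in for a CNN image encoder
    return [sum(view_image) / len(view_image)]

def extract_text_feature(qa_pair):      # stand-in for a text encoder
    return [len(qa_pair["question"]) % 7]

def train(samples, epochs=50, lr=0.1):
    """Perceptron loop mirroring h2)-h4): predict, compare, adjust."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for s in samples:
            x = extract_image_feature(s["view"]) + extract_text_feature(s["qa"])
            score = sum(w * v for w, v in zip(weights, x)) + bias
            predicted = 1 if score > 0 else 0           # predicted answer
            error = s["qa"]["answer"] - predicted       # compare with response answer
            weights = [w + lr * error * v for w, v in zip(weights, x)]
            bias += lr * error
    return weights, bias
```

The comparison-driven update in the inner loop is the essence of h4); only the model architecture changes when scaling up to a real VQA network.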
11. A system of generating training data through questions and answers, comprising: a database configured to store a target image; and a platform connected to the database and connected to a client via a network, the platform being configured to provide a browsing interface at the client for browsing the target image, accept an operation from the client to adjust a display view of the target image, present a question for analyzing the display view, obtain a response answer, and associate a question-and-answer set with a browsing action to generate training data for automatically analyzing the target image, wherein the question-and-answer set corresponds to the display view and includes the question and the response answer; wherein the platform comprises an action module configured to obtain a previous display reference point, a previous zoom level, or a previous rotation angle of a previous display view, and compare the previous display reference point, the previous zoom level, or the previous rotation angle with a display reference point, a zoom level, or a rotation angle of the current display view, so as to determine a browsing action of switching from the previous display view to the current display view.
12. The system of claim 11, wherein the database comprises: a user database configured to store a plurality of registration records; and an image library configured to store a plurality of images; wherein the platform comprises: a web module configured to interact with the client through web pages, so as to receive login data from the client, and to select one of the plurality of images in the image library or upload an image from the client as the target image; and a management module configured to accept the login when the login data matches any of the registration records.

13. The system of claim 11, wherein the platform comprises: an image viewing module configured to obtain the target image, wherein the target image is divided into a plurality of tiles; an image operation module configured to adjust a view parameter according to the operation of the client, wherein the view parameter includes at least one of the display reference point, the zoom level, and the rotation angle; and an image information processing module configured to determine all of the tiles falling within the display view under the adjusted view parameter as the image of the adjusted display view, and to output the image of the adjusted display view to the client, so as to present the adjusted display view at the client.
14. The system of claim 11, wherein the database comprises a knowledge base configured to store a plurality of questions corresponding to different display views of the target image, each question being preset for an image feature of the display view or automatically generated through machine learning and being used to analyze the image feature, the display view and the corresponding question being associated with an identification code; wherein the platform comprises a question-and-answer module configured to obtain the corresponding question based on the identification code of the current display view, display the question at the client, and receive the response answer from the client.

15. The system of claim 14, wherein the question is a closed-ended question or an open-ended question; wherein the question-and-answer module is configured to, when the question is a closed-ended question, display a plurality of answer options at the client and take the selected answer option as the response answer, and, when the question is an open-ended question, display a text input area and take the text entered into the text input area as the response answer.
16. The system of claim 11, wherein the platform comprises a knowledge recording and processing module configured to, when a recording condition is satisfied, capture the image of the display view and record corresponding knowledge information, wherein the knowledge information includes the question-and-answer set and the browsing action, and the display view and the knowledge information are mapped to an identification code for association.

17. The system of claim 11, further comprising an AI module, the AI module comprising a learning model configured to automatically generate at least one predicted browsing action for an input image, simulate browsing of the input image based on the at least one predicted browsing action to transform the display view, and analyze predicted target information of the input image based on the image of the transformed display view; wherein the learning model includes a DQN-based model; and wherein the predicted target information is associated with the input image.
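Claim 17's idea — a learned policy proposes browsing actions that steer the view toward a region worth analyzing — can be sketched with a greedy roll-out. A real system would use a trained DQN; here a hand-filled Q-table stands in, and the states, actions, and values are all invented for the example.

```python
ACTIONS = ["pan_left", "pan_right", "zoom_in", "stop"]

Q_TABLE = {  # state -> estimated value of each action (toy values)
    "far_right_of_target": [3.0, 0.1, 0.2, 0.0],
    "over_target":         [0.0, 0.0, 2.5, 1.0],
    "zoomed_on_target":    [0.0, 0.0, 0.0, 4.0],
}

TRANSITIONS = {  # deterministic toy view dynamics
    ("far_right_of_target", "pan_left"): "over_target",
    ("over_target", "zoom_in"): "zoomed_on_target",
}

def predict_browsing_actions(state: str, max_steps: int = 5) -> list[str]:
    """Greedily follow the highest-valued action until 'stop'."""
    history = []
    for _ in range(max_steps):
        q_values = Q_TABLE[state]
        action = ACTIONS[q_values.index(max(q_values))]
        history.append(action)
        if action == "stop":
            break
        state = TRANSITIONS[(state, action)]
    return history
```

The resulting action history is the "predicted browsing action" sequence of claim 17; the final view it reaches is the one handed to the analysis step.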
18. The system of claim 11, wherein the database comprises a knowledge base configured to store a plurality of pieces of knowledge information corresponding to different display views, the knowledge information being used to describe the display views; wherein the platform further comprises: a target module configured to accept an operation of the client to set target information, wherein the target information is an analysis result of an image feature of the target image; and a knowledge processing and providing module configured to display the knowledge information at the client when determining that corresponding knowledge information exists for the current display view; wherein the platform is configured to associate the display view with the knowledge information to generate the training data, the knowledge information including the question-and-answer set and the set target information.

19. The system of claim 11, further comprising an AI module, the AI module comprising a learning model configured to automatically analyze an input image to automatically generate predicted knowledge information of at least one display view of the input image, wherein the predicted knowledge information includes at least one of a predicted action history, predicted target information, a predicted question of any of the display views, and a predicted answer of any of the display views.
20. The system of claim 19, wherein the learning model includes a VQA-based model; wherein the AI module further comprises a training module configured to analyze an image feature of the display view of the training data, generate a text feature based on knowledge information of the same display view of the training data, input the image feature and the text feature into the learning model to generate the predicted answer, and adjust the learning model by comparing the predicted answer with the response answer.
TW110103801A 2020-02-07 2021-02-02 System of generating training data by questions and answers and method thereof TWI832032B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062971236P 2020-02-07 2020-02-07
US62/971,236 2020-02-07

Publications (2)

Publication Number Publication Date
TW202131352A TW202131352A (en) 2021-08-16
TWI832032B true TWI832032B (en) 2024-02-11

Family

ID=77180828

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110103801A TWI832032B (en) 2020-02-07 2021-02-02 System of generating training data by questions and answers and method thereof

Country Status (2)

Country Link
CN (1) CN113254608A (en)
TW (1) TWI832032B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254608A (en) * 2020-02-07 2021-08-13 台达电子工业股份有限公司 System and method for generating training data through question answering
TWI793865B (en) * 2021-11-18 2023-02-21 倍利科技股份有限公司 System and method for AI automatic auxiliary labeling

Citations (6)

Publication number Priority date Publication date Assignee Title
CN105393264A (en) * 2013-07-12 2016-03-09 微软技术许可有限责任公司 Interactive segment extraction in computer-human interactive learning
CN107330238A (en) * 2016-08-12 2017-11-07 中国科学院上海技术物理研究所 Medical information collection, processing, storage and display methods and device
CN109583440A (en) * 2017-09-28 2019-04-05 北京西格码列顿信息技术有限公司 It is identified in conjunction with image and reports the medical image aided diagnosis method edited and system
TW201941218A (en) * 2018-01-08 2019-10-16 美商普吉尼製藥公司 Systems and methods for rapid neural network-based image segmentation and radiopharmaceutical uptake determination
US20190313986A1 (en) * 2016-11-16 2019-10-17 The General Hospital Corporation Systems and methods for automated detection of objects with medical imaging
US10489736B2 (en) * 2015-03-16 2019-11-26 Swarm Vision, Inc Behavioral profiling with actionable feedback methodologies and systems

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
TWI571762B (en) * 2012-11-08 2017-02-21 國立台灣科技大學 Real time image cloud system and management method
CN109830284A (en) * 2017-11-23 2019-05-31 天启慧眼(北京)信息技术有限公司 The analysis method and device of medical image
CN108491421B (en) * 2018-02-07 2021-04-16 北京百度网讯科技有限公司 Method, device and equipment for generating question and answer and computing storage medium
US20210240931A1 (en) * 2018-04-30 2021-08-05 Koninklijke Philips N.V. Visual question answering using on-image annotations
CN109711434B (en) * 2018-11-30 2021-07-09 北京百度网讯科技有限公司 Method, apparatus, device and medium for acquiring and evaluating VQA system training data
CN113254608A (en) * 2020-02-07 2021-08-13 台达电子工业股份有限公司 System and method for generating training data through question answering


Also Published As

Publication number Publication date
TW202131352A (en) 2021-08-16
CN113254608A (en) 2021-08-13
