TW202131352A - System of generating training data by questions and answers and method thereof - Google Patents


Info

Publication number: TW202131352A
Authority: TW (Taiwan)
Prior art keywords: view, question, answer, display field, image
Application number: TW110103801A
Other languages: Chinese (zh)
Other versions: TWI832032B (en)
Inventors: 阮柏翰, 鄒奇軒
Original Assignee: 台達電子工業股份有限公司
Application filed by 台達電子工業股份有限公司
Publication of TW202131352A
Application granted
Publication of TWI832032B

Classifications

    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/34: Browsing; visualisation therefor
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G16H 15/00: ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G16H 70/60: ICT specially adapted for the handling or processing of medical references relating to pathologies


Abstract

A system and method for generating training data through questions and answers are provided by the disclosure. A platform provides browsing of a target image, receives operations that adjust a display view of the target image, presents a question for analyzing the display view, obtains a response answer, and associates the display view with the question and the response answer to generate training data for automatically analyzing the target image. The disclosure can automatically prompt the region of interest and significantly reduce the user's burden of collecting training data.

Description

System and method for generating training data through questions and answers

The present invention relates to a system and method for generating training data, and in particular to a system and method for generating training data through questions and answers.

At present, pathology interpretation is performed manually by experts (such as pathologists). During manual interpretation, the expert must view a physical glass slide under a microscope while manually recording observations with tools such as a voice recorder or pen and paper, and finally write a pathology report based on those observations.

However, this approach has the following problems: 1. Inconsistent expert interpretation: different experts usually view the same physical slide from different angles (i.e., they identify different regions of interest), which leads them to different conclusions about the same slide and ultimately to inconsistent pathology reports. 2. Difficult collaborative reading: because a physical slide under a microscope can be viewed by only one person at a time, manual interpretation cannot support viewing by multiple people at once, nor can it ensure that people viewing in turn all focus on the same region of interest. 3. Diversity of pathological features: as medical technology advances, more and more pathological features are discovered. To correctly focus on the regions of interest where these rapidly accumulating features appear, experts must set aside substantial learning time on top of an already heavy workload, which imposes a serious burden.

The existing approach of manually interpreting physical slides therefore suffers from the above problems, and a more effective solution is urgently needed.

The present invention provides a system and method for generating training data through questions and answers, which can prompt regions of interest while a user browses an image and collect training data for those regions through questions and answers.

The present invention provides a method of generating training data through questions and answers, comprising the following steps: providing browsing of a target image at a platform; accepting an operation to adjust a display view of the target image; presenting a question for analyzing the display view and obtaining a response answer; and associating the display view with a corresponding question-and-answer set to generate training data for automatically analyzing the target image, wherein the question-and-answer set includes the question and the response answer.

The present invention further provides a system for generating training data through questions and answers, comprising a database for storing a target image, and a platform connected to the database and, via a network, to a client. The platform is configured to provide a browsing interface at the client for browsing the target image, accept operations from the client to adjust a display view of the target image, present a question for analyzing the display view, obtain a response answer, and associate the display view with a corresponding question-and-answer set to generate training data for automatically analyzing the target image, wherein the question-and-answer set includes the question and the response answer.

The invention can automatically prompt regions of interest and greatly reduces the burden that collecting training data places on the user.

The technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments, to further clarify its purpose, solutions, and effects; the description is not intended to limit the scope of the appended claims.

Automatic image interpretation with artificial intelligence (AI) requires a large amount of training data to train a learning model.

While browsing, an expert could manually interpret and label a large number of images one by one and then generate training data from those labels, but this forces the expert to interrupt browsing and spend considerable time on labeling and annotation.

Take the interpretation of medical images (such as digital pathology images) as an example. While interpreting a medical image (e.g., judging whether cancer cells are present), a physician who finds cancer cells must interrupt the interpretation to mark and annotate them, which is extremely inconvenient when the physician is simultaneously explaining findings to a patient or teaching.

To address this, the present invention mainly provides a system and method for generating training data through questions and answers. Its main principle is to set up, in advance (manually or with AI techniques), questions related to one or more regions of interest (the display views described below) of the image to be analyzed (the target image described below).

Then, while a user (such as a physician, expert, or inspector) interprets the target image, whenever the user browses to a region of interest for which questions have been set, the invention automatically presents those questions. The user thus knows that the current display view is a region of interest and can observe it more attentively. Moreover, the user only needs to briefly answer questions related to the current region to complete the input of training parameters for that region of interest. Because the interpretation is not interrupted and little time is required, the user's willingness to provide feedback increases, which in turn improves the accuracy of the training data.

Furthermore, because the invention eliminates manual marking and annotation by the user, it greatly reduces the user's burden, allowing the user to concentrate on interpretation and improving its accuracy.

In addition, by asking questions related to the currently browsed region, the invention draws the user's attention to the observation points relevant to each question while answering it, further improving the accuracy of interpretation.

Please refer to FIG. 1, an architecture diagram of a system according to an embodiment of the present invention. The system for generating training data through questions and answers may include a platform 10 (e.g., a general-purpose computer such as a server, cloud service platform, desktop, or laptop, or any combination thereof), a database 12 (e.g., a network database, local database, relational database, or a combination thereof), and an AI module 13.

The platform 10 may include a storage module 101, a communication module 102, a human-machine interface 103, and a processing module 100.

The storage module 101 (e.g., a storage device such as RAM, EEPROM, a solid-state drive, a hard disk drive, or flash memory, or any combination thereof) stores data. The communication module 102 (e.g., a network interface card, NIC) connects to a network (such as the Internet) and communicates over it with external devices (such as the database 12, the AI module 13, and/or the client 11). The human-machine interface 103 (including input and output interfaces, such as a mouse, keyboard, various buttons, touchpad, display, touch screen, or projection module) interacts with the user. The processing module 100 (which may be a processor such as a CPU, GPU, TPU, or MCU, or any combination thereof) controls the platform and implements the functions proposed by the present invention.

The AI module 13 may be deployed on a server or a cloud service platform (e.g., Amazon Web Services, Google Cloud Platform, or Microsoft Azure), or on the platform 10 itself. The AI module 13 includes a learning model 130. The training data generated by the present invention is used to build and train the learning model 130 and thereby improve its accuracy.

The learning model 130 is built and trained with machine-learning techniques and can automatically analyze an input image (such as the target image described later) to produce predicted knowledge information for a specific display view of that image. This predicted knowledge information serves as reference information when the user analyzes the input image.

In one embodiment, the learning model 130 includes a VQA (Visual Question Answering) model, which can be trained on text features (question-and-answer sets, target information, etc.) conditioned on the image features of the input image (the display view).
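As a loose illustration of the VQA idea only (not the patent's actual model), a minimal sketch fuses an image-feature vector for the display view with a bag-of-words encoding of the question and scores candidate answers. Every name, dimension, and the random "trained" weights here are hypothetical stand-ins:

```python
import numpy as np

def encode_question(question: str, vocab: list[str]) -> np.ndarray:
    """Bag-of-words encoding of the question over a small vocabulary."""
    words = question.lower().replace("?", "").split()
    return np.array([float(words.count(w)) for w in vocab])

def vqa_scores(img_feat: np.ndarray, q_feat: np.ndarray,
               W: np.ndarray) -> np.ndarray:
    """Fuse image and question features and score candidate answers.

    The fused vector is the concatenation [img_feat; q_feat]; W maps it
    to one logit per candidate answer (a stand-in for a trained model).
    """
    fused = np.concatenate([img_feat, q_feat])
    return W @ fused

# Toy demo: 4-dim image feature, 3-word vocabulary, 2 candidate answers.
rng = np.random.default_rng(0)
vocab = ["lesion", "severe", "present"]
answers = ["yes", "no"]
img_feat = rng.normal(size=4)
q_feat = encode_question("Is a lesion present?", vocab)
W = rng.normal(size=(len(answers), 4 + len(vocab)))
pred = answers[int(np.argmax(vqa_scores(img_feat, q_feat, W)))]
print(pred)
```

In a real VQA model the image and question encoders would be learned networks and W would be trained on (display view, question, answer) triples such as those collected by this system.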

In another embodiment, the learning model 130 includes a DQN (Deep Q-Learning) model, which can follow the user's image-browsing process, adjust the quantified value of the target (the target information) after each display-view adjustment, and finally produce predicted target information for AI-driven automatic browsing of the image. Specifically, the DQN model first produces a score (Q-value) for each browsing action on the input image (either a browsing action entered by the user or an automatically generated predicted action); the browsing action with the highest score is used to simulate browsing and transform the display view; the image of the transformed display view is fed back into the DQN model to obtain the next browsing action and its display view; and so on until the browsing action stops, after which the target information can be predicted from the image of a display view (any of the transformed views or the final one).
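The loop described above can be sketched roughly as follows. This is a toy greedy Q-value loop, not the patent's implementation; `q_network` is a hypothetical stand-in for a trained DQN that here simply steers the view toward a fixed point and stops on arrival:

```python
import numpy as np

ACTIONS = ["left", "right", "up", "down", "stop"]

def q_network(view: dict) -> np.ndarray:
    """Hypothetical stand-in for a trained DQN: one Q-value per action.

    It rewards moving toward a pretend region of interest at (3, 2)
    and rewards 'stop' once the view has arrived there.
    """
    tx, ty = 3, 2
    q = np.full(len(ACTIONS), -1.0)
    if view["x"] < tx:
        q[ACTIONS.index("right")] = 1.0
    elif view["x"] > tx:
        q[ACTIONS.index("left")] = 1.0
    elif view["y"] < ty:
        q[ACTIONS.index("down")] = 1.0
    elif view["y"] > ty:
        q[ACTIONS.index("up")] = 1.0
    else:
        q[ACTIONS.index("stop")] = 1.0
    return q

def apply(view: dict, action: str) -> dict:
    """Transform the display view by one browsing action."""
    dx = {"left": -1, "right": 1}.get(action, 0)
    dy = {"up": -1, "down": 1}.get(action, 0)
    return {"x": view["x"] + dx, "y": view["y"] + dy}

def auto_browse(view: dict, max_steps: int = 50) -> list[str]:
    """Repeatedly take the highest-Q action until 'stop' is chosen."""
    history = []
    for _ in range(max_steps):
        action = ACTIONS[int(np.argmax(q_network(view)))]
        history.append(action)
        if action == "stop":
            break
        view = apply(view, action)
    return history

print(auto_browse({"x": 0, "y": 0}))
```

The resulting action history plays the role of the predicted browsing flow; in the patent's setting the stopped-at view would then be analyzed to predict the target information.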

For example, please refer to FIGS. 3 to 5, which are schematic diagrams of the inputs and outputs of the learning model according to embodiments of the present invention.

In one embodiment, as shown in FIG. 3, when the image of one display view of the target image (i.e., a specific sub-image of the target image) is input to the learning model 130, the model can automatically analyze it and produce predicted target information (e.g., via the DQN model). This target information is the analysis result generated from the image features of the display view and describes their characteristics, such as the type of feature (e.g., the kind of cell lesion), its extent (e.g., the proportion occupied), or its degree (e.g., severe, mild, or negligible).

In one embodiment, as shown in FIG. 4, when the image of a display view of the target image and the question corresponding to that view are input to the learning model 130, the model can automatically analyze them and produce a predicted answer (e.g., via the VQA model). Furthermore, the learning model 130 can also generate new questions about the display view, so that the next time a user browses the same view, he or she can answer the questions (including the new ones) and thereby strengthen the completeness of the model's training for that view.

In one embodiment, as shown in FIG. 5, when the image of a display view of the target image is input to the learning model 130, the model can automatically analyze it and produce a predicted action history (e.g., via the DQN model). Specifically, by adding the action histories of one or more users on the target image (i.e., combinations of multiple browsing actions on it) during training, the learning model 130 can distill a browsing approach suited to the target image and offer it to inexperienced users as a reference.

Please refer to FIGS. 1, 2, and 6 together. FIG. 2 is a partial architecture diagram of the platform according to an embodiment of the present invention, and FIG. 6 is a flowchart of a method according to an embodiment of the present invention.

In one embodiment, the processing module 100 may include modules 104-105, 20-22, 200-202, 210-213, and 220-221, and the AI module 13 may include modules 131-132. These modules are each configured to perform different functions.

The aforementioned modules are interconnected (electrically and informationally) and may be hardware modules (e.g., electronic circuit modules, integrated circuit modules, SoCs), software modules (e.g., firmware, an operating system, or an application), or a mixture of hardware and software modules, without limitation.

It is worth mentioning that when the aforementioned modules are software modules (e.g., firmware, an operating system, or an application), the storage module 101 and the AI module 13 may include a non-transitory computer-readable recording medium storing a computer program that records computer-executable code; when the processing module 100 and the AI module 13 execute this code, the functions of the corresponding modules are realized.

The method of generating training data through questions and answers in this embodiment includes the following steps.

Step S10: the platform 10 provides browsing of the target image through the image browsing and recording module 20. The target image may be stored in the database 12 or uploaded by the user.

Specifically, the platform 10 may provide a browsing interface (such as a GUI) and display the target image in it.

In one embodiment, the user may operate the human-machine interface 110 of the client 11 (a general-purpose computer such as a desktop, laptop, tablet, or smartphone; interface 110 is similar to interface 103 and is not described again) to connect to the platform 10 and browse the target image through the browsing service the platform provides.

In one embodiment, the user may directly operate the human-machine interface 103 of the platform 10 to use the browsing service.

Step S11: the platform 10, through the image browsing and recording module 20, accepts the user's browsing operations (local or remote) to adjust the display view of the target image, e.g., zoom in, zoom out, pan, or rotate.

Step S12: after the display view changes, the platform 10 obtains the question corresponding to the current display view through the knowledge acquisition module 21 and presents it. This question is used to analyze the current display view.

Then, after reading the question, the user can refer to the current display view and past experience and quickly enter a response answer, so that the platform 10 obtains the response answer to the question. The invention thereby obtains a question-and-answer set for the current display view (the question and its response answer), i.e., the training parameters for that view.

Step S13: the platform associates the display view with its corresponding question-and-answer set through the knowledge recording and processing module 22 to generate training data. This training data is provided to the learning model 130 for training on automatic analysis of the target image.
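As a minimal sketch of what the association in step S13 might look like as data (all field names are hypothetical illustrations, not taken from the patent):

```python
from dataclasses import dataclass, field, asdict

@dataclass
class DisplayView:
    """View parameters identifying a sub-region of the target image."""
    ref_point: tuple[int, int]  # display reference point (pixel coords)
    zoom_level: float
    rotation_deg: float = 0.0

@dataclass
class TrainingRecord:
    """One training sample: a display view tied to its Q&A set."""
    image_id: str
    view: DisplayView
    qa_set: list[dict] = field(default_factory=list)

    def add_answer(self, question: str, answer: str) -> None:
        self.qa_set.append({"question": question, "answer": answer})

# Usage: the view the user browsed to, plus the question they answered.
rec = TrainingRecord("slide-001", DisplayView((1024, 768), 2.0))
rec.add_answer("Is a lesion visible in this view?", "yes")
print(asdict(rec))
```

Serialized records of this shape (view parameters plus Q&A pairs) are the kind of input a VQA- or DQN-style learning model could be trained on.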

In this way, the user only needs to answer simple questions while browsing the target image, and the invention can automatically build the training data for that image.

Please refer to FIGS. 1-2, 7A-7B, and 9-11 together. FIG. 7A is the first part and FIG. 7B the second part of a flowchart of a method according to an embodiment of the present invention; FIG. 9 is a schematic diagram of a display view of a target image according to an embodiment of the present invention; and FIGS. 10 and 11 are schematic diagrams of other display views of FIG. 9.

In this embodiment, the platform 10 provides a web browsing service through the web module 104 (e.g., a web server module) to interact with the client 11.

The method of generating training data through questions and answers in this embodiment includes the following steps.

Step S20: the platform 10, through the web module 104, provides a browsing interface at the client 11 to offer a web service for browsing the target image.

In one embodiment, step S20 may include the following steps S30-S34.

Step S30: the client 11 logs in to the platform 10 through a web page.

In one embodiment, the database 12 may include a user database 123 that stores the registration data of different users (or clients 11), such as account names and passwords, network addresses, hardware addresses, device identifiers, digital signatures, dongle keys, or other identifiable information. The user can operate the client 11 to provide login data (of the same kinds) to the platform 10.

The platform 10 then compares the login data with each set of registration data through the management module 105, and when the login data matches any registration data, it determines that verification has passed and allows the login.
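A hedged sketch of the comparison described above; the patent does not specify data structures or credential handling, so everything here is illustrative:

```python
# Registered credentials as the user database might hold them. In a real
# system passwords would be salted and hashed, never stored in plain text.
registered = [
    {"account": "dr_chen", "password": "s3cret", "device_id": "dev-01"},
    {"account": "dr_wang", "password": "p4ss", "device_id": "dev-02"},
]

def verify_login(login: dict) -> bool:
    """Pass verification when every provided login field matches some
    registration record, as the comparison step describes."""
    return any(all(reg.get(k) == v for k, v in login.items())
               for reg in registered)

print(verify_login({"account": "dr_chen", "password": "s3cret"}))  # True
print(verify_login({"account": "dr_chen", "password": "wrong"}))   # False
```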

Step S31: after a successful login, the platform 10 delivers a browsing interface (e.g., a web application) to the client 11 through the web module 104 so that the browsing interface is presented.

In one embodiment, the database 12 includes an image library 120 that stores multiple images, such as images of electronic components, medical images, surveillance images, or other images requiring interpretation.

Step S32: through the web page, the client 11 may select one of the images in the image library 120 as the target image (the image to be interpreted next) or upload an image to the platform 10 as the target image.

The platform 10 then imports the target image through the image viewing module 200.

Step S33: the platform 10, through the image information processing module 202, determines the display view of the target image to be presented according to preset view parameters (such as a preset display reference point, zoom level, and/or rotation angle), i.e., determines the range of the target image to present.

The display reference point of a display view may be an image coordinate (such as a pixel position) indicating where the view's reference point (such as its center or any boundary point) lies within the whole target image. The zoom level of a display view indicates its current magnification or reduction level (this value maps proportionally to an actual magnification that depends on the image's resolution). The rotation angle of a display view indicates the angular difference between the view and the target image.

For example, the platform 10 may be configured so that a zoom level of 1 corresponds to an actual magnification of 5x and a zoom level of 5 corresponds to 40x.

In another example, the platform 10 may be configured so that a zoom level of 1 corresponds to an actual magnification of 0.25x (zoomed out so the entire target image fits in the display view) and a zoom level of 5 corresponds to 1x (1:1 browsing).

The relationship (e.g., the ratio) between zoom level and actual magnification in the present invention may be set as needed and is not limited.

Thus, based on the display reference point and zoom level, or on the display reference point, zoom level, and rotation angle, the image information processing module 202 can determine the view range, for example by computing it as a bounding box.
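The bounding-box computation can be sketched as follows, assuming (for illustration only; none of these specifics come from the patent) a center-point reference, the text's example linear mapping from zoom level 1 (5x) to level 5 (40x), a fixed viewport, and no rotation:

```python
def magnification(zoom_level: float) -> float:
    """Illustrative linear mapping from zoom level to magnification,
    matching the example of level 1 -> 5x and level 5 -> 40x."""
    return 5.0 + (zoom_level - 1.0) * (40.0 - 5.0) / (5.0 - 1.0)

def view_bounding_box(ref_point, zoom_level, viewport=(800, 600)):
    """Axis-aligned bounding box (x0, y0, x1, y1), in target-image
    pixels, of the region shown in the viewport centered on ref_point.

    Higher magnification shows a smaller region of the target image.
    """
    mag = magnification(zoom_level)
    half_w = viewport[0] / (2 * mag)
    half_h = viewport[1] / (2 * mag)
    cx, cy = ref_point
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

print(magnification(1.0))                 # 5.0
print(view_bounding_box((400, 300), 5.0)) # (390.0, 292.5, 410.0, 307.5)
```

With a nonzero rotation angle, the region would instead be a rotated rectangle, and the axis-aligned bounding box would be taken over its four rotated corners.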

Step S34: the platform 10 transmits the image of the determined display view to the client 11 over the network, so that the display view is presented in the browsing interface.

步驟S21:用戶端11透過瀏覽介面調整目標影像的顯示視野。Step S21: The client 11 adjusts the display field of view of the target image through the browsing interface.

於一實施例中,步驟S21可包括以下步驟S40-S42。In one embodiment, step S21 may include the following steps S40-S42.

步驟S40:用戶端11透過瀏覽介面輸入瀏覽操作來調整顯示視野,平台端10透過影像操作模組201蒐集用戶的瀏覽操作,並轉換為各視野參數的調整量(即決定調整後的視野參數)。Step S40: The user terminal 11 inputs a browsing operation through the browsing interface to adjust the display field of view, and the platform side 10 collects the user's browsing operation through the image operation module 201 and converts it into the adjustment amount of each field of view parameter (that is, determines the adjusted field of view parameter) .

於一實施例中,瀏覽介面可顯示各種操作按鈕(如左移、右移、上移、下移、放大、縮小、順時針旋轉、逆時針旋轉等),影像操作模組201監控這些操作按鈕來蒐集用戶的瀏覽操作。In one embodiment, the browsing interface can display various operation buttons (such as move left, move right, move up, move down, zoom in, zoom out, rotate clockwise, rotate counterclockwise, etc.), and the image operation module 201 monitors these operation buttons To collect the user's browsing operations.

於一實施例中,影像操作模組201可監控用戶端11的鍵盤或滑鼠動作來蒐集用戶的瀏覽操作(如監測滑鼠或鍵盤訊號來對應觸發調整放大等級(以縮放倍率)、平移顯示基準點與旋轉影像)。In one embodiment, the image operation module 201 can monitor the keyboard or mouse actions of the client 11 to collect the user's browsing operations (such as monitoring mouse or keyboard signals to trigger adjustment of the zoom level (at zoom ratio), panning display Datum point and rotating image).

步驟S41:平台端10透過影像資訊處理模組202基於調整後的視野參數決定新的顯示視野,即決定要呈現的目標影像的新範圍。Step S41: The platform terminal 10 determines a new display field of view based on the adjusted field of view parameters through the image information processing module 202, that is, determines the new range of the target image to be presented.

步驟S42:平台端10透過網路輸出調整後的顯示視野的影像傳送至用戶端11,以於瀏覽介面上呈現調整後的顯示視野。Step S42: The platform terminal 10 outputs an image of the adjusted display field of view through the network and transmits it to the client terminal 11 to present the adjusted display field of view on the browsing interface.

於一實施例中,平台端10於傳輸顯示視野的影像至用戶端11前,可先進行影像壓縮(尤其是失真影像壓縮)來適當地減少影像細節並降低解析度,藉以適度減少影像的大小而有利於網路傳輸。In one embodiment, the platform 10 may perform image compression (especially distorted image compression) before transmitting the image of the display field of view to the client 11 to appropriately reduce the image details and reduce the resolution, thereby appropriately reducing the size of the image It is conducive to network transmission.

值得一提的是,當目標影像的過大(如解析度過高)時,傳送整張目標影像至用戶端11必須耗費大量傳輸時間,且用戶端11的可能因為硬體效能不足,無法順暢地瀏覽整張目標影像。It is worth mentioning that when the target image is too large (for example, the resolution is too high), it takes a lot of transmission time to send the entire target image to the client 11, and the client 11 may not be able to perform smoothly due to insufficient hardware performance. Browse the entire target image.

對此,本發明僅傳送顯示視野的影像(並可適度壓縮)至用戶端11,可確保用戶端11順暢地瀏覽整張目標影像。並且,當用戶端11希望觀看影像細節時,可執行放大的瀏覽操作,來使平台端10回傳區域放大後的影像以供順暢瀏覽,而可提升用戶體驗。In this regard, the present invention only transmits the image of the display field of view (and can be appropriately compressed) to the client 11, which can ensure that the client 11 can browse the entire target image smoothly. Moreover, when the user terminal 11 wants to view the details of the image, it can perform an enlarged browsing operation, so that the platform terminal 10 can return the enlarged image of the area for smooth browsing, which can improve the user experience.

In one embodiment, the target image can be divided into multiple tiles, and the platform end 10 uses the tile as the smallest transmission unit.

For example, as shown in FIG. 9, the display reference point is the image center, the target image 30 is divided into 48 tiles, and each tile has a corresponding identification code (such as coordinates).

Moreover, the display field of view under the default field-of-view parameters may be the full range, i.e., the display reference point 40 is (4, 3), the zoom level is 1, and the rotation angle is 0 degrees. The platform end 10 transmits all tiles in the display field of view (the full range, 48 in total) to the client 11, so that the complete target image 30 is displayed on the browsing interface.

Then, when the user wants to adjust the display field of view to range 50, a browsing operation can be input. As shown in FIG. 10, the adjusted display reference point 41 is (2.5, 3.5), the zoom level is 2, and the rotation angle is 0 degrees. The platform end 10 transmits all tiles in the adjusted display field of view (15 in total) to the client 11, so that the adjusted display field of view is displayed on the browsing interface.

Then, when the user wants to adjust the display field of view to range 51, a browsing operation can be input. As shown in FIG. 11, the adjusted display reference point 42 is (1.5, 4.5), the zoom level is 3, and the rotation angle is 0 degrees. The platform end 10 transmits all tiles in the adjusted display field of view (6 in total) to the client 11, so that the adjusted display field of view is displayed on the browsing interface.
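The tile selection behind these examples can be sketched as follows. This is a simplified model assuming an 8×6 grid of unit-sized tiles identified by (column, row) coordinates, as suggested by FIG. 9; the actual tile geometry and identification scheme are not fixed by the patent.

```python
import math

def tiles_in_view(x_min, y_min, x_max, y_max):
    """Return identification codes (column, row) of every tile that a rectangular
    display field of view overlaps, assuming unit-sized tiles indexed from (0, 0)."""
    cols = range(math.floor(x_min), math.ceil(x_max))
    rows = range(math.floor(y_min), math.ceil(y_max))
    return [(c, r) for r in rows for c in cols]
```

The full-range view then selects all 48 tiles, while a smaller 5×3-tile rectangle selects 15, matching the counts in the examples above.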

Through the above approach, the present invention can effectively provide a browsing service for the target image.

Then step S22 is performed: the platform end 10 determines, through the knowledge acquisition module 21, whether there is a corresponding question for the current display field of view that the user needs to answer; if so, the question-and-answer function is performed; if there is no corresponding question, no question and answer is performed.

In one embodiment, the database 12 includes a knowledge base 121. The knowledge base 121 can store knowledge information corresponding to different display fields of view of the target image. The knowledge information may include multiple questions to be answered, multiple question-and-answer sets (answered questions), multiple pieces of target information, and/or action information.

Each of the aforementioned questions is preset for the image features of the corresponding display field of view (such as the shape, type, color, or changes of objects in the display field of view) or automatically generated through machine learning (FIG. 4), and is used to analyze the image features.

In one embodiment, each display field of view is associated with a set of identification codes (which may be determined based on the identification codes of the covered tiles, determined based on the corresponding field-of-view parameters, assigned sequentially, etc., without limitation). Moreover, all knowledge information related to a display field of view corresponds to the same identification code.
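One of the options named above — deriving the identification code from the covered tiles — can be sketched as follows. The string format is a hypothetical choice for illustration; any stable, order-independent encoding would serve.

```python
def view_id(tile_ids):
    """Derive a stable identification code for a display field of view from the
    identification codes (column, row) of the tiles it covers. Sorting makes the
    code independent of the order in which tiles were enumerated."""
    return "|".join(f"{c},{r}" for c, r in sorted(tile_ids))
```

Knowledge information (questions, answers, target information) can then be keyed on this code so that every record for the same field of view resolves to the same entry.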

In one embodiment, step S22 may include the following steps S50–S52.

Step S50: The platform end 10 obtains, through the question-and-answer module 210, the corresponding question from the knowledge base 121 based on the identification code of the current display field of view.

Step S51: The platform end 10 displays, through the question-and-answer module 210, the obtained question on the browsing interface of the client 11, and can provide an answer interface for the user to answer.

Step S52: The platform end 10 receives, through the question-and-answer module 210 and the answer interface, the response answer input by the user.

For example, referring to FIG. 10, when the user browses to the display field of view of FIG. 10, the question-and-answer module 210 can display the question-and-answer interface 60 or 61.

Taking the question-and-answer interface 60 as an example, the question is an open-ended question (such as a short-answer question); the question-and-answer interface 60 includes a text input area, and the user enters text content in the text input area as the response answer.

Other open-ended questions and answers may be, for example: "Q: Which organ system in the image is abnormal? A: Cardiovascular", "Q: What are the bile duct cells and canals of Hering stained here with cytokeratin 7? A: Immunohistochemical staining", "Q: How are the electronic components in the figure connected? A: Soldering", and so on, without limitation.

Taking the question-and-answer interface 61 as an example, the question is a closed question (such as a multiple-choice question); the question-and-answer interface 61 includes multiple answer options, and the user selects one or more answer options as the response answer.

Other closed questions and answers may be, for example: "Q: Do the positively charged histone subunits make the negatively charged DNA more compact and stable? Options: Yes; No", "Q: Is this the cerebral embolism region? Options: Yes; No", "Q: How many electronic components are in the figure? Options: 1; 2; 3", and so on, without limitation.

In one embodiment, the present invention further provides a target setting function. Specifically, under any display field of view, step S23 is performed: the platform end 10, through the target module 212, can provide an input interface to accept the target information set by the client 11 after viewing the current display field of view. The target information serves as the result of the user's manual analysis of a quantitative feature of the whole image and serves as a training parameter for the target image.

Taking the judgment of the proportion of cancer cells as an example: when an expert views a pathology image at low magnification (wide field of view), the expert may consider the proportion of cancer cells to be high (e.g., an estimated 80%) and set the target information of the pathology image to "cancer cell proportion 80%". After adjusting the display field of view, e.g., viewing the pathology image at medium magnification (narrower field of view), the expert may find that some of the cells are not cancer cells (e.g., an estimated 60%) and can then correct the target information of the pathology image to "cancer cell proportion 60%".

In one embodiment, the present invention further provides a knowledge information prompt function. Specifically, under any display field of view, step S24 is performed: the knowledge processing and providing module 213 determines whether corresponding knowledge information for the current display field of view exists in the knowledge base 121. If so, the knowledge information is displayed on the browsing interface to explain the display field of view.

In one embodiment, each time the image information processing module 202 determines a display field of view, it can generate image browsing information (such as identification parameters or an identification code) for the display field of view, and the knowledge acquisition module 21 can search the knowledge base 121 based on the image browsing information to detect whether knowledge information related to the display field of view exists (such as a previously input question-and-answer set, target information for the entire target image, or content automatically interpreted by AI).

For example, referring to FIG. 11, when the user browses to the display field of view of FIG. 11, the knowledge acquisition module 21 finds that the current display field of view has corresponding knowledge information in the knowledge base (an abnormality proportion of 70%) and can display the knowledge information on the target interface 62. In addition, if the user judges the knowledge information to be incorrect, the user can modify it through the input interface (e.g., change it to 90%) so that the target module 212 sets new target information for learning.

In one embodiment, the present invention further provides a browsing action determination function. Specifically, each time the display field of view is switched, the action module 211 can obtain the previous display reference point, the previous zoom level, or the previous rotation angle of the previous display field of view and compare them with the display reference point, zoom level, or rotation angle of the current display field of view, to determine the browsing action of switching from the previous display field of view to the current one (e.g., pan direction, pan amount, zoom level, rotation direction, rotation angle, etc.), and add the browsing action to the action information; that is, the content of the browsing action is determined from the change between the two sets of field-of-view parameters.
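The comparison of two sets of field-of-view parameters can be sketched as follows. The (ref_x, ref_y, zoom_level, angle_deg) tuple layout and the dictionary-style action description are hypothetical conveniences, not part of the patent.

```python
def browsing_action(prev, curr):
    """Compare two field-of-view parameter sets (ref_x, ref_y, zoom_level,
    angle_deg) and describe the browsing action that switches between them."""
    (px, py, pz, pa), (cx, cy, cz, ca) = prev, curr
    action = {}
    if (cx, cy) != (px, py):
        action["pan"] = (cx - px, cy - py)        # pan amount; signs give direction
    if cz != pz:
        action["zoom"] = "in" if cz > pz else "out"
    if ca != pa:
        action["rotate"] = (ca - pa) % 360        # rotation angle in degrees
    return action
```

For the switch from FIG. 9 to FIG. 10, for instance, this yields a pan of (−1.5, 0.5) combined with a zoom-in, which is the kind of record the action module 211 would append to the action information.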

Then, step S25 is performed: the platform end 10, through the training data generation module 221, associates the display field of view with the corresponding knowledge information (e.g., the question-and-answer set completed by the user, the target information, and/or the action information) to generate training data.

In one embodiment, the present invention further provides a recording function (steps S26–S27).

Step S26: During browsing, the platform end 10 determines, through the knowledge recording and processing module 22, whether a preset recording condition is satisfied. The recording condition may be each change of the display field of view, after each operation, or a fixed interval (e.g., every 10 seconds), without limitation.

If the recording condition is not satisfied, monitoring continues.

If the recording condition is satisfied, step S27 is performed: the platform end 10 captures, through the image capture module 220, the image of the display field of view based on the aforementioned image browsing information; the knowledge recording and processing module 22 records the image of the display field of view and the knowledge information (such as the question-and-answer set completed by the user, the target information, and/or the action information) and associates them.

In one embodiment, the database 12 includes a browsing record 122. The knowledge recording and processing module 22 associates the field-of-view parameters of the display field of view with the knowledge information and records them in the browsing record 122 as a browsing history (which may include the action history and the question-and-answer sets and target information of all display fields of view).

Please also refer to FIG. 8, which is a flowchart of training and automatic analysis of the method according to an embodiment of the present invention. The present invention further provides a training function (step S60) and an automatic analysis function (step S61).

Step S60: The AI module 13 trains the learning model 130 according to the training data. In one embodiment, step S60 includes steps S70–S73.

Step S70: The AI module 13 loads the learning model 130.

Step S71: The AI module 13 analyzes the training data through the training module 132.

In one embodiment, the training data includes images of display fields of view and their corresponding knowledge information (e.g., question-and-answer sets, action information, target information, etc.). The training module 132 analyzes the image features of the display field of view and generates text features based on the knowledge information (such as the questions of the question-and-answer set and the target information input by the user), i.e., it converts the knowledge information into a semantic network.

In one embodiment, the AI module 13 may first convert the training data, through the data conversion module 131, into a format acceptable to the training module 132 and/or the learning model 130 before performing analysis and training.

Step S72: The training module 132 inputs the image features and text features to the learning model 130 to generate predicted knowledge information (e.g., a predicted answer to a question, predicted target information, or predicted action information).

Step S73: The training module 132 compares the predicted knowledge information with the knowledge information of the training data to train the learning model, e.g., comparing the predicted answer with the user's response answer, comparing the predicted target information with the target information input by the user, or comparing the predicted action information with the user's actual browsing actions, without limitation.
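The compare-and-update cycle of step S73 can be sketched minimally. For illustration only, this assumes the knowledge information is a single numeric target (e.g., an abnormality proportion) and the learning model is a plain linear scorer trained by least-squares gradient steps; an actual embodiment of the learning model 130 would be a neural model over image and text features.

```python
def train_step(weights, features, target, lr=0.1):
    """One compare-and-update round: predict from the features, compare the
    prediction with the recorded knowledge information (target), and adjust the
    model weights with a least-squares gradient step."""
    prediction = sum(w * f for w, f in zip(weights, features))
    error = prediction - target          # compare predicted vs. recorded info
    new_weights = [w - lr * error * f for w, f in zip(weights, features)]
    return new_weights, error
```

Repeating this step drives the prediction toward the expert-provided value, which is the essence of training the model against the collected question-and-answer data.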

Step S61: The AI module 13 receives an image input to generate corresponding predicted knowledge information. In one embodiment, step S61 includes steps S80–S81.

Step S80: The AI module 13 accepts an operation (e.g., from the client 11 or the platform end 10) to select an image (another target image).

Step S81: The AI module 13 inputs this target image to the learning model 130 to obtain the predicted knowledge information of the target image (which may include the predicted action history, predicted questions, and predicted target information of one or more display fields of view of the target image), which can be stored in the knowledge base 121.

Please refer to FIG. 12, which is a schematic diagram of an action history according to an embodiment of the present invention. After the history recording of the target image is completed, the platform end 10 of the present invention can provide a playback function, displaying the action history 70, the display field of view 71, and the thumbnail interface 72 of the target image in the browsing interface. The client can afterwards select any step of the action history 70 to view, or continuously play back multiple steps of the action history 70, to reproduce the browsing path taken during analysis.

Please refer to FIG. 13, which is a schematic diagram of a heat map according to an embodiment of the present invention.

After the target image is analyzed, the platform end 10 of the present invention can provide a heat map function, which shows which parts of the target image were the regions the model focused on during algorithmic analysis.

As shown in the figure, the platform end 10 displays the display field-of-view interface 80 and the thumbnail interface 81 of the target image in the browsing interface. In the display field-of-view interface 80, regions in which the algorithmic analysis has higher confidence are represented by lighter hot zones 801, and regions with lower confidence are represented by darker hot zones 802; in addition, the white region 800 refers to the background region.

In this way, the user or other users can directly understand the importance of each position of the target image, skip browsing the white region 800 and the dark hot zones 802 (unimportant regions) to save time, and focus more on browsing the light hot zones 801 (important regions) to improve accuracy.

By prompting questions or knowledge information in a designated display field of view, the present invention lets experts know that the current display field of view is a region of interest, so that they concentrate more on analyzing it.

The present invention provides the browsing service for the target image through web pages (web module 104), allowing experts around the world to view the same target image through a web browser and perform browsing operations on it (including answering questions and setting target information), so that different knowledge information (i.e., training data) about the same target image can be collected from different experts.

Furthermore, by subsequently consolidating this knowledge information and providing it to the AI module 13 as training data for the learning model 130, an expert-verified, highly reliable learning model 130 can be generated. In the future, when other experts (such as pathologists) perform image interpretation, they can first refer to the interpretation results of the AI module 13 (such as the predicted knowledge information) as a basis for writing subsequent pathology reports; since the reference source is consistent, the consistency of the experts' final interpretations can be improved.

Providing the browsing service for the target image through web pages also reduces the transportation time and damage risk of physical slides. In addition, by prompting the knowledge information of the current display field of view and accepting the input of target information, the expert's thoughts during the slide-reading process can be fully recorded and shared with other experts, helping different experts understand each other's analysis ideas and strategies.

Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art to which the present invention pertains may make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the scope of the claims appended to the present invention.

10: Platform end

100: Processing module

101: Storage module

102: Communication module

103, 110: Human-machine interface

104: Web module

105: Management module

11: Client

12: Database

120: Image library

121: Knowledge base

122: Browsing record

123: User database

13: AI module

130: Learning model

131: Data conversion module

132: Training module

20: Image browsing and recording module

200: Image viewing module

201: Image operation module

202: Image information processing module

21: Knowledge acquisition module

210: Question-and-answer module

211: Action module

212: Target module

213: Knowledge processing and providing module

22: Knowledge recording and processing module

220: Image capture module

221: Training data generation module

30-32: Target image

40-42: Display reference point

50-51: Range

60-62, 70-72, 80-81: Interface

800-802: Region

S10-S13: First training data generation steps

S20-S27: Second training data generation steps

S30-S34: Display steps

S40-S42: Browsing steps

S50-S52: Question-and-answer steps

S60, S70-S73: Training steps

S61, S80-S81: Automatic analysis steps

FIG. 1 is an architecture diagram of a system according to an embodiment of the present invention;

FIG. 2 is a partial architecture diagram of the platform end according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the input and output of a learning model according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the input and output of a learning model according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the input and output of a learning model according to an embodiment of the present invention;

FIG. 6 is a flowchart of a method according to an embodiment of the present invention;

FIG. 7A is a flowchart of the first part of a method according to an embodiment of the present invention;

FIG. 7B is a flowchart of the second part of the method according to an embodiment of the present invention;

FIG. 8 is a flowchart of training and automatic analysis of a method according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a display field of view of a target image according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of another display field of view of FIG. 9;

FIG. 11 is a schematic diagram of another display field of view of FIG. 9;

FIG. 12 is a schematic diagram of an action history according to an embodiment of the present invention; and

FIG. 13 is a schematic diagram of a heat map according to an embodiment of the present invention.

S10-S13: First training data generation steps

Claims (20)

一種透過問答生成訓練資料的方法,包括: a) 於一平台端提供一目標影像的瀏覽; b) 接受操作來調整該目標影像的一顯示視野; c) 呈現用以分析該顯示視野的一問題,並取得一回應答案;及 d) 關聯該顯示視野與對應的一問答集來產生用於自動分析該目標影像的一訓練資料,其中該問答集包括該問題與該回應答案。A method of generating training data through question and answer, including: a) Provide browsing of a target image on a platform; b) Accept the operation to adjust a display field of view of the target image; c) Present a question used to analyze the display field of view and obtain a response answer; and d) Associating the display field of view with a corresponding question and answer set to generate a training data for automatically analyzing the target image, wherein the question and answer set includes the question and the response answer. 如請求項1所述的透過問答生成訓練資料的方法,其中該步驟a)包括: a1) 於登入該平台端後,接收操作來自一影像庫中選擇多個影像的其中之一作為該目標影像,或者上傳影像至該平台端作為該目標影像; a2) 於該平台端基於一顯示基準點、一縮放等級與一旋轉角度的至少其中之一決定該目標影像的該顯示視野;及 a3) 輸出該顯示視野的影像,以呈現該顯示視野。The method for generating training data through question and answer as described in claim 1, wherein the step a) includes: a1) After logging in to the platform, select one of a plurality of images from an image library as the target image, or upload an image to the platform as the target image; a2) Determine the display field of view of the target image on the platform side based on at least one of a display reference point, a zoom level and a rotation angle; and a3) Output the image of the display field of view to present the display field of view. 
如請求項1所述的透過問答生成訓練資料的方法,其中該目標影像被分割為多個圖磚;該步驟b)包括: b1) 接受操作來調整一視野參數,其中該視野參數包括一顯示基準點、一縮放等級與一旋轉角度的至少其中之一; b2) 基於調整後的該視野參數決定進入該顯示視野的所有該圖磚作為調整後的該顯示視野的影像;及 b3) 輸出調整後的該顯示視野的影像,以呈現調整後的該顯示視野。The method for generating training data through question and answer as described in claim 1, wherein the target image is divided into a plurality of tiles; the step b) includes: b1) Accept an operation to adjust a field of view parameter, where the field of view parameter includes at least one of a display reference point, a zoom level, and a rotation angle; b2) Determine all the tiles that enter the display field of view based on the adjusted field of view parameters as the adjusted image of the display field of view; and b3) Output the adjusted image of the display field of view to present the adjusted display field of view. 如請求項1所述的透過問答生成訓練資料的方法,其中該步驟c)包括: c1) 基於該顯示視野的一識別碼自一資料庫取得對應的該問題,其中該問題是針對該顯示視野的一影像特徵所預先設定的或透過機器學習自動產生的,並且是用來分析該影像特徵; c2) 顯示該問題;及 c3) 接收該回應答案。The method for generating training data through question and answer as described in claim 1, wherein the step c) includes: c1) Obtain the corresponding question from a database based on an identification code of the display field of view, where the question is preset for an image feature of the display field of view or automatically generated through machine learning, and is used to analyze the Image feature c2) Show the problem; and c3) Receive the response answer. 如請求項4所述的透過問答生成訓練資料的方法,其中該問題為封閉式問題;該步驟c3)是顯示多個答案選項,並以被選擇的該答案選項作為該回應答案。The method for generating training data through question and answer as described in claim 4, wherein the question is a closed question; the step c3) is to display a plurality of answer options, and use the selected answer option as the response answer. 
如請求項4所述的透過問答生成訓練資料的方法,其中該問題為開放式問題;該步驟c3)是顯示一文字輸入區,並以被輸入該文字輸入區的文字內容作為該回應答案。The method for generating training data through question and answer as described in claim 4, wherein the question is an open-ended question; the step c3) is to display a text input area, and use the text inputted in the text input area as the response answer. 如請求項1所述的透過問答生成訓練資料的方法更包括: e) 於一記錄條件滿足時,擷取該顯示視野的影像並記錄對應的一知識資訊,其中該知識資訊包括該問答集與一瀏覽動作; 其中,該步驟e)於記錄該瀏覽動作時是取得前一個顯示視野的前一個顯示基準點、前一個縮放等級或前一個旋轉角度,將該前一個顯示基準點、該前一個縮放等級或該前一個旋轉角度與目前的該顯示視野的一顯示基準點、一縮放等級或一旋轉角度進行比較,以決定從該前一個顯示視野切換至目前的該顯示視野的該瀏覽動作; 其中該步驟d)是將該顯示視野與該知識資訊對應至一識別碼以進行關聯。The method of generating training data through question and answer as described in claim 1 further includes: e) When a recording condition is met, capture the image of the display field of view and record a corresponding piece of knowledge information, where the knowledge information includes the question and answer collection and a browsing action; Wherein, the step e) when recording the browsing action is to obtain the previous display reference point, the previous zoom level or the previous rotation angle of the previous display field of view, and the previous display reference point, the previous zoom level or the previous display reference point The previous rotation angle is compared with a display reference point, a zoom level or a rotation angle of the current display field of view to determine the browsing action of switching from the previous display field of view to the current display field of view; The step d) is to associate the display field of view and the knowledge information with an identification code for correlation. 
The method for generating training data through question and answer as described in claim 1, further comprising at least one of the following steps: f) accepting an operation to set target information, wherein the target information is an analysis result of an image feature of the target image; and g) when it is determined that corresponding knowledge information for the current display field of view exists in a database, displaying the knowledge information, wherein the knowledge information is used to explain the display field of view.

The method for generating training data through question and answer as described in claim 1, further comprising: h1) obtaining a learning model; h2) analyzing an image feature of the display field of view of the training data, and generating a text feature based on the question-and-answer set; h3) inputting the image feature and the text feature into the learning model to generate a predicted answer; and h4) training the learning model by comparing the predicted answer with the response answer.

The method for generating training data through question and answer as described in claim 1, further comprising: i1) accepting an operation to select an input image; and i2) inputting the input image into a learning model to obtain predicted knowledge information of the input image.
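Steps h1) through h4) describe a standard train-by-comparison loop. The sketch below illustrates the shape of that loop with deliberately toy stand-ins, not the patented model: a normalized mean intensity in place of a real image feature, a bag-of-words count in place of a real text feature, and a perceptron in place of the claimed learning model.

```python
def image_feature(pixels):
    # toy stand-in for a CNN image feature: normalized mean intensity
    return sum(pixels) / (255.0 * len(pixels))

def text_feature(question, vocab):
    # toy stand-in for a language-model text feature: bag of words
    words = question.lower().split()
    return [float(words.count(w)) for w in vocab]

def predict(w, x):
    # the last weight acts as the bias term
    score = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if score > 0 else 0

def train(samples, n_features, lr=0.1, epochs=100):
    # h4): adjust the model whenever the predicted answer differs
    # from the recorded response answer
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for x, y in samples:
            err = y - predict(w, x)
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x + [1.0])]
    return w
```

With a one-word vocabulary and two views (one dark, one bright) paired with yes/no response answers, the loop converges to a model whose predicted answers match the recorded ones.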
A system for generating training data through question and answer, comprising: a database for storing a target image; and a platform connected to the database and connected to a client via a network, the platform being configured to provide a browsing interface on the client for browsing the target image, accept operations from the client to adjust a display field of view of the target image, present a question for analyzing the display field of view, obtain a response answer, and associate the display field of view with a corresponding question-and-answer set to generate training data for automatically analyzing the target image, wherein the question-and-answer set comprises the question and the response answer.

The system for generating training data through question and answer as described in claim 11, wherein the database comprises: a user database for storing a plurality of registration records; and an image library for storing a plurality of images; and wherein the platform comprises: a web module configured to interact with the client through a web page so as to receive login data from the client, and to select one of the plurality of images in the image library or upload an image from the client as the target image; and a management module configured to accept the login when the login data matches any of the registration records.
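One plausible shape for the record the platform produces when it associates a display field of view with its question-and-answer set is a simple JSON document. The field names below are illustrative assumptions, not specified by the patent.

```python
import json

def make_training_record(view_id, view_params, qa_pairs, snapshot_path):
    """Assemble one training record associating a display field of view
    with its question-and-answer set."""
    return json.dumps({
        "view_id": view_id,         # identification code of the view
        "view": view_params,        # display reference point, zoom level, rotation angle
        "snapshot": snapshot_path,  # captured image of the display field of view
        "qa_set": [{"question": q, "answer": a} for q, a in qa_pairs],
    }, ensure_ascii=False)
```

Keying every record by the view's identification code is what lets later claims look up questions and knowledge information for the same view.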
The system for generating training data through question and answer as described in claim 11, wherein the platform comprises: an image viewing module configured to obtain the target image, wherein the target image is divided into a plurality of tiles; an image operation module configured to adjust a view parameter according to operations from the client, wherein the view parameter comprises at least one of a display reference point, a zoom level, and a rotation angle; and an image information processing module configured to determine, based on the adjusted view parameter, all of the tiles falling within the display field of view as an image of the adjusted display field of view, and to output the image of the adjusted display field of view to the client so as to present the adjusted display field of view on the client.
The system for generating training data through question and answer as described in claim 11, wherein the database comprises a knowledge base for storing a plurality of questions corresponding to different display fields of view of the target image, each question being preset for an image feature of the display field of view or automatically generated through machine learning and used to analyze the image feature, the display field of view and the corresponding question being associated with an identification code; and wherein the platform comprises: a question-and-answer module configured to obtain the corresponding question based on the identification code of the current display field of view, display the question on the client, and receive the response answer from the client.

The system for generating training data through question and answer as described in claim 14, wherein the question is a closed-ended question or an open-ended question; and wherein the question-and-answer module is configured to, when the question is a closed-ended question, display a plurality of answer options on the client and take the selected answer option as the response answer, and, when the question is an open-ended question, display a text input area and take the text entered in the text input area as the response answer.
The system for generating training data through question and answer as described in claim 11, wherein the platform comprises: an action module configured to obtain the previous display reference point, previous zoom level, or previous rotation angle of the previous display field of view, and compare it with a display reference point, zoom level, or rotation angle of the current display field of view, so as to determine a browsing action of switching from the previous display field of view to the current display field of view; and a knowledge recording and processing module configured to, when a recording condition is met, capture the image of the display field of view and record corresponding knowledge information, wherein the knowledge information comprises the question-and-answer set and the browsing action, and the display field of view and the knowledge information are mapped to an identification code for association.
The system for generating training data through question and answer as described in claim 11, further comprising an AI module, the AI module comprising: a learning model for automatically generating at least one predicted browsing action for an input image, simulating browsing of the input image based on the at least one predicted browsing action to change the display field of view, and analyzing predicted target information of the input image based on the image of the changed display field of view; wherein the learning model comprises a DQN architecture model; and wherein the predicted target information is associated with the input image.

The system for generating training data through question and answer as described in claim 11, wherein the database comprises a knowledge base for storing a plurality of pieces of knowledge information corresponding to different display fields of view, the knowledge information being used to explain the display field of view; wherein the platform further comprises: a target module configured to accept operations from the client to set target information, wherein the target information is an analysis result of an image feature of the target image; and a knowledge processing and providing module configured to display the knowledge information on the client when it is determined that corresponding knowledge information for the current display field of view exists; and wherein the platform is configured to associate the display field of view with the knowledge information to generate the training data, the knowledge information comprising the question-and-answer set and the set target information.

The system for generating training data through question and answer as described in claim 11, further comprising an AI module, the AI module comprising: a learning model for automatically analyzing an input image to automatically generate predicted knowledge information of at least one display field of view of the input image, wherein the predicted knowledge information comprises at least one of a predicted action history, predicted target information, a predicted question for any of the display fields of view, and a predicted answer for any of the display fields of view.

The system for generating training data through question and answer as described in claim 19, wherein the learning model comprises a VQA architecture model; and wherein the AI module further comprises a training module configured to analyze an image feature of the display field of view of the training data, generate a text feature based on knowledge information of the same display field of view of the training data, input the image feature and the text feature into the learning model to generate the predicted answer, and adjust the learning model by comparing the predicted answer with the response answer.
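The claims name a DQN architecture for learning browsing actions. A full deep Q-network is beyond a sketch, so the toy below substitutes tabular Q-learning over a one-dimensional strip of views to illustrate the same action-value loop; the states, action set, and the reward for reaching a diagnostically useful view are all hypothetical.

```python
import random

ACTIONS = ["pan_left", "pan_right", "zoom_in", "zoom_out"]

def step(s, a, goal=4):
    # toy environment: views 0..goal laid out on a strip;
    # zoom actions are no-ops in this 1-D stand-in
    if ACTIONS[a] == "pan_right":
        s = min(s + 1, goal)
    elif ACTIONS[a] == "pan_left":
        s = max(s - 1, 0)
    return s, (1.0 if s == goal else 0.0), s == goal

def q_learning(n_states=5, episodes=2000, alpha=0.5, gamma=0.9, eps=0.5):
    # tabular stand-in for the claimed DQN: Q[s][a] estimates the value
    # of taking browsing action a in view state s
    Q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
    for _ in range(episodes):
        s = random.randrange(n_states - 1)  # start anywhere but the goal view
        for _ in range(20):
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
            s2, r, done = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break
    return Q
```

Reading the greedy policy off the learned table yields "pan_right" in every non-goal state, i.e. a sequence of predicted browsing actions that moves the display field of view toward the rewarded view.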
TW110103801A 2020-02-07 2021-02-02 System of generating training data by questions and answers and method thereof TWI832032B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202062971236P 2020-02-07 2020-02-07
US62/971,236 2020-02-07

Publications (2)

Publication Number Publication Date
TW202131352A true TW202131352A (en) 2021-08-16
TWI832032B TWI832032B (en) 2024-02-11

Family

ID=77180828

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110103801A TWI832032B (en) 2020-02-07 2021-02-02 System of generating training data by questions and answers and method thereof

Country Status (2)

Country Link
CN (1) CN113254608A (en)
TW (1) TWI832032B (en)


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9355088B2 (en) * 2013-07-12 2016-05-31 Microsoft Technology Licensing, Llc Feature completion in computer-human interactive learning
WO2016149216A1 (en) * 2015-03-16 2016-09-22 Totem Behavioral profiling with actionable feedback methodologies and systems
CN107330238A (en) * 2016-08-12 2017-11-07 中国科学院上海技术物理研究所 Medical information collection, processing, storage and display methods and device
WO2018094033A1 (en) * 2016-11-16 2018-05-24 The General Hospital Corporation Systems and methods for automated detection of objects with medical imaging
CN109583440B (en) * 2017-09-28 2021-12-17 北京西格码列顿信息技术有限公司 Medical image auxiliary diagnosis method and system combining image recognition and report editing
CN109830284A (en) * 2017-11-23 2019-05-31 天启慧眼(北京)信息技术有限公司 The analysis method and device of medical image
CN108491421B (en) * 2018-02-07 2021-04-16 北京百度网讯科技有限公司 Method, device and equipment for generating question and answer and computing storage medium
US20210240931A1 (en) * 2018-04-30 2021-08-05 Koninklijke Philips N.V. Visual question answering using on-image annotations
CN109711434B (en) * 2018-11-30 2021-07-09 北京百度网讯科技有限公司 Method, apparatus, device and medium for acquiring and evaluating VQA system training data
TWI832032B (en) * 2020-02-07 2024-02-11 台達電子工業股份有限公司 System of generating training data by questions and answers and method thereof

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254608A (en) * 2020-02-07 2021-08-13 台达电子工业股份有限公司 System and method for generating training data through question answering
TWI793865B (en) * 2021-11-18 2023-02-21 倍利科技股份有限公司 System and method for AI automatic auxiliary labeling
US11978270B2 (en) 2021-11-18 2024-05-07 V5Med Inc. AI-assisted automatic labeling system and method

Also Published As

Publication number Publication date
TWI832032B (en) 2024-02-11
CN113254608A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN102365645B (en) Organizing digital images by correlating faces
CN111242962A (en) Method, device and equipment for generating remote training video and storage medium
CN106874826A (en) Face key point-tracking method and device
TWI832032B (en) System of generating training data by questions and answers and method thereof
CN102402382A (en) Information processing device and information processing method
WO2021135945A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN107209922A (en) Image processing equipment, image processing system and image processing method
DE102020007191A1 (en) Machine learning for digital image selection among object variants
JP2019212106A (en) Area extraction model learning device, area extraction model learning method, and program
CN111814733A (en) Concentration degree detection method and device based on head posture
WO2021073160A1 (en) Image annotation method and apparatus, and computer system and readable storage medium
WO2023041940A1 (en) Gaze-based behavioural monitoring system
Liu et al. An improved method of identifying learner's behaviors based on deep learning
Xu et al. Classroom attention analysis based on multiple euler angles constraint and head pose estimation
CN107133631A (en) A kind of method and device for recognizing TV station's icon
CN113852756B (en) Image acquisition method, device, equipment and storage medium
US20240153395A1 (en) Tracking concepts and presenting content in a learning system
CN105721837A (en) Student self-adaptive learning system and method
CN116110091A (en) Online learning state monitoring system
TWI696441B (en) Smart marking system for surgical video and method thereof
CN114245193A (en) Display control method and device and electronic equipment
JP2021077131A (en) Composition advice system, composition advice method, user terminal, and program
Ni et al. Classroom Roll Call System Based on Face Detection Technology
Hendricks Filmmakers' Attitudes and Intentions toward Adoption of Virtual Camera Systems in Virtual Production
Hossen et al. Attention monitoring of students during online classes using XGBoost classifier