TWI801982B - Classifying system and classifying method of automatically classifying digital file - Google Patents

Classifying system and classifying method of automatically classifying digital file

Publication number
TWI801982B
Authority
TW
Taiwan
Prior art keywords
category
classification
input file
comparison
data structure
Prior art date
Application number
TW110131855A
Other languages
Chinese (zh)
Other versions
TW202309757A (en)
Inventor
羅崇銘
Original Assignee
國立政治大學
Priority date
Filing date
Publication date
Application filed by 國立政治大學
Priority to TW110131855A
Publication of TW202309757A
Application granted
Publication of TWI801982B

Landscapes

  • Sorting Of Articles (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A classifying method for automatically classifying a digital file is disclosed and includes the following steps: performing a classification procedure on an input file in accordance with a deep-learning classification model to determine a first category to which the input file probably belongs; reading a relevant data structure to confirm at least one slave category relevant to the first category, wherein the relevant data structure records relationships between multiple pre-defined categories and one or more slave categories; performing a feature comparing procedure to compare the input file with at least one digital file of the slave category in accordance with a machine-learning comparing model; and labelling the input file as the first category and modifying a confidence of the input file belonging to the first category when a comparing result of the feature comparing procedure matches a classification condition.

Description

自動化數位檔案分類系統及分類方法 Automatic digital file classification system and classification method

本發明涉及一種分類系統及分類方法,尤其涉及一種對數位檔案進行分類的分類系統及分類方法。 The invention relates to a classification system and a classification method, in particular to a classification system and a classification method for classifying digital files.

隨著時代的演進以及科技的進步,近年來越來越多的類比資料被數位化,以利於使用者收藏以及日後進行搜尋。 With the evolution of the times and the advancement of technology, more and more analog data have been digitized in recent years for the convenience of users to collect and search in the future.

然而,隨著數位資料的資料量爆增,資料處理的相關問題開始產生。舉例來說,為了使數位資料容易被搜尋,必須先對資料進行分類,但傳統將類比資料數位化後再由人工進行分類的方式,於資料量龐大的情況下將難以實現。除了需要花費大量的人力成本及時間之外,也容易有人為分類錯誤的情況產生。因此,自動化分類系統開始受到重視。 However, as the amount of digital data explodes, problems related to data processing begin to arise. For example, in order to make digital data easy to search, the data must be classified first, but the traditional method of digitizing analog data and then manually classifying it will be difficult to achieve in the case of a huge amount of data. In addition to requiring a lot of labor costs and time, it is also prone to human error in classification. Therefore, automated classification systems began to receive attention.

另外,隨著資料量爆增,新形態的商品、物件、音訊、疾病等資料不斷出現,導致資料類別越來越繁雜,也使得市場上的自動化分類系統的分別準確率持續下降。由此顯示,目前單一階層的分類方式(即,僅單純進行一次的分類程序)已經不敷使用。 In addition, as the volume of data explodes, new kinds of data such as commodities, objects, audio, and diseases keep appearing. Data categories become increasingly complex, and the classification accuracy of automated classification systems on the market keeps declining. This shows that the current single-level classification approach (that is, running the classification procedure only once) is no longer sufficient.

有鑑於此,市場上實迫切需要一種新穎且高準確率的自動化分類系統,以解決數位化資料的分類問題。 In view of this, there is an urgent need in the market for a novel and highly accurate automatic classification system to solve the classification problem of digital data.

本發明的主要目的,在於提供一種自動化數位檔案分類系統及分類方法,可於深度學習分類模型的分類結果的可信度不足時,通過機器學習比對模型來將數位檔案與具有相關性的其他類別進行特徵比對,藉此提高分類結果的準確率。 The main purpose of the present invention is to provide an automated digital file classification system and classification method which, when the confidence of the classification result of the deep learning classification model is insufficient, use a machine learning comparison model to compare the features of the digital file against other, related categories, thereby improving the accuracy of the classification result.

為了達成上述的目的,本發明的分類方法主要應用於一分類系統,並且包括:a)由該分類系統接收一輸入檔案;b)該分類系統基於預先訓練完成的一深度學習分類模型對該輸入檔案進行一分類程序,其中該分類程序輸出該輸入檔案可能的一第一類別以及對應的一可信度;c)該分類系統讀取一關聯性資料結構以確認與該第一類別相關聯的至少一從屬類別,其中該關聯性資料結構記錄預先定義的複數類別與一或多個該從屬類別間的一關聯性;d)該分類系統基於預先訓練完成的一機器學習比對模型對該輸入檔案與該從屬類別中的至少一個數位檔案進行一特徵比對程序;及e)於該特徵比對程序的一比對結果符合一分類條件時,將該輸入檔案標記為該第一類別,並提高該輸入檔案屬於該第一類別的該可信度。 To achieve the above purpose, the classification method of the present invention is mainly applied to a classification system and includes: a) receiving an input file by the classification system; b) performing, by the classification system, a classification procedure on the input file based on a pre-trained deep learning classification model, wherein the classification procedure outputs a possible first category of the input file and a corresponding confidence; c) reading, by the classification system, an association data structure to confirm at least one subordinate category associated with the first category, wherein the association data structure records an association between a plurality of predefined categories and one or more of the subordinate categories; d) performing, by the classification system, a feature comparison procedure on the input file and at least one digital file in the subordinate category based on a pre-trained machine learning comparison model; and e) when a comparison result of the feature comparison procedure meets a classification condition, marking the input file as the first category and increasing the confidence that the input file belongs to the first category.
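Steps a) through e) can be sketched as a single function. This is only an illustrative outline, not the patented implementation: the names `classify_file`, `Classifier`, `Comparator`, `threshold`, and `boost` are hypothetical, the two models are passed in as plain callables, and both the form of the classification condition (similarity at or above a threshold) and the size of the confidence increase are assumptions that the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type aliases; the patent does not prescribe an API.
Classifier = Callable[[bytes], Tuple[str, float]]  # returns (first category, confidence)
Comparator = Callable[[bytes, bytes], float]       # returns a similarity in [0, 1]


@dataclass
class Result:
    category: str
    confidence: float
    confirmed: bool  # True when a comparison met the classification condition


def classify_file(
    input_file: bytes,                            # step a: the received input file
    classifier: Classifier,
    comparator: Comparator,
    associations: Dict[str, List[str]],           # category -> subordinate categories
    reference_files: Dict[str, Sequence[bytes]],  # category -> stored digital files
    threshold: float = 0.8,                       # assumed classification condition
    boost: float = 0.1,                           # assumed confidence increase
) -> Result:
    # Step b: first-stage classification.
    first_category, confidence = classifier(input_file)
    # Step c: look up the subordinate categories of the first category.
    for subordinate in associations.get(first_category, []):
        # Step d: compare against stored files of each subordinate category.
        for reference in reference_files.get(subordinate, []):
            # Step e: on a match, keep the label and raise the confidence.
            if comparator(input_file, reference) >= threshold:
                return Result(first_category, min(1.0, confidence + boost), True)
    # No comparison met the condition: the confidence is left unchanged.
    return Result(first_category, confidence, False)
```

Passing the models in as callables keeps the sketch independent of any particular deep learning or machine learning library.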

為了達成上述的目的,本發明的分類系統主要包括:一輸入單元,接收待分類的一輸入檔案; 一儲存單元,儲存一深度學習分類模型、一機器學習比對模型及一關聯性資料結構,其中該關聯性資料結構記錄預先定義的複數類別與一或多個從屬類別間的一關聯性;一處理單元,連接該輸入單元及該儲存單元,並且執行下列程序:基於預先訓練完成的該深度學習分類模型對該輸入檔案進行一分類程序,其中該分類程序輸出該輸入檔案可能的該第一類別以及對應的一可信度;讀取該關聯性資料結構以確認與該第一類別相關聯的至少一個該從屬類別;基於預先訓練完成的該機器學習比對模型對該輸入檔案與該從屬類別中的至少一個數位檔案進行一特徵比對程序;及於該特徵比對程序的一比對結果符合一分類條件時將該輸入檔案標記為該第一類別,並提高該輸入檔案屬於該第一類別的該可信度;及一輸出單元,連接該處理單元,輸出該輸入檔案所屬的該第一類別以及對應的該可信度。 In order to achieve the above-mentioned purpose, the classification system of the present invention mainly includes: an input unit, receiving an input file to be classified; A storage unit storing a deep learning classification model, a machine learning comparison model, and an associative data structure, wherein the associative data structure records an association between predefined plural categories and one or more subcategories; a The processing unit is connected to the input unit and the storage unit, and executes the following procedure: performing a classification procedure on the input file based on the pre-trained deep learning classification model, wherein the classification procedure outputs the possible first category of the input file and a corresponding reliability; reading the association data structure to confirm at least one subcategory associated with the first category; comparing the input file with the subcategory based on the pre-trained machine learning comparison model at least one of the digital files in a feature comparison program; and when a comparison result of the feature comparison program meets a classification condition, the input file is marked as the first category, and the input file belongs to the first category the credibility of the category; and an output unit connected to the processing unit to output the first category to which the input file belongs and the corresponding credibility.

相對於相關技術,本發明通過至少兩個階段的程序來對輸入檔案進行分類,可以有效提高分類結果的準確率。 Compared with the related technology, the present invention classifies the input files through at least two stages of procedures, which can effectively improve the accuracy of the classification results.

1:分類系統 1: classification system

11:處理單元 11: Processing unit

12:輸入單元 12: Input unit

13:網路單元 13: Network unit

14:儲存單元 14: storage unit

141:人工智慧演算法 141: Artificial Intelligence Algorithms

142:深度學習分類模型 142:Deep Learning Classification Model

143:機器學習比對模型 143:Machine Learning Comparison Model

144:關聯性資料結構 144:Associative Data Structure

15:輸出單元 15: Output unit

2:圖形資料結構 2: Graph data structure

21:節點 21: node

22:邊 22: edge

3:樹狀資料結構 3: Tree data structure

31:根節點 31: root node

32:子節點 32: child node

33:屬性 33: Attributes

34:關聯性連結 34:Associative link

41:家電類別 41: Home appliance category

411:洗衣機 411: washing machine

42:戶外類別 42: Outdoor Category

421:雨衣 421: raincoat

43:服飾類別 43: Apparel category

431:襯衫 431: shirt

5:衣服特徵 5: Clothing features

S10~S32、S40~S58:分類步驟 S10~S32, S40~S58: classification steps

圖1為本發明的分類系統的方塊圖的第一具體實施例。 Fig. 1 is a first embodiment of the block diagram of the classification system of the present invention.

圖2為本發明的圖形資料結構示意圖的第一具體實施例。 Fig. 2 is a schematic diagram of a first specific embodiment of the graph data structure of the present invention.

圖3A為本發明的樹狀資料結構示意圖的第一具體實施例。 FIG. 3A is a first specific embodiment of the tree structure diagram of the present invention.

圖3B為本發明的樹狀資料結構示意圖的第二具體實施例。 FIG. 3B is a second specific embodiment of the tree structure diagram of the present invention.

圖4為本發明的分類方法的流程圖的第一具體實施例。 Fig. 4 is a first specific embodiment of the flow chart of the classification method of the present invention.

圖5為本發明的特徵比對示意圖的第一具體實施例。 Fig. 5 is a first specific embodiment of the feature comparison diagram of the present invention.

圖6為本發明的特徵關聯性示意圖的第一具體實施例。 FIG. 6 is a first specific embodiment of a schematic diagram of feature correlation of the present invention.

圖7為本發明的特徵關聯性示意圖的第二具體實施例。 FIG. 7 is a second specific embodiment of the feature correlation schematic diagram of the present invention.

圖8為本發明的分類方法的流程圖的第二具體實施例。 Fig. 8 is a second specific embodiment of the flowchart of the classification method of the present invention.

茲就本發明之一較佳實施例,配合圖式,詳細說明如後。 A preferred embodiment of the present invention will be described in detail below in conjunction with the drawings.

請參閱圖1,為本發明的分類系統的方塊圖的第一具體實施例。本發明揭露了一種自動化數位檔案分類系統(下面將於說明書中簡稱為分類系統1),所述分類系統1可設置於線上(online)或線下(offline),接收使用者欲進行分類的數位檔案(下稱為輸入檔案),並且自動對輸入檔案進行內容的辨識以完成分類。藉此,使用者可以通過分類系統1自動得到輸入檔案所屬的類別,進而利於使用者對輸入檔案進行歸檔與儲存,其中不需人為介入,故分類速度快並且不會發生人為錯誤,相當便利。 Please refer to FIG. 1, which is a block diagram of a first specific embodiment of the classification system of the present invention. The present invention discloses an automated digital file classification system (hereinafter referred to as the classification system 1). The classification system 1 can be deployed online or offline; it receives a digital file that the user wants classified (hereinafter referred to as the input file) and automatically identifies the content of the input file to complete the classification. In this way, the user automatically obtains the category of the input file through the classification system 1, which makes it easier to file and store the input file. Because no human intervention is required, classification is fast and free of human error.

所述輸入檔案可為各式需要進行分類的數位多媒體檔案,例如數位文件、數位影像及數位音訊等,但不加以限定。數位文件中記錄了特定的文字、符號及圖形等參數,分類系統1可基於這些參數自動對數位文件進行分類。數位影像中記錄了特定的影像特徵,分類系統1可基於這些影像特徵自動對數位影像進行分類。數位音訊中記錄了特定的波形特徵,分類系統1可基於這些波形特徵自動對數位音訊進行分類。 The input file may be any kind of digital multimedia file that needs to be classified, such as a digital document, a digital image, or digital audio, but is not limited thereto. A digital document records specific parameters such as characters, symbols, and graphics, and the classification system 1 can automatically classify the digital document based on these parameters. A digital image records specific image features, and the classification system 1 can automatically classify the digital image based on these image features. Digital audio records specific waveform features, and the classification system 1 can automatically classify the digital audio based on these waveform features.

惟,上述僅為本發明中對於輸入檔案的類型的部分具體實施範例,但並不以上述類型為限。 However, the above are only some specific implementation examples of the types of input files in the present invention, but are not limited to the above types.

如圖1所示,本發明的分類系統1可包括處理單元11,以及與處理單元11連接的輸入單元12、網路單元13、儲存單元14及輸出單元15。於一實施例中,分類系統1可以個人電腦(Personal Computer,PC)、工業電腦(Industrial PC,IPC)、雲端伺服器(Cloud Server)、機櫃伺服器(Cabinet Server)、平板電腦(Tablet)、筆記型電腦(Laptop)或任何具有上述單元11-15的電子裝置來實現。 As shown in FIG. 1, the classification system 1 of the present invention may include a processing unit 11, together with an input unit 12, a network unit 13, a storage unit 14, and an output unit 15 connected to the processing unit 11. In one embodiment, the classification system 1 can be implemented as a personal computer (PC), an industrial PC (IPC), a cloud server, a cabinet server, a tablet, a laptop, or any electronic device having the above units 11-15.

處理單元11可例如為處理器(Processor)、微控制單元(Micro Control Unit,MCU)、中央處理單元(Central Processing Unit,CPU)、特殊應用積體電路(Application Specific Integrated Circuit,ASIC)、現場可程式化邏輯閘陣列(Field Programmable Gate Array,FPGA)等來實現,以對輸入單元12、網路單元13、儲存單元14與輸出單元15進行資料的整合與控制,但不以此為限。 The processing unit 11 can be implemented by, for example, a processor, a micro control unit (MCU), a central processing unit (CPU), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA), so as to integrate and control the data of the input unit 12, the network unit 13, the storage unit 14, and the output unit 15, but is not limited thereto.

輸入單元12可例如為各式的人機介面,例如鍵盤、滑鼠、觸控板、觸控螢幕等,用以接受使用者的操作,以將需要進行分類的輸入檔案匯入至分類系統1中。 The input unit 12 can be, for example, any of various human-machine interfaces, such as a keyboard, mouse, touchpad, or touch screen, which accepts user operations to import the input file to be classified into the classification system 1.

網路單元13可為各式的有線或無線的網路連接介面。分類系統1可通過網路單元13連接網路,藉此,使用者可從遠端經過網路將需要進行分類的輸入檔案傳輸至分類系統1中。 The network unit 13 can be various wired or wireless network connection interfaces. The classification system 1 can be connected to the network through the network unit 13 , so that the user can transmit the input files to be classified to the classification system 1 from the remote end through the network.

儲存單元14可例如為硬碟(Hard-Drive Disk,HDD)、固態硬碟(Solid-State Disk,SSD)、快閃記憶體(FLASH Memory)、唯讀記憶體(Read Only Memory,ROM)、隨機存取記憶體(Random Access Memory,RAM)、非揮發性記憶體(Non-Volatile Memory)等,但不以此為限。 The storage unit 14 can be, for example, a hard disk (Hard-Drive Disk, HDD), a solid-state disk (Solid-State Disk, SSD), a flash memory (FLASH Memory), a read-only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), non-volatile memory (Non-Volatile Memory), etc., but not limited thereto.

本發明中,儲存單元14儲存有人工智慧演算法141、深度學習分類模型142、機器學習比對模型143及關聯性資料結構144。所述人工智慧演算法141可例如為卷積神經網路(Convolutional Neural Networks,CNN)。所述人工智慧演算法141通過監督式學習(Supervised Learning)技術,依據分類系統1的管理者預先給定的多個類別標籤的素材(圖未標示)進行深度學習,以產生所述深度學習分類模型142。分類系統1只要將輸入檔案匯入訓練完成的深度學習分類模型142中,就可以初步判斷輸入檔案屬於預先給定的多個類別中的哪一個類別,以及對應的可信度。 In the present invention, the storage unit 14 stores an artificial intelligence algorithm 141, a deep learning classification model 142, a machine learning comparison model 143, and an association data structure 144. The artificial intelligence algorithm 141 can be, for example, a convolutional neural network (CNN). Using supervised learning, the artificial intelligence algorithm 141 performs deep learning on materials (not shown in the figures) carrying multiple category labels given in advance by the administrator of the classification system 1, so as to generate the deep learning classification model 142. Once the classification system 1 feeds the input file into the trained deep learning classification model 142, it can preliminarily determine which of the predetermined categories the input file belongs to, along with the corresponding confidence.

並且,基於預先給定的多個類別,分類系統1的管理者可針對各個類別分別設計對應的物件特徵,例如顏色、形狀、紋理等,並且由人工智慧演算法141基於這些物件特徵進行機器學習,以產生所述機器學習比對模型143。分類系統1只要將輸入檔案匯入訓練完成的機器學習比對模型143中,就可以藉由輸入檔案的特徵向量、圖形等參數執行特徵比對程序,以判斷輸入檔案與哪一個類別(或多個類別)具有相似性。 Moreover, based on the predetermined categories, the administrator of the classification system 1 can design corresponding object features for each category, such as color, shape, and texture, and the artificial intelligence algorithm 141 performs machine learning based on these object features to generate the machine learning comparison model 143. Once the classification system 1 feeds the input file into the trained machine learning comparison model 143, it can execute the feature comparison procedure using parameters of the input file such as feature vectors and graphics, so as to determine which category (or categories) the input file is similar to.

本發明的主要技術特徵在於,分類系統1首先藉由深度學習分類模型142來對輸入檔案進行第一階段的分類程序,以初步判斷輸入檔案所屬的類別。並且於分類程序完成後,再藉由機器學習比對模型143對輸入檔案與和此類別相關的從屬類別進行第二階段的比對程序,以確認深度學習分類模型142的分類結果是否可信。藉由至少兩階層的處理程序,可以有效提高分類系統1給出的分類結果的準確率。 The main technical feature of the present invention is that the classification system 1 first uses the deep learning classification model 142 to perform a first-stage classification procedure on the input files, so as to preliminarily determine the category to which the input files belong. And after the classification process is completed, the machine learning comparison model 143 is used to perform a second-stage comparison process on the input file and the subordinate categories related to this category to confirm whether the classification result of the deep learning classification model 142 is credible. With at least two levels of processing procedures, the accuracy of the classification results given by the classification system 1 can be effectively improved.

如前文所述,深度學習分類模型142用以判斷輸入檔案所屬的類別(下面稱為第一類別),而機器學習比對模型143用以判斷輸入檔案與和第一類別相關聯的其他類別(下面稱為從屬類別)間是否確實具有相似性。為了達成上述技術手段,分類系統1通過關聯性資料結構144來記錄各個預先定義的複數類別各自與一或多個從屬類別間的關聯性(容後詳述)。 As mentioned above, the deep learning classification model 142 is used to determine the category to which the input file belongs (hereinafter referred to as the first category), and the machine learning comparison model 143 is used to determine whether the input file is indeed similar to other categories associated with the first category (hereinafter referred to as subordinate categories). To achieve this, the classification system 1 uses the association data structure 144 to record the association between each of the plurality of predefined categories and one or more subordinate categories (described in detail later).

所述深度學習與機器學習為人工智慧領域的常用技術手段,為令本發明的內容簡潔,於此不再贅述。 The deep learning and machine learning are commonly used technical means in the field of artificial intelligence. In order to keep the content of the present invention concise, they will not be repeated here.

處理單元11用以從輸入單元12或網路單元13接收要進行分類的輸入檔案,並且基於儲存單元14中的模型142、143以及關聯性資料結構144對輸入檔案進行分類,並且再通過輸出單元15輸出輸入檔案的分類結果。 The processing unit 11 is used to receive the input file to be classified from the input unit 12 or the network unit 13, and classify the input file based on the models 142, 143 and the relational data structure 144 in the storage unit 14, and then pass the output unit 15 Outputting the classification result of the input file.

於一實施例中,處理單元11在接收輸入檔案後,先基於預先訓練完成的深度學習分類模型142對輸入檔案進行分類程序。由於深度學習分類模型142是基於預先給定的多個類別標籤來訓練完成的,因此在所述分類程序執行完畢後,處理單元11可以從多個類別標籤中判斷輸入檔案可能的類別(下稱為第一類別),以及輸入檔案屬於第一類別的可信度。 In one embodiment, after receiving the input file, the processing unit 11 performs a classification procedure on the input file based on the pre-trained deep learning classification model 142 . Since the deep learning classification model 142 is trained based on a plurality of predetermined category labels, after the classification program is executed, the processing unit 11 can judge the possible category of the input file from the multiple category labels (hereinafter referred to as is the first category), and the confidence that the input file belongs to the first category.

在初步判斷輸入檔案所屬的第一類別後,處理單元11進一步讀取關聯性資料結構144,以確認第一類別是否與一或多個從屬類別相關聯。值得一提的是,所述從屬類別並非指第一類別所包含的子類別,而是指在所述多個類別標籤中,與第一類別具有一或多個共同或相似物件特徵的其他類別。 After preliminarily determining the first category to which the input file belongs, the processing unit 11 further reads the association data structure 144 to confirm whether the first category is associated with one or more subordinate categories. It is worth mentioning that the subordinate category does not refer to the subcategories included in the first category, but refers to other categories that have one or more common or similar object characteristics with the first category among the multiple category tags .

若在查詢關聯性資料結構144後,確認第一類別與至少一個從屬類別相關,則處理單元11進一步基於預先訓練完成的機器學習比對模型143對輸入檔案與這個從屬類別中的至少一個數位檔案進行特徵比對程序,並且獲得一個比對結果。於一實施例中,此比對結果可以顯示輸入檔案與這個從屬類別是否確實具有相似性(即,是否具有相同或相似的物件特徵),以及其相似程度(即,物件特徵的相似度)。 If, after querying the association data structure 144, it is confirmed that the first category is associated with at least one subordinate category, the processing unit 11 further performs a feature comparison procedure on the input file and at least one digital file in that subordinate category based on the pre-trained machine learning comparison model 143, and obtains a comparison result. In one embodiment, the comparison result can show whether the input file is indeed similar to the subordinate category (that is, whether they have the same or similar object features), and the degree of similarity (that is, the similarity of the object features).
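One way to picture the comparison result is as a similarity score between feature vectors. The sketch below assumes cosine similarity, which the text does not name; it is used here only as a common stand-in for "similarity of the object features". The function names and the dictionary layout of stored files are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def best_match(
    input_features: Sequence[float],
    subordinate_files: Dict[str, Sequence[float]],  # file name -> feature vector
) -> Tuple[Optional[str], float]:
    """Return the most similar stored file and its score, or (None, 0.0)."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, features in subordinate_files.items():
        score = cosine_similarity(input_features, features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

The returned score plays the role of the degree of similarity, and whether a match exists at all corresponds to the yes/no part of the comparison result.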

於特徵比對程序結束後,處理單元11可判斷特徵比對程序的比對結果是否符合預設的分類條件,例如判斷所述關聯程度是否高於預設門檻值。若所述比對結果符合分類條件,則處理單元11可認定輸入檔案屬於所述第一類別的可能性極高,因此將輸入檔案直接標記為第一類別,並且提高深度學習分類模型142輸出的所述可信度。最後,分類系統1可通過輸出單元15輸出第一類別以及修正後的可信度,以令使用者做為對輸入檔案進行歸檔與管理時的參考依據。 After the feature comparison procedure ends, the processing unit 11 can determine whether the comparison result of the feature comparison procedure meets a preset classification condition, for example, whether the degree of association is higher than a preset threshold. If the comparison result meets the classification condition, the processing unit 11 can conclude that the input file is highly likely to belong to the first category, so it directly marks the input file as the first category and increases the confidence output by the deep learning classification model 142. Finally, the classification system 1 can output the first category and the revised confidence through the output unit 15, as a reference for the user when filing and managing the input file.

本發明通過深度學習分類模型142與機器學習比對模型143執行兩階層的處理程序,可以有效提升傳統在執行單一階層的分類程序時,因為數位檔案的內容複雜(例如數位影像的背景部分佔比大、數位音訊的背景雜訊多等),導致分類準確率降低的問題。 By executing a two-level processing procedure with the deep learning classification model 142 and the machine learning comparison model 143, the present invention can effectively mitigate a problem of conventional single-level classification procedures: classification accuracy drops when the content of the digital file is complex (for example, the background occupies a large portion of a digital image, or digital audio contains a lot of background noise).

並且,本發明在機器學習比對模型143給出的比對結果顯示輸入檔案符合分類條件時,會進一步修正深度學習分類模型142所給出的分類結果的可信度。因此,處理單元11可以將輸入檔案、第一類別以及修正後的可信度做為參數,基於人工智慧演算法141對深度學習分類模型142再次訓練,以逐步修正深度學習分類模型142。如此一來,隨著分類系統1的使用時間以及使用次數的增長,深度學習分類模型142會越來越精準。藉此,有助於分類系統1對於各式新商品、物件的知識建立。 Moreover, when the comparison result given by the machine learning comparison model 143 shows that the input file meets the classification conditions, the present invention will further correct the credibility of the classification result given by the deep learning classification model 142 . Therefore, the processing unit 11 can use the input file, the first category, and the revised credibility as parameters, and retrain the deep learning classification model 142 based on the artificial intelligence algorithm 141 , so as to gradually modify the deep learning classification model 142 . In this way, the deep learning classification model 142 will become more and more accurate as the classification system 1 is used for a longer period of time and the number of times of use increases. Thereby, it is helpful for the classification system 1 to establish knowledge for various new commodities and objects.
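The feedback loop described above can be sketched as a small queue that collects confirmed results and hands them to a retraining routine. Everything here is an assumption made for illustration: the text says only that the input file, the first category, and the revised confidence are used as parameters for retraining, so the batching policy, the class names, and the `retrain` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FeedbackSample:
    input_file: bytes
    category: str      # the confirmed first category
    confidence: float  # the revised confidence


class RetrainingQueue:
    """Collects confirmed samples and triggers retraining in batches."""

    def __init__(
        self,
        retrain: Callable[[List[FeedbackSample]], None],  # hypothetical training hook
        batch_size: int = 100,                            # assumed batching policy
    ) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._retrain = retrain
        self._batch_size = batch_size
        self._pending: List[FeedbackSample] = []

    def add(self, sample: FeedbackSample) -> bool:
        """Queue a sample; return True if this call triggered retraining."""
        self._pending.append(sample)
        if len(self._pending) >= self._batch_size:
            batch, self._pending = self._pending, []
            self._retrain(batch)
            return True
        return False
```

Batching is one possible design choice; retraining after every confirmed file would also match the description, at a higher compute cost.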

請參閱圖2,為本發明的圖形資料結構示意圖的第一具體實施例。於一實施例中,本發明的關聯性資料結構144可為圖形資料結構2。如圖2所示,圖形資料結構2主要由複數節點(node)21及複數邊(edge)22組成,各個邊22的兩端分別連接兩個節點21,藉此描述兩個節點21間的關聯性。 Please refer to FIG. 2, which is a schematic diagram of a first specific embodiment of the graph data structure of the present invention. In one embodiment, the association data structure 144 of the present invention can be a graph data structure 2. As shown in FIG. 2, the graph data structure 2 is mainly composed of a plurality of nodes 21 and a plurality of edges 22. The two ends of each edge 22 are respectively connected to two nodes 21, thereby describing the association between the two nodes 21.

具體地,各個節點21可分別對應至管理者預先定義的一個類別標籤,例如家電、戶外、服飾、電視、洗衣機等。各個邊22可分別描述兩個節點21間的關聯性為何(例如為主要關聯及次要關聯),以及兩個節點21間的關聯程度(例如高、低或是具體的關聯分數)。 Specifically, each node 21 may correspond to a category label predefined by the administrator, such as home appliance, outdoor, clothing, television, washing machine, and the like. Each edge 22 can respectively describe what is the relationship between the two nodes 21 (eg, primary relationship and secondary relationship), and the degree of relationship between the two nodes 21 (eg, high, low, or a specific relationship score).

舉例來說,代表洗衣機的第一節點可通過第一邊連接代表電器的第二節點,第一邊記錄第一節點與第二節點屬於主要關聯,並且關聯程度高。並且,由於與洗衣機相關的數位檔案(例如照片)中除了洗衣機的特徵外,往往會包含衣服的特徵,因此在建立圖形資料結構2時,管理者可令代表洗衣機的第一節點同時通過第二邊連接代表衣服的第三節點,並且由第二邊記錄第一節點與第三節點屬於次要關聯,並且關聯程度一般。 For example, a first node representing washing machines may be connected through a first edge to a second node representing electrical appliances, and the first edge records that the first node and the second node have a primary association with a high degree of association. Moreover, digital files related to washing machines (such as photos) often contain features of clothes in addition to features of the washing machine. Therefore, when building the graph data structure 2, the administrator can also connect the first node representing washing machines through a second edge to a third node representing clothes, and the second edge records that the first node and the third node have a secondary association with a moderate degree of association.
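The washing machine example can be written as a small adjacency-list graph. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, edges are treated as undirected, the association degree is stored as a number, and the scores 0.9 and 0.5 are illustrative stand-ins for "high" and "moderate".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Edge:
    target: str
    kind: str     # "primary" or "secondary"
    score: float  # association degree; the numeric scale is an assumption


class CategoryGraph:
    def __init__(self) -> None:
        self._edges: Dict[str, List[Edge]] = {}

    def add_edge(self, a: str, b: str, kind: str, score: float) -> None:
        # Edges are assumed undirected, so both directions are stored.
        self._edges.setdefault(a, []).append(Edge(b, kind, score))
        self._edges.setdefault(b, []).append(Edge(a, kind, score))

    def subordinate_categories(self, category: str) -> List[str]:
        # Categories reached through secondary edges, strongest association first.
        edges = [e for e in self._edges.get(category, []) if e.kind == "secondary"]
        return [e.target for e in sorted(edges, key=lambda e: e.score, reverse=True)]


graph = CategoryGraph()
graph.add_edge("washing machine", "electrical appliances", "primary", 0.9)
graph.add_edge("washing machine", "clothes", "secondary", 0.5)
```

With this layout, the lookup of subordinate categories for a first category is a filter over that node's secondary edges.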

本發明中,深度學習分類模型142可以通過代表主要關聯的邊22來對輸入檔案進行初步分類,並且通過代表次要關聯的邊22來將輸入檔案與具有相關性的從屬類別進行比對。藉此,可基於圖形資料結構2來提高分類結果的準確率。 In the present invention, the deep learning classification model 142 can preliminarily classify the input files through the edges 22 representing the primary associations, and compare the input files with relevant subordinate categories through the edges 22 representing the secondary associations. In this way, the accuracy of classification results can be improved based on the graphic data structure 2 .

再請參閱圖3A,為本發明的樹狀資料結構示意圖的第一具體實施例。於另一實施例中,本發明的關聯性資料結構144可為樹狀資料結構3。如圖3A所示,樹狀資料結構3具有複數根節點(Root)31以及複數子節點32(或為子樹(subtree)),其中各個子節點32分別連接至一個根節點31,並且各個子節點32可被設定有一個屬性。所述屬性用以描述這個子節點32與另一個子節點32之間的關聯性,然而本發明不以所有子節點32皆設定有對應的屬性為必要。 Please refer to FIG. 3A, which is a schematic diagram of a first specific embodiment of the tree data structure of the present invention. In another embodiment, the association data structure 144 of the present invention can be a tree data structure 3. As shown in FIG. 3A, the tree data structure 3 has a plurality of root nodes 31 and a plurality of child nodes 32 (or subtrees), wherein each child node 32 is connected to one root node 31, and each child node 32 can be set with an attribute. The attribute describes the association between this child node 32 and another child node 32; however, the present invention does not require every child node 32 to have a corresponding attribute.

具體地,各個根節點31可分別對應至管理者預先給定的一個廣義類別標籤,例如家電、戶外、服飾等,各個子節點32可分別對應所述廣義類別標籤下的多個狹義類別標籤,例如電視、洗衣機、登山用具、雨衣、襯衫、裙子、襪子等。所述屬性可用以描述這個子節點32與同一個根節點31下的其他子節點32的關聯性以及關聯程度,或是這個子節點32與不同根節點31下的其他子節點32的關聯性以及關聯程度。 Specifically, each root node 31 can correspond to a broad category label given in advance by the administrator, such as home appliances, outdoor, or apparel, and the child nodes 32 can correspond to the narrow category labels under those broad category labels, such as television, washing machine, mountaineering equipment, raincoat, shirt, skirt, and socks. The attribute can describe the association, and the degree of association, between this child node 32 and other child nodes 32 under the same root node 31, or between this child node 32 and other child nodes 32 under different root nodes 31.

請同時參閱圖3B,為本發明的樹狀資料結構示意圖的第二具體實施例。於圖3B的實施例中,根節點31代表廣義的電器、廚具、戶外、服飾等類別標籤,子節點32代表狹義的電視、洗衣機、洗碗機、收納櫃、碗盤、登山用具、雨衣、襯衫、裙子、襪子等類別標籤,並且各個子節點32分別隸屬於一個根節點31。 Please also refer to FIG. 3B , which is a second specific embodiment of the tree structure diagram of the present invention. In the embodiment of FIG. 3B , the root node 31 represents category tags such as electrical appliances, kitchen utensils, outdoors, and clothing in a broad sense, and the sub-nodes 32 represent televisions, washing machines, dishwashers, storage cabinets, dishes, mountaineering equipment, raincoats, etc. in a narrow sense. Category tags such as shirts, skirts, socks, etc., and each child node 32 belongs to a root node 31 respectively.

如前文所述,由於與洗衣機相關的數位檔案(例如文件、照片等)中除了洗衣機外,有很高的機率會同時出現衣服,但是衣服並不屬於電器類別,也不屬於洗衣機類別。因此,管理者可在樹狀資料結構3中代表洗衣機的子節點32中設定屬性33,並且於屬性33中描述洗衣機的子節點32與衣服相關。即,將衣服的類別視為洗衣機的從屬類別。 As mentioned above, since there is a high probability of clothes appearing in the digital files related to washing machines (such as documents, photos, etc.) besides washing machines, but clothes do not belong to the category of electrical appliances, nor do they belong to the category of washing machines. Therefore, the administrator can set an attribute 33 in the child node 32 representing the washing machine in the tree data structure 3, and describe in the attribute 33 that the child node 32 of the washing machine is related to clothes. That is, consider the category of clothes as a subordinate category of the washing machine.

同樣的,雨衣的子節點32雖然連接至戶外的根節點31,與洗衣機不同,但雨衣與衣服相關,與雨衣相關的數位檔案中出現衣服的機率很高。因此,管理者同樣可藉由屬性33來描述雨衣的子節點32與衣服相關。即,將衣服的類別視為雨衣的從屬類別。並且,襯衫的子節點32以及裙子的子節點32雖然連接至服飾的根節點31,與洗衣機以及雨衣不同,但襯衫及裙子皆與衣服相關,其數位檔案中出現衣服的機率很高。因此,管理者同樣可藉由屬性33來描述襯衫及裙子的子節點32與衣服相關。即,將衣服的類別視為襯衫及裙子的從屬類別。 Similarly, although the raincoat child node 32 is connected to the outdoor root node 31, a different root node from that of the washing machine, raincoats are related to clothes, and clothes are very likely to appear in digital files related to raincoats. Therefore, the administrator can likewise use the attribute 33 to describe that the raincoat child node 32 is related to clothes; that is, the clothes category is regarded as a subordinate category of raincoats. Likewise, although the shirt child node 32 and the skirt child node 32 are connected to the apparel root node 31, a different root node from those of the washing machine and the raincoat, shirts and skirts are both related to clothes, and clothes are very likely to appear in their digital files. Therefore, the administrator can likewise use the attribute 33 to describe that the shirt and skirt child nodes 32 are related to clothes; that is, the clothes category is regarded as a subordinate category of shirts and skirts.

Through the attributes 33 set in the tree data structure 3, the processing unit 11 can learn, when it queries the tree data structure 3, that the washing machine, raincoat, shirt, and skirt categories are joined to one another by an association link 34. Therefore, when the deep learning classification model 142 determines that an input file belongs to the washing machine, raincoat, shirt, or skirt category but with insufficient confidence, the machine learning comparison model 143 can, based on the association links 34 in the tree data structure 3, compare the input file against the associated subordinate categories to confirm the category to which the input file belongs.
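The attribute-based association described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ChildNode` class, its `related` field (standing in for attribute 33), and the `associated_categories` helper are hypothetical names, and treating two child nodes as linked whenever their attributes share an item is one possible reading of the association link 34.

```python
from dataclasses import dataclass, field


@dataclass
class ChildNode:
    """Hypothetical stand-in for a child node 32 of the tree data structure 3."""
    name: str   # category label, e.g. "washing machine"
    root: str   # root node 31 the child is connected to
    related: set[str] = field(default_factory=set)  # stand-in for attribute 33


def associated_categories(nodes, category):
    """Return the other child categories whose attribute shares an item with `category`."""
    by_name = {node.name: node for node in nodes}
    target = by_name[category]
    return sorted(
        node.name
        for node in nodes
        if node.name != category and node.related & target.related
    )


nodes = [
    ChildNode("washing machine", "appliances", {"clothes"}),
    ChildNode("dishwasher", "appliances", {"dishes"}),
    ChildNode("raincoat", "outdoor", {"clothes"}),
    ChildNode("shirt", "apparel", {"clothes"}),
    ChildNode("skirt", "apparel", {"clothes"}),
]

print(associated_categories(nodes, "washing machine"))
# ['raincoat', 'shirt', 'skirt']
```

The lookup crosses root nodes on purpose: the washing machine sits under appliances while the raincoat sits under outdoor, yet both are linked through the shared "clothes" attribute.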

Please refer also to FIG. 4, which is a flowchart of a first embodiment of the classification method of the present invention. The present invention further discloses an automated digital file classification method (hereinafter referred to as the classification method). The classification method is applied to the classification system 1 shown in FIG. 1 and uses the data structures shown in FIG. 2, FIG. 3A, and FIG. 3B to classify input files automatically.

As shown in FIG. 4, to carry out the classification method of the present invention, the classification system 1 first receives an input file to be classified through the input unit 12 or the network unit 13 (step S10). The input file is a digitized multimedia file. Next, the classification system 1 performs a classification procedure on the input file based on the pre-trained deep learning classification model 142 (step S12). When the classification procedure completes, it outputs a preliminary classification result that indicates the category to which the input file possibly belongs (hereinafter, the first category) and the corresponding confidence.

For example, when the input file is a photo related to a dishwasher, the classification system 1 may determine in step S12, by means of the deep learning classification model 142, that the input file belongs to the dishwasher category with a confidence of 99%. As another example, the input file is a photo related to a dishwasher, but besides the dishwasher the photo also contains object features such as plates, cups, and a kitchen counter. In this embodiment, the deep learning classification model 142 may determine in the classification procedure that the input file belongs to the dishwasher category with a confidence of only 70%.

To solve the problem of inaccurate classification results caused by such insufficient confidence, the classification system 1 may, after step S12, read the association data structure 144 from the storage unit 14 and confirm, based on its content, whether at least one subordinate category is associated with the first category. If the association data structure 144 shows that the first category is associated with at least one subordinate category (for example, as shown in FIG. 3B, the dishwasher category is associated with the dishes category), the classification system 1 performs a feature comparison procedure on the input file and at least one digital file in that subordinate category based on the pre-trained machine learning comparison model 143. If the comparison result of the feature comparison procedure shows that the input file meets a preset classification condition (for example, its similarity to the subordinate category exceeds 70%), the classification system 1 can label the input file as the first category and raise the confidence that the input file belongs to the first category.

More specifically, after step S12 the classification system 1 may first check whether the confidence of the classification result of the deep learning classification model 142 is higher than or equal to a preset threshold (step S14), for example 70% or 80%, without limitation. If the confidence is higher than or equal to the threshold, the classification result of the deep learning classification model 142 is regarded as accurate, so the classification system 1 can directly label the input file as the first category (step S16) and end the whole classification procedure.

If it is determined in step S14 that the confidence is lower than the threshold, the classification result of the deep learning classification model 142 is regarded as not accurate enough. In this case, the classification system 1 reads the association data structure 144 (step S18) and uses its content (that is, the graph data structure 2 or the tree data structure 3) to confirm whether at least one subordinate category is associated with the first category (step S20).

If it is determined in step S20 that the first category is associated with at least one subordinate category, the classification system 1 can perform the feature comparison procedure by means of the machine learning comparison model 143. Specifically, the machine learning comparison model 143 can extract at least one corresponding object feature from the input file based on the subordinate category (step S22), and then, based on that object feature, perform the feature comparison procedure on the input file and at least one digital file in the subordinate category (step S24). In other words, it determines whether the input file and the digital files in the subordinate category have the same or similar object features. After the feature comparison procedure completes, the classification system 1 determines whether its comparison result meets the preset classification condition (step S26).
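The text does not state how the machine learning comparison model 143 measures whether two files have the same or similar object features. Purely as an illustrative assumption, the sketch below represents each extracted object feature as a numeric vector and scores similarity with cosine similarity; `cosine_similarity` and `best_match` are hypothetical helper names, not part of the disclosed system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(input_features, reference_features):
    """Highest similarity between the input file's features and any reference
    file taken from the subordinate category (assumed aggregation rule)."""
    return max(cosine_similarity(input_features, ref) for ref in reference_features)


# Toy vectors: the second reference points in the same direction as the input.
print(best_match([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0]]))  # 1.0
```

Any other similarity measure could be swapped in, as long as it yields a score that can be checked against the classification condition in step S26.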

If the comparison result of the feature comparison procedure shows that the input file does not meet the preset classification condition, the classification system 1 can display a classification failure message through the output unit 15 (step S28). Conversely, if the comparison result shows that the input file meets the classification condition, the classification system 1 can label the input file as the first category (step S30) and revise the confidence that the input file belongs to the first category (step S32). For example, the classification system 1 may revise the confidence so that it is higher than or equal to the threshold used in step S14.

It is worth noting that if the classification system 1 determines in step S20 that the first category is not associated with any subordinate category, then the confidence of the first-stage classification procedure (that is, the classification procedure performed by the deep learning classification model 142) is insufficient, and there is no comparison target for the second-stage comparison procedure (that is, the comparison procedure performed by the machine learning comparison model 143). The classification system 1 can therefore directly display a classification failure message through the output unit 15 (step S28).
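Steps S12 through S32 can be summarized as a single control-flow sketch. The three callables below are hypothetical stand-ins for the deep learning classification model 142, the association data structure 144, and the machine learning comparison model 143; the default threshold and condition values, the strict "exceeds" test, and the choice to lift the revised confidence exactly to the threshold are assumptions made for illustration only.

```python
def classify_file(input_file, classify, subordinate_of, compare,
                  threshold=0.8, condition=0.7):
    """Two-stage classification sketch.

    classify(file)         -> (category, confidence)   # stand-in for model 142
    subordinate_of(cat)    -> list of subordinate cats  # stand-in for structure 144
    compare(file, sub_cat) -> similarity in [0, 1]      # stand-in for model 143
    Returns (category, confidence), or None when classification fails.
    """
    category, confidence = classify(input_file)       # S12
    if confidence >= threshold:                        # S14
        return category, confidence                    # S16
    subordinates = subordinate_of(category)            # S18, S20
    if not subordinates:
        return None                                    # S28: nothing to compare against
    similarity = compare(input_file, subordinates[0])  # S22, S24
    if similarity <= condition:                        # S26: condition not met
        return None                                    # S28
    return category, max(confidence, threshold)        # S30, S32


result = classify_file(
    "photo.jpg",
    classify=lambda f: ("dishwasher", 0.70),
    subordinate_of=lambda c: ["dishes"],
    compare=lambda f, sub: 0.90,
)
print(result)  # ('dishwasher', 0.8)
```

Returning `None` models the classification failure message of step S28; a real system would more likely raise an error or emit a status object through its output unit.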

Please refer next to FIG. 5, which is a schematic diagram of feature comparison according to a first embodiment of the present invention. FIG. 5 uses product photos from an online store as an example.

As shown in FIG. 5, the home appliance category 41 contains the washing machine 411 category, and photos of the washing machine 411 contain the clothes feature 5. The outdoor category 42 contains the raincoat 421 category, and photos of the raincoat 421 contain the clothes feature 5. The apparel category 43 contains the shirt 431 category, and photos of the shirt 431 contain the clothes feature 5. Therefore, when designing the association data structure 144, the administrator of the classification system 1 can establish an association among the washing machine 411 category, the raincoat 421 category, and the shirt 431 category.

For example, the administrator can connect the nodes 21 representing the washing machine 411, the raincoat 421, and the shirt 431 with edges 22 in the graph data structure 2, or describe the association among the child nodes 32 representing the washing machine 411, the raincoat 421, and the shirt 431 with attributes 33 in the tree data structure 3.
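The graph alternative can be sketched as an undirected adjacency map, where each edge 22 joins two category nodes 21. `CategoryGraph` and its methods are hypothetical names chosen for this example; the patent does not prescribe a particular graph implementation.

```python
from collections import defaultdict


class CategoryGraph:
    """Undirected graph: nodes are category labels, edges record an association."""

    def __init__(self):
        self._adjacent = defaultdict(set)

    def add_edge(self, first, second):
        # An edge describes a symmetric association, so store it in both directions.
        self._adjacent[first].add(second)
        self._adjacent[second].add(first)

    def related(self, category):
        """Categories directly connected to `category` (empty list if none)."""
        return sorted(self._adjacent.get(category, ()))


graph = CategoryGraph()
graph.add_edge("washing machine", "raincoat")
graph.add_edge("washing machine", "shirt")
graph.add_edge("raincoat", "shirt")

print(graph.related("washing machine"))  # ['raincoat', 'shirt']
print(graph.related("dishwasher"))       # []
```

Compared with the tree plus attributes, the graph makes the association itself a first-class object, which is convenient if each edge later needs to carry extra data such as a weight.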

With this association, when the deep learning classification model 142 determines that the input file belongs to the washing machine 411 category but with insufficient confidence, the classification system 1 can use the machine learning comparison model 143 to compare features of the input file against the digital files in the raincoat 421 and/or shirt 431 categories, in order to revise the confidence that the input file belongs to the washing machine 411 category. Similarly, when the deep learning classification model 142 determines that the input file belongs to the raincoat 421 category but with insufficient confidence, the classification system 1 can use the machine learning comparison model 143 to compare the input file against the digital files in the washing machine 411 and/or shirt 431 categories, in order to revise the confidence that the input file belongs to the raincoat 421 category, and so on.

In one embodiment, after completing the classification procedure, the classification system 1 can output the classification result and the corresponding confidence through the output unit 15, so that the user can carry out subsequent processing of the input file. In another embodiment, after completing the classification procedure, the classification system 1 can move the input file to, and store it in, the database corresponding to its category according to the classification result. In this way, input files are archived automatically, which also facilitates retraining the deep learning classification model 142.
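As a rough illustration of the archiving embodiment, the sketch below moves a classified file into a per-category folder. Using one directory per category in place of the "corresponding database" is an assumption, and `archive_file` and its parameters are hypothetical names.

```python
import shutil
from pathlib import Path


def archive_file(input_path, category, archive_root):
    """Move `input_path` into `<archive_root>/<category>/` and return the new path.

    A folder per category stands in for the per-category database; that mapping
    is an assumption of this sketch, not something the text specifies.
    """
    target_dir = Path(archive_root) / category
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(input_path).name
    shutil.move(str(input_path), str(destination))
    return destination
```

Files archived this way are already grouped by label, so each folder can later be read back as labeled training data when the classification model is retrained.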

Please refer next to FIG. 6 and FIG. 7, which are schematic diagrams of feature association according to a first embodiment and a second embodiment of the present invention, respectively.

The embodiment of FIG. 6 takes as its example a data structure used to classify digital multimedia files in the medical field. In this embodiment, the ultrasound category contains a benign tumor category and a malignant tumor category, the computed tomography category contains a benign tumor category and a malignant tumor category, and the magnetic resonance imaging category likewise contains a benign tumor category and a malignant tumor category. The deep learning classification model 142 has already been trained with these category labels.

When a tumor image is input into the classification system 1, the deep learning classification model 142 preliminarily determines that the tumor image belongs to the benign tumor category under ultrasound, but with low confidence. In this case, the classification system 1 can refer to the association described in the association data structure 144 and use the machine learning comparison model 143 to compare the tumor image against other benign tumor images in the ultrasound category, and to perform the feature comparison procedure on the tumor image and the benign tumor images in the magnetic resonance imaging category. Through the feature comparison procedure (that is, determining whether the two have the same or similar object features), it can be confirmed whether the tumor image really shows a benign tumor.

Specifically, an ultrasound image and a magnetic resonance image may contain the same object feature, but that feature may be less distinct in the ultrasound image and more distinct in the magnetic resonance image. The administrator of the classification system 1 can therefore establish, in the association data structure 144, an association between the benign tumor category under ultrasound and the benign tumor category under magnetic resonance imaging. By comparing the tumor image against benign tumor images in the other associated categories, the confidence of the classification result of the deep learning classification model 142 can be revised.

The embodiment of FIG. 7 takes as its example a data structure used to classify products on a production line. In this embodiment, the first part category contains a good product category and a defective product category, and the defective product category contains defect categories such as assembly mismatch, foreign matter, and machine aging. The second part category likewise contains a good product category and a defective product category, and its defective product category contains defect categories such as assembly mismatch, foreign matter, and machine aging. The deep learning classification model 142 has already been trained with these category labels.

When a defect image is input into the classification system 1, the deep learning classification model 142 preliminarily determines that the defect image belongs to the defective product category under the first part and to the machine aging defect category, but with low confidence. In this case, the classification system 1 can refer to the association described in the association data structure 144 and use the machine learning comparison model 143 to perform the feature comparison procedure on the defect image and other machine aging images under the defective product category of the first part, or on the defect image and the machine aging images under the defective product category of the second part. Through the feature comparison procedure, it can be confirmed whether the defect image really belongs to the machine aging defect category.
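The production-line hierarchy of FIG. 7 can be pictured as a nested mapping, and the comparison targets as every path that ends in the same defect label. This is a sketch under assumptions: the `taxonomy` layout, the English labels, and the `paths_ending_with` helper are hypothetical, and matching by identical leaf name is only one way the association could be recorded.

```python
DEFECTS = ("assembly mismatch", "foreign matter", "machine aging")

# Nested mapping: part -> good / defective -> defect categories.
taxonomy = {
    "first part": {"good": {}, "defective": {d: {} for d in DEFECTS}},
    "second part": {"good": {}, "defective": {d: {} for d in DEFECTS}},
}


def paths_ending_with(tree, leaf, prefix=()):
    """Return every category path in `tree` whose last element equals `leaf`."""
    found = []
    for name, children in tree.items():
        path = prefix + (name,)
        if name == leaf:
            found.append(path)
        found.extend(paths_ending_with(children, leaf, path))
    return found


for path in paths_ending_with(taxonomy, "machine aging"):
    print(" / ".join(path))
# first part / defective / machine aging
# second part / defective / machine aging
```

An image tentatively labeled as machine aging under the first part could then be compared against reference images stored under either returned path.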

As described above, the classification system 1 of the present invention uses the machine learning comparison model 143 to compare the input file against the associated subordinate categories only when the confidence of the classification result of the deep learning classification model 142 is insufficient. When the category to which the input file belongs is associated with several subordinate categories at once, the classification system 1 can perform the feature comparison procedure multiple times to avoid the misjudgment that a single feature comparison might produce.

As shown in FIG. 8, in step S20 of FIG. 4 the classification system 1 may determine, based on the content of the association data structure 144, that the first category to which the input file belongs is associated with a plurality of subordinate categories at once (step S40). In this case, the machine learning comparison model 143 extracts at least one object feature of the input file based on one of the plurality of subordinate categories (step S42), performs the feature comparison procedure on the input file and at least one digital file in that subordinate category based on the at least one object feature (step S44), and obtains a similarity score (step S46).

After step S46, the classification system 1 has completed, through the machine learning comparison model 143, the comparison procedure between the input file and one of the subordinate categories, and it determines whether all of the subordinate categories associated with the first category have been compared (step S48).

Until all of the subordinate categories have been compared, the classification system 1 repeats steps S42, S44, and S46 to perform the feature comparison procedure on the input file and the remaining subordinate categories through the machine learning comparison model 143 and obtain the corresponding similarity scores. Once all of the subordinate categories have been compared, the classification system 1 can calculate the comparison result of the feature comparison procedure according to the similarity score of each subordinate category and a preset weight value (step S50).
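The loop formed by steps S42 through S48 amounts to collecting one similarity score per subordinate category. In the minimal sketch below, `extract_features` and `compare` are hypothetical callables standing in for the two roles of the machine learning comparison model 143, and the stub values in the usage example are made up.

```python
def collect_similarity_scores(input_file, subordinates, extract_features, compare):
    """Run the feature comparison once per subordinate category.

    extract_features(file, sub_cat) -> features relevant to that category  (S42)
    compare(features, sub_cat)      -> similarity score                    (S44, S46)
    Returns {subordinate category: similarity score}.
    """
    scores = {}
    for subordinate in subordinates:  # the loop ends when S48 finds none left
        features = extract_features(input_file, subordinate)
        scores[subordinate] = compare(features, subordinate)
    return scores


# Usage with stub callables; the similarity values are invented for illustration.
stub_similarity = {"raincoat": 0.5, "shirt": 1.0}
scores = collect_similarity_scores(
    "photo.jpg",
    ["raincoat", "shirt"],
    extract_features=lambda f, sub: (f, sub),
    compare=lambda features, sub: stub_similarity[sub],
)
print(scores)  # {'raincoat': 0.5, 'shirt': 1.0}
```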

The preset weight values may be described, for example, by the individual edges 22 in the graph data structure 2 or by the individual attributes 33 in the tree data structure 3, but are not limited thereto. In one embodiment, the comparison result may be a single numerical value, such as a percentage, without limitation. In other words, the comparison result is calculated from the similarity scores between the first category and each subordinate category, together with their respective importance.
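The text says only that the comparison result is computed from the per-category similarity scores and preset weight values; it does not give a formula. One plausible reading, shown here strictly as an assumption, is a weighted average in which each weight expresses the importance of a subordinate category. The function name `combined_result` and the sample numbers are hypothetical.

```python
def combined_result(scores, weights):
    """Weighted average of similarity scores (assumed formula).

    scores:  {subordinate category: similarity in [0, 1]}
    weights: {subordinate category: non-negative importance}
    """
    total_weight = sum(weights[category] for category in scores)
    if total_weight == 0:
        raise ValueError("at least one subordinate category needs a positive weight")
    weighted_sum = sum(scores[category] * weights[category] for category in scores)
    return weighted_sum / total_weight


scores = {"raincoat": 0.5, "shirt": 1.0}
weights = {"raincoat": 1.0, "shirt": 3.0}  # e.g. stored on edges 22 or in attributes 33
print(combined_result(scores, weights))  # 0.875
```

With these numbers the shirt comparison counts three times as much as the raincoat comparison, so the result, (0.5 × 1 + 1.0 × 3) / 4 = 0.875, would satisfy a classification condition of 70%.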

After step S50, the classification system 1 can determine whether the comparison result meets the preset classification condition (step S52), that is, confirm from the comparison result whether the classification result of the deep learning classification model 142 is credible. If it is determined in step S52 that the comparison result does not meet the classification condition, the classification system 1 can output a classification failure message through the output unit 15 (step S54). Conversely, if it is determined in step S52 that the comparison result meets the classification condition, the classification system 1 can directly label the input file as the first category (step S56) and revise the confidence that the input file belongs to the first category (step S58).

The present invention uses deep learning and machine learning to implement a first-stage classification procedure and a second-stage comparison procedure, which can effectively improve on the accuracy of the classification results obtained from a single-stage classification procedure.

The above are merely preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. All equivalent variations made using the content of the present invention are therefore likewise included within the scope of the present invention.

S10~S32: classification steps

Claims (9)

1. An automated digital file classification method, applied to a classification system, comprising: a) receiving an input file by the classification system; b) performing, by the classification system, a classification procedure on the input file based on a pre-trained deep learning classification model, wherein the classification procedure outputs a first category to which the input file possibly belongs and a corresponding confidence; b1) determining whether the confidence is higher than or equal to a threshold; b2) when the confidence is higher than or equal to the threshold, directly labeling the input file as the first category; c) when the confidence is lower than the threshold, reading, by the classification system, an association data structure to confirm at least one subordinate category associated with the first category, wherein the association data structure records an association between a plurality of predefined categories and one or more of the subordinate categories, and wherein the subordinate category is another category, among a plurality of category labels, that has one or more common or similar object features with the first category; d) performing, by the classification system, a feature comparison procedure on the input file and at least one digital file in the subordinate category based on a pre-trained machine learning comparison model; and e) when a comparison result of the feature comparison procedure meets a classification condition, labeling the input file as the first category and revising the confidence that the input file belongs to the first category.
2. The automated digital file classification method of claim 1, wherein the association data structure is a tree data structure having a plurality of root nodes and a plurality of child nodes, wherein one or more of the child nodes are each set with an attribute, and the attribute describes the association between that child node and another child node.

3. The automated digital file classification method of claim 1, wherein the association data structure is a graph data structure having a plurality of nodes and a plurality of edges, each of the edges connecting two of the nodes to describe the association between the two nodes.

4. The automated digital file classification method of claim 2 or 3, wherein in step d) the machine learning comparison model extracts at least one object feature of the input file based on the subordinate category, and performs the feature comparison procedure on the input file and at least one digital file in the subordinate category based on the at least one object feature, to determine whether the input file is similar to the subordinate category.
5. The automated digital file classification method of claim 2 or 3, wherein step c) reads the association data structure and confirms a plurality of subordinate categories associated with the first category, and step d) comprises: d1) extracting, by the machine learning comparison model, at least one object feature of the input file based on one of the plurality of subordinate categories; d2) performing the feature comparison procedure on the input file and at least one digital file in the subordinate category based on the at least one object feature, and obtaining a similarity score; d3) determining whether all of the plurality of subordinate categories have been compared; d4) before all of the plurality of subordinate categories have been compared, repeating step d1) through step d2) to obtain the similarity scores between the input file and the other subordinate categories; and d5) after all of the plurality of subordinate categories have been compared, calculating the comparison result of the feature comparison procedure according to the similarity score of each of the subordinate categories and a preset weight value.
6. An automated digital file classification system, comprising: an input unit, receiving an input file to be classified; a storage unit, storing a deep learning classification model, a machine learning comparison model, and an association data structure, wherein the association data structure records an association between a plurality of predefined categories and one or more subordinate categories; a processing unit, connected to the input unit and the storage unit, and executing the following procedures: performing a classification procedure on the input file based on the pre-trained deep learning classification model, wherein the classification procedure outputs a first category to which the input file possibly belongs and a corresponding confidence; after executing the classification procedure, determining whether the confidence is higher than or equal to a threshold; when the confidence is determined to be higher than or equal to the threshold, directly labeling the input file as the first category; when the confidence is determined to be lower than the threshold, reading the association data structure to confirm at least one of the subordinate categories associated with the first category, wherein the subordinate category is another category, among a plurality of category labels, that has one or more common or similar object features with the first category; performing a feature comparison procedure on the input file and at least one digital file in the subordinate category based on the pre-trained machine learning comparison model; and when a comparison result of the feature comparison procedure meets a classification condition, labeling the input file as the first category and revising the confidence that the input file belongs to the first category; and an output unit, connected to the processing unit, outputting the first category to which the input file belongs and the corresponding confidence.

7. The automated digital file classification system of claim 6, wherein the association data structure is a tree data structure or a graph data structure, wherein the tree data structure has a plurality of root nodes and a plurality of child nodes, one or more of the child nodes are each set with an attribute, and the attribute describes the association between that child node and another child node; and wherein the graph data structure has a plurality of nodes and a plurality of edges, each of the edges connecting two of the nodes to describe the association between the two nodes.

8. The automated digital file classification system of claim 7, wherein the machine learning comparison model is configured to extract at least one object feature of the input file based on the subordinate category, and to perform the feature comparison procedure on the input file and at least one digital file in the subordinate category based on the at least one object feature.
9. The automated digital file classification system of claim 7, wherein the processing unit is configured to confirm, after reading the association data structure, a plurality of subordinate categories associated with the first category, and, in the procedure of performing a feature comparison procedure on the input file and at least one digital file in the subordinate category based on the pre-trained machine learning comparison model, to execute the following steps: a) extracting, by the machine learning comparison model, at least one object feature of the input file based on one of the plurality of subordinate categories; b) performing the feature comparison procedure on the input file and at least one digital file in the subordinate category based on the at least one object feature, and obtaining a similarity score; c) determining whether all of the plurality of subordinate categories have been compared; d) before all of the plurality of subordinate categories have been compared, repeating step a) and step b) to obtain the similarity scores between the input file and the other subordinate categories; and e) after all of the plurality of subordinate categories have been compared, calculating the comparison result of the feature comparison procedure according to the relevance score of each of the subordinate categories and a preset weight value.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110131855A TWI801982B (en) 2021-08-27 2021-08-27 Classifying system and classifying method of automatically classifying digital file

Publications (2)

Publication Number Publication Date
TW202309757A (en) 2023-03-01
TWI801982B (en) 2023-05-11

Family

ID=86690777

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110131855A TWI801982B (en) 2021-08-27 2021-08-27 Classifying system and classifying method of automatically classifying digital file

Country Status (1)

Country Link
TW (1) TWI801982B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183303A (en) * 2020-09-24 2021-01-05 南方电网数字电网研究院有限公司 Transformer equipment image classification method and device, computer equipment and medium
CN112381818A (en) * 2020-12-03 2021-02-19 浙江大学 Medical image identification enhancement method for subclass diseases
US20210182659A1 (en) * 2019-12-16 2021-06-17 NB Ventures, Inc., dba GEP Data processing and classification
US20210216745A1 (en) * 2020-01-15 2021-07-15 DeePathology Ltd. Cell Detection Studio: a system for the development of Deep Learning Neural Networks Algorithms for cell detection and quantification from Whole Slide Images

Also Published As

Publication number Publication date
TW202309757A (en) 2023-03-01
