TWI721331B - Classification device and classification method - Google Patents
Classification device and classification method
- Publication number: TWI721331B
- Application number: TW107139402A
- Authority
- TW
- Taiwan
- Prior art keywords
- decision tree
- feature set
- classification
- feature
- features
Abstract
Description
The present invention relates to a classification device and a classification method.
The use of big data has become increasingly widespread, and many industries now make decisions based on the results of big-data analysis. Typically, big data serves as the input for building a mathematical model, and classification and evaluation are then performed with the finished model. However, big data is vast, complex, and fragmented. Simply feeding enormous amounts of raw data into a model often produces a garbage-in, garbage-out (GIGO) situation that distorts the resulting model.
Therefore, finding the particularly important or meaningful features within a huge amount of data is one of the goals that practitioners in this field strive to achieve.
The present invention provides a classification device and a classification method.
The classification device of the present invention is adapted to classify a plurality of data records, each of which is associated with a first feature set. The classification device includes a storage medium and a processor. The storage medium stores a plurality of modules. The processor is coupled to the storage medium and accesses and executes the modules, which include a feature screening module, a clustering module, a random forest selection module, and a classification module. The feature screening module computes the correlation of each first feature in the first feature set and screens the first feature set according to the correlations and weights of the first features to produce a second feature set. The clustering module clusters the data records according to the second feature set to produce a clustering result. The random forest selection module generates a first decision tree set according to the second feature set and the clustering result, and screens the second feature set according to the error rates of the first decision trees in the first decision tree set to produce a third feature set. The classification module generates a classification result according to the clustering result and the third feature set.
The classification method of the present invention is adapted to classify a plurality of data records, each of which is associated with a first feature set. The classification method includes: computing the correlation of each first feature in the first feature set, and screening the first feature set according to the correlations and weights of the first features to produce a second feature set; clustering the data records according to the second feature set to produce a clustering result; generating a first decision tree set according to the second feature set and the clustering result; screening the second feature set according to the error rates of the first decision trees in the first decision tree set to produce a third feature set; and generating a classification result according to the clustering result and the third feature set.
Based on the above, the present invention can effectively reduce a large amount of data, obtain the important data more accurately, and thereby derive meaningful rules from that data. In addition, among the many rule-building algorithms, the present invention can identify a decision tree that classifies the current data with a certain degree of credibility, without the tree becoming so complex that overfitting occurs.
To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a classification device 10 according to an embodiment of the present invention. The classification device 10 may include a processor 100 and a storage medium 300.
The processor 100 is coupled to the storage medium 300 and accesses and executes the modules stored in the storage medium 300. The processor 100 may be, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), a similar element, or a combination of the above elements; the invention is not limited thereto.
The storage medium 300 may store a plurality of modules, including a feature screening module 310, a clustering module 330, a random forest selection module 350, and a classification module 370, whose functions are described below. The storage medium 300 may be, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD), a similar element, or a combination of the above elements; the invention is not limited thereto.
FIG. 2 is a flowchart of a classification method 20 according to an embodiment of the present invention, where the classification method 20 may be performed by the classification device 10. The classification device 10 and the classification method 20 are adapted to classify a plurality of data records, each of which is associated with a first feature set composed of a plurality of features. Take the data of Table 1 as an example: Table 1 represents 23,837 customer records, each with 227 fields, where each field represents one feature. In other words, the data of Table 1 has 227 features in total; that is, each record of Table 1 is associated with a first feature set composed of 227 features (hereinafter the features in the first feature set are called "first features"). Table 1
In step S210, the feature screening module 310 may compute the correlation of each first feature in the first feature set, and screen the first feature set according to the correlations and weights of the first features to produce a second feature set, where the correlations may be expressed, for example, as correlation coefficients.
FIG. 3 is a detailed flowchart of step S210 of the classification method 20 according to an embodiment of the present invention. Specifically, in step S211, the feature screening module 310 may compute the correlation coefficient between every pair of first features and obtain the weight of each first feature. If the first feature set includes m first features, the feature screening module 310 may compute the correlation coefficients among the m first features (for example, the correlation coefficient between first feature i and first feature j is r_(i,j)), and remove the duplicated coefficients and the self-correlation coefficients, resulting in a correlation coefficient matrix composed of m(m-1)/2 correlation coefficients, as shown in Table 2. On the other hand, the weight of each first feature may be set according to its importance. For example, if, to the user of the Table 1 data, gender is a more important feature and the landline call ratio is a less important one, gender may be given a higher weight and the landline call ratio a lower weight, as shown in Table 3. Table 3 discloses the weight table used in this embodiment for the first features of Table 1; the weights may be adjusted by the user according to actual needs, and the invention is not limited thereto. Table 2
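The pairwise computation of step S211 can be sketched as follows. This is a minimal illustration rather than the patent's implementation; the column names and values are hypothetical stand-ins for the Table 1 fields, and pandas/NumPy are assumed tooling.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for a few of the 227 fields of Table 1; the real
# column names and values are not disclosed in the text.
records = pd.DataFrame({
    "contract_months_total": [12, 24, 24, 12, 36, 24],
    "contract_months_left":  [2, 20, 18, 1, 30, 22],
    "landline_call_ratio":   [0.10, 0.30, 0.25, 0.05, 0.40, 0.35],
})

m = len(records.columns)
corr = records.corr().to_numpy()        # full m x m matrix, corr[i, j] = r_(i,j)

# Drop the m self-correlations (r_(i,i) = 1) and the duplicated pairs
# (r_(i,j) = r_(j,i)): the strict upper triangle holds m(m-1)/2 coefficients.
i_idx, j_idx = np.triu_indices(m, k=1)
pair_coeffs = corr[i_idx, j_idx]
```

For m = 227 this reduces the 51,529 matrix entries to 25,651 distinct coefficients.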
After obtaining the correlation and weight of each first feature in the first feature set, in step S212 the feature screening module 310 may store the information of each first feature in the feature set information data format shown in Formula (1):

x_y = {r_(y,y+1), ..., r_(y,z), w_y, d_y} ... Formula (1)

where z is the total number of first features, y is the index of a first feature, x_y is the feature set information of the y-th first feature, r_(y,y+1) is the correlation coefficient between the y-th and the (y+1)-th first features (likewise, r_(y,z) is the correlation coefficient between the y-th and the z-th first features), w_y is the weight of the y-th first feature, and d_y is a flag indicating whether the y-th first feature is to be deleted. In this embodiment, the feature set information of some features of Table 1 can be organized as shown in Table 4. Table 4
Returning to FIG. 3, in step S213 the feature screening module 310 may screen the first features according to the feature set information; the screening may be implemented, for example, based on the encoding of Table 5. The main principle is to use the correlation coefficients and the weights as the screening criteria. A correlation coefficient reveals how strongly one feature is related to another. For example, suppose the first features include the mobile contract period and the number of months until the mobile contract expires; the two are highly positively correlated, but because the user cares more about the months until expiry, only that feature is ultimately kept, and the feature set information of the contract period is marked with the deletion flag. Specifically, the feature screening module 310 first compares the correlation coefficient of every pair of first features to pick out the features whose correlation exceeds a threshold; these features can be regarded as one highly positively correlated feature group. The feature screening module 310 then selects from this group the key feature with the largest weight and deletes the remaining features of the group (for example, by marking their feature set information with the deletion flag). The above steps are repeated until all features have been compared, and the resulting key feature(s) constitute the second feature set. The features marked for deletion in these steps can be regarded as less important. Step S210 therefore filters out a large number of unimportant features, effectively reducing the huge data set to the truly important data. For example, in this embodiment the second feature set (i.e., the set produced by screening the first feature set in step S210) consists of the 30 features shown in Table 6. Table 5
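The screening loop of step S213 can be sketched as below. The greedy keep-the-heavier-feature rule follows the text; the threshold value 0.9 and the toy numbers are assumptions, since the patent does not state them.

```python
import numpy as np

def screen_features(corr, weights, threshold=0.9):
    """Step S213 sketch: for each pair of features whose |correlation| exceeds
    the threshold, keep only the one with the larger weight and flag the other
    for deletion (the d_y mark of Formula (1))."""
    m = len(weights)
    deleted = np.zeros(m, dtype=bool)            # d_y flags
    for i in range(m):
        if deleted[i]:
            continue
        for j in range(i + 1, m):
            if deleted[j] or abs(corr[i, j]) <= threshold:
                continue
            if weights[i] >= weights[j]:         # keep the key (heavier) feature
                deleted[j] = True
            else:
                deleted[i] = True
                break                            # feature i is gone; move on
    return [y for y in range(m) if not deleted[y]]

# Toy example: features 0 and 1 are highly correlated ("contract period" vs
# "months until expiry"); feature 1 carries the larger weight and survives.
corr = np.array([[1.0, 0.95, 0.1],
                 [0.95, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
weights = [0.3, 0.8, 0.5]
kept = screen_features(corr, weights)
print(kept)  # [1, 2]
```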
In step S220, the clustering module 330 may cluster the data records according to the second feature set to produce a clustering result. Specifically, the clustering module 330 may cluster the data records based on, but not limited to, the K-means algorithm according to the second feature set. For example, the clustering module 330 may cluster the data of Table 1 with K-means based on the second feature set of Table 6; the clustering result may be as shown in Table 7. Table 7
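Step S220 can be sketched with an off-the-shelf K-means implementation. The use of scikit-learn, the random data, and the record count are assumptions; the patent names only the K-means algorithm, and the embodiment ends up with five clusters (Tables 7 and 11), which fixes k = 5 here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for the 23,837 records restricted to the 30 second-feature columns.
X = rng.normal(size=(200, 30))

# k = 5 matches the five clusters of the embodiment.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_   # the clustering result: one cluster id per record
```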
After the clustering result is produced, in step S230 the random forest selection module 350 may generate a first decision tree set according to the second feature set and the clustering result. Specifically, the random forest selection module 350 may use the random forest algorithm (Random Forests) to generate the first decision tree set, which has a plurality of first decision trees, according to the second feature set and the clustering result. In this embodiment, the random forest selection module 350 generated 30 decision tree models according to the second feature set and the clustering result, and the error rate of each decision tree model may be as shown in Table 8. Table 8
Next, in step S240, the random forest selection module 350 may screen the second feature set according to the error rates of the first decision trees in the first decision tree set to produce a third feature set.
FIG. 4 is a detailed flowchart of step S240 of the classification method 20 according to an embodiment of the present invention. Specifically, in step S241 the random forest selection module 350 may select the decision tree with the lowest error rate from the first decision trees. In this embodiment, according to the information of Table 8, the random forest selection module 350 may select the 12th decision tree as the one with the lowest error rate. Then, in step S242, the random forest selection module 350 may compose the third feature set from the features corresponding to the decision tree with the lowest error rate. In this embodiment, the random forest selection module 350 may compose the third feature set from the features of the 12th decision tree; these features (i.e., the features of the third feature set) may be as shown in Table 9. Table 9
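Steps S230 through S242 can be sketched together as below. The use of scikit-learn and the random stand-in data are assumptions; so is the choice of error measure, since the patent does not say how each tree's error rate (Table 8) is computed, so this sketch evaluates every tree on the full data set (out-of-bag error would be another plausible choice).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))        # stand-in for the 30 second features
y = rng.integers(0, 5, size=200)      # stand-in for the Table 7 cluster labels

# Step S230: a forest of 30 first decision trees, as in the embodiment.
forest = RandomForestClassifier(n_estimators=30, random_state=0).fit(X, y)

# Step S241: error rate of each first decision tree (assumed: full-data error).
errors = [1.0 - tree.score(X, y) for tree in forest.estimators_]
best = int(np.argmin(errors))

# Step S242: the third feature set = the features the best tree actually
# splits on (leaf nodes carry the sentinel index -2 and are skipped).
used = forest.estimators_[best].tree_.feature
third_feature_set = sorted(set(int(f) for f in used if f >= 0))
```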
Returning to FIG. 2, in step S250 the classification module 370 may generate a classification result according to the clustering result and the third feature set.
FIG. 5 is a detailed flowchart of step S250 of the classification method according to an embodiment of the present invention. In step S251, the classification module 370 may generate a second decision tree set according to the clustering result and the third feature set. In this embodiment, the classification module 370 may feed the clustering result of Table 7 and the third feature set of Table 9 into one or more decision tree algorithms to produce a second decision tree set that includes one or more second decision trees. For example, the classification module 370 may feed the clustering result of Table 7 and the third feature set of Table 9 into (but not limited to) the CART, C5.0, and CHAID classification decision tree algorithms to obtain a CART classification decision tree, a C5.0 classification decision tree, and a CHAID classification decision tree, whose accuracy rates are shown in Table 10. Table 10
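Step S251 can be sketched for the CART member of the set. Note the assumptions: scikit-learn implements only a CART-style tree (C5.0 and CHAID would need other packages), the data are random stand-ins, and whether Table 10 reports training or held-out accuracy is not stated, so training accuracy is used here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))      # stand-in for the third-feature columns (Table 9)
y = rng.integers(0, 5, size=300)   # stand-in for the Table 7 cluster labels

# One second decision tree (CART-style); C5.0 and CHAID are not in scikit-learn.
cart = DecisionTreeClassifier(random_state=0).fit(X, y)
accuracy = cart.score(X, y)        # the analogue of a Table 10 accuracy entry
```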
In step S252, the classification module 370 may compute a score for each second decision tree in the second decision tree set and select the final decision tree according to the scores. In this embodiment, the score of a second decision tree is determined by three factors: credibility, complexity, and accuracy.
For the credibility computation, the classification module 370 may first compute, according to the clustering result, the first ratio of the data records corresponding to each cluster. For example, assuming the clustering result of Table 7 divides the data records (i.e., the 23,837 records of Table 1) into 5 clusters, the classification module 370 may compute the first ratios of the records corresponding to clusters 1, 2, 3, 4, and 5, as shown in Table 11. Table 11
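The first ratios are simply per-cluster proportions, sketched below. The five cluster sizes are hypothetical (Table 11's actual values are not legible in the source); only their total, 23,837, comes from the text.

```python
import numpy as np

# Hypothetical cluster sizes summing to the 23,837 records of Table 1.
labels = np.array([0] * 9503 + [1] * 5210 + [2] * 4108 + [3] * 3017 + [4] * 1999)

# First ratio of cluster j = (records in cluster j) / (all records).
first_ratios = np.bincount(labels) / labels.size
```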
Next, the classification module 370 may compute, according to the clustering result, the second ratio of the data records corresponding to each node of a second decision tree. Specifically, after each second decision tree is generated, the classification module 370 may directly read the information of each node to obtain the second ratio each cluster occupies at that node. Take the CART classification decision tree as an example and assume the CART tree generated in step S251 has 16 nodes in total. In this embodiment, the first node of the CART classification decision tree may include the information shown in Table 12. Table 12
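Reading per-node cluster ratios, as Table 12 does for node 1, can be sketched with scikit-learn's tree internals. The data are random stand-ins; note that `tree_.value` holds per-node class counts in older scikit-learn and class proportions in newer releases, and the normalization below yields the same second ratios either way.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))      # stand-in for the third-feature columns
y = rng.integers(0, 5, size=300)   # stand-in cluster labels, 5 clusters

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# tree_.value[p] describes, for node p, how the training records reaching the
# node are distributed over the clusters; normalizing each row gives the
# second ratios of that node.
value = tree.tree_.value           # shape: (n_nodes, 1, n_clusters)
second_ratios = value[:, 0, :] / value[:, 0, :].sum(axis=1, keepdims=True)
```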
After obtaining the first ratios and the second ratios, the classification module 370 may compute the credibility of a second decision tree from the first ratios and the second ratios based on Formula (2), where j is the index of a cluster in the clustering result, m is the total number of clusters, p denotes the p-th node of the second decision tree, k denotes the total number of nodes of the second decision tree, and i denotes the index of the second decision tree. In this embodiment there are 5 clusters, so m = 5. The first second decision tree has 16 nodes, so k = 16; the first second decision tree (i = 1) is the CART classification decision tree, the second second decision tree (i = 2) is the C5.0 classification decision tree, and the third second decision tree (i = 3) is the CHAID classification decision tree.
For example, based on Formula (2) and the information of Tables 11 and 12, the classification module 370 can compute the credibility corresponding to node 1 of the CART classification decision tree. By analogy, the classification module 370 can compute the credibility corresponding to every node of the CART, C5.0, and CHAID classification decision trees, as shown in Tables 13, 14, and 15, respectively. Table 13 (CART classification decision tree)
After the credibility of each node of each second decision tree is computed with Formula (2), the node credibilities of the i-th second decision tree can be normalized with Formula (3) to obtain the credibility of the i-th second decision tree, where j is the index of a cluster in the clustering result, m is the total number of clusters, p denotes the p-th node of the second decision tree, k denotes the total number of nodes of the second decision tree, i denotes the index of the second decision tree, and n is the total number of second decision trees. In this embodiment there are 3 second decision trees, so n = 3. Taking Tables 13, 14, and 15 as examples, the total of the credibility list of the CART classification decision tree of Table 13 is 3.44661719, which is 0.263655231 after normalization; the total for the C5.0 classification decision tree of Table 14 is 13.07244, which is 1 after normalization; and the total for the CHAID classification decision tree of Table 15 is 8.1228693, which is 0.621373587 after normalization.
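Formula (3) itself appears only as an image in the source, but dividing each tree's node-credibility total by the largest total reproduces the three reported values exactly, so that is the reading sketched here (an assumption, flagged as such).

```python
# Node-credibility totals reported in the text for Tables 13, 14, and 15.
totals = {"CART": 3.44661719, "C5.0": 13.07244, "CHAID": 8.1228693}

# Assumed Formula (3): credibility_i = total_i / max(total_1..total_n),
# which reproduces 0.263655231, 1, and 0.621373587.
peak = max(totals.values())
normalized = {name: t / peak for name, t in totals.items()}
```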
For the complexity, the classification module 370 may obtain the complexity s_i of the i-th second decision tree in the process of generating it, and then normalize s_i with Formula (4), where i is the index of the second decision tree and n is the total number of second decision trees. In this embodiment there are 3 second decision trees, so n = 3. Taking Tables 13, 14, and 15 as examples, the CART classification decision tree of Table 13 has 16 nodes in total, giving a normalized complexity of 0.219178082; the C5.0 classification decision tree of Table 14 has 73 nodes, giving a normalized complexity of 1; and the CHAID classification decision tree of Table 15 has 35 nodes, giving a normalized complexity of 0.479452055.
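Formula (4) likewise appears only as an image, but dividing each tree's node count by the largest node count reproduces the reported complexities (16/73, 73/73, 35/73), so that reading is assumed below.

```python
# Total node counts of the three second decision trees from the text.
node_counts = {"CART": 16, "C5.0": 73, "CHAID": 35}

# Assumed Formula (4): complexity_i = s_i / max(s_1..s_n),
# which reproduces 0.219178082, 1, and 0.479452055.
peak = max(node_counts.values())
complexity = {name: s / peak for name, s in node_counts.items()}
```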
For the accuracy, the classification module 370 may obtain the accuracy c_i of the i-th second decision tree in the process of generating it. Taking Table 10 as an example, the accuracy of the CART classification decision tree is 0.9871, that of the C5.0 classification decision tree is 0.9933, and that of the CHAID classification decision tree is 0.9787.
After obtaining the credibility, complexity, and accuracy of each second decision tree, the classification module 370 may compute the score of each second decision tree according to these three factors, using a scoring formula such as Formula (5), where j is the index of a cluster in the clustering result, m is the total number of clusters, p denotes the p-th node of the second decision tree, k denotes the total number of nodes of the second decision tree, and i denotes the index of the second decision tree. The computed scores of the second decision trees are shown in Table 16. Table 16
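Formula (5) is not legible in the source, so the exact scoring rule is unknown. One simple combination that rewards credibility and accuracy while penalizing complexity, and that ranks the CHAID tree first as reported after Table 16, is sketched below; the form score_i = credibility_i - complexity_i + accuracy_i is an assumption, not the patent's formula.

```python
# Per-tree factors taken from the text (normalized credibility and
# complexity, plus the Table 10 accuracies).
trees = {
    "CART":  {"cred": 0.263655231, "compl": 0.219178082, "acc": 0.9871},
    "C5.0":  {"cred": 1.0,         "compl": 1.0,         "acc": 0.9933},
    "CHAID": {"cred": 0.621373587, "compl": 0.479452055, "acc": 0.9787},
}

# Assumed Formula (5): score_i = credibility_i - complexity_i + accuracy_i.
scores = {name: v["cred"] - v["compl"] + v["acc"] for name, v in trees.items()}
best = max(scores, key=scores.get)
print(best)  # CHAID
```

Under this assumed rule the CHAID tree scores about 1.121, ahead of CART (about 1.032) and C5.0 (0.9933), consistent with the final-tree selection in the embodiment.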
After the score of each second decision tree is computed, the classification module 370 may select the final decision tree according to the scores. For example, the classification module 370 may select from Table 16 the CHAID decision tree, which has the highest score, as the final decision tree.
After the final decision tree is determined, in step S253 the classification module 370 may classify the data records according to the final decision tree to produce the classification result.
Features and effects
The important-feature selection method of the present invention adopts a weight setting method and a correlation coefficient test: important feature parameters set in advance can be retained, while the correlation coefficient test computes the correlation coefficients between features. This effectively reduces a large number of feature parameters, so that more meaningful classification rules can be found in a large amount of data based on the important feature parameters.
The present invention exploits the strengths of random forests in handling a large number of input feature parameters and in evaluating feature importance, finding the model with the lowest error rate in order to pick out the most important features.
The present invention adopts a scoring method based on credibility, complexity, and accuracy, and can identify a classification decision tree whose construction is not so complex as to cause overfitting, that has a certain degree of credibility, and whose accuracy is above a certain level, yielding the best way to classify the customer data.
The present invention is not limited to customer data; it can be applied to different kinds of data to find the rules hidden in the data and thereby mine the important rules within big data.
In summary, the present invention can screen out the more critical features through the correlation and weight of each feature, and can further screen those critical features with a clustering algorithm and a decision tree algorithm, thereby finding the features that matter most to the model being built. The present invention can also score a variety of decision tree algorithms comprehensively and select, according to the scores, the final decision tree used to classify the data. On this basis, the present invention can effectively reduce a large amount of data, obtain the important data more accurately, and thereby derive meaningful rules from that data. In addition, among the many rule-building algorithms, the present invention can identify a decision tree that classifies the current data with a certain degree of credibility, without the tree becoming so complex that overfitting occurs.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary knowledge in the relevant technical field may make some changes and modifications without departing from the spirit and scope of the invention; the scope of protection of the invention shall therefore be determined by the appended claims.
10: classification device; 20: classification method; 100: processor; 300: storage medium; 310: feature screening module; 330: clustering module; 350: random forest selection module; 370: classification module; S210, S211, S212, S213, S220, S230, S240, S241, S242, S250, S251, S252, S253: steps
FIG. 1 is a schematic diagram of a classification device according to an embodiment of the present invention. FIG. 2 is a flowchart of a classification method according to an embodiment of the present invention. FIG. 3 is a detailed flowchart of step S210 of the classification method according to an embodiment of the present invention. FIG. 4 is a detailed flowchart of step S240 of the classification method according to an embodiment of the present invention. FIG. 5 is a detailed flowchart of step S250 of the classification method according to an embodiment of the present invention.
20: classification method
S210, S220, S230, S240, S250: steps
Claims (12)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW107139402A TWI721331B (en) | 2018-11-06 | 2018-11-06 | Classification device and classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202018527A (en) | 2020-05-16 |
TWI721331B true TWI721331B (en) | 2021-03-11 |
Family
ID=71895754
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090281981A1 (en) * | 2008-05-06 | 2009-11-12 | Chen Barry Y | Discriminant Forest Classification Method and System |
TW201737058A (en) * | 2016-03-31 | 2017-10-16 | Alibaba Group Services Ltd | Method and apparatus for training model based on random forest |
US20180176243A1 (en) * | 2016-12-16 | 2018-06-21 | Patternex, Inc. | Method and system for learning representations for log data in cybersecurity |
Non-Patent Citations (1)
Title |
---|
https://blog.csdn.net/zjupeco/article/details/77371645 * |