TWI358639B - Malware detection system, data mining module, malw - Google Patents


Info

Publication number
TWI358639B
Authority
TW
Taiwan
Prior art keywords
feature
program
malicious
features
programs
Prior art date
Application number
TW96138249A
Other languages
Chinese (zh)
Other versions
TW200917020A (en)
Inventor
Shi Jinn Houng
Kun Asien Hsiao
Original Assignee
Univ Nat Taiwan Science Tech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Taiwan Science Tech filed Critical Univ Nat Taiwan Science Tech
Priority to TW96138249A priority Critical patent/TWI358639B/en
Publication of TW200917020A publication Critical patent/TW200917020A/en
Application granted granted Critical
Publication of TWI358639B publication Critical patent/TWI358639B/en

Description


Reference No.: TW3715PA

IX. Description of the Invention

[Technical Field of the Invention]
The present invention relates to a malware detection system, a data mining module, and a malware detection module, and in particular to a malware detection system, data mining module, and malware detection module based on data mining techniques.

[Prior Art]
In recent years malware has evolved very quickly. A conventional antivirus system extracts a pattern from each known malicious program and stores it in a database. Every program, malicious or not, corresponds to a unique pattern. To examine a program under test, the conventional antivirus system compares the pattern of that program with the patterns in its database. When the pattern of the program under test is identical to one of the stored patterns, the system reports the program as known malware.

However, malware often mutates into variants very quickly. A variant behaves much like the malware it derives from, but the two patterns still differ. For example, when a known malicious program A evolves into a new variant A', a conventional antivirus system cannot detect A' even though A' behaves like A and the system already holds the pattern of A. A conventional antivirus system can therefore detect only known malware; it cannot detect new variants that evolve from known malware and behave like it.

Conventional antivirus systems therefore cannot cope with the growing number of malware variants. When a new variant appears, it has already harmed client computers by the time a conventional antivirus system obtains its pattern.

[Summary of the Invention]
The present invention relates to a malware detection system. Using only the features of known malicious programs and known non-malicious programs, the system can detect variants that are of the same type as known malware but have never been seen before.

According to a first aspect of the invention, a data mining module is provided that outputs a classification model according to a plurality of known malicious programs (malware) and a plurality of known non-malicious programs. A program under test is classified as either malicious or non-malicious according to the classification model. The data mining module includes a program database, a feature mining unit, a feature screening unit, and a classification model training unit. The program database stores the known malicious programs and the known non-malicious programs. The feature mining unit extracts N candidate features (features to be screened) from the known malicious programs and the known non-malicious programs. The i-th candidate feature is an interaction between at least one of those programs and a file system, and at least one of those programs has the i-th candidate feature, where i is a positive integer less than or equal to N. The feature screening unit screens a plurality of valid features from the N candidate features. Each valid feature is substantially a feature mainly possessed by one class of programs, either the known malicious programs or the known non-malicious programs. The classification model training unit trains the classification model according to the valid features.

[...] The malware detection module includes a feature analysis unit, a feature screening unit, and a classifier. The feature analysis unit extracts preliminary features from the program under test; each preliminary feature is an interaction between the program under test and the file system. The feature screening unit screens reference features from the preliminary features according to the valid features. The classifier classifies the program under test as malicious or non-malicious according to the reference features. [...]

According to a fourth aspect of the invention, a data mining method is provided that outputs a classification model according to a plurality of known malicious programs and a plurality of known non-malicious programs. A program under test is classified as either malicious or non-malicious according to the classification model. The method includes the following steps. First, N candidate features are extracted from the known malicious programs and the known non-malicious programs; the i-th candidate feature is an interaction between at least one of those programs and a file system, where i is a positive integer less than or equal to N. Next, a plurality of valid features is screened from the N candidate features. [...] Then the classification model is trained according to the valid features.

According to [...] the invention, a malware detection method is provided for detecting a program under test. The method includes: first, extracting preliminary features from the program under test, each preliminary feature being an interaction between the program under test and [...]

[...] is extracted from at least one of these programs. Each candidate feature F1 to FN is an interaction between a file system and at least one of the known malicious programs Pm and the known non-malicious programs Pb.

For example, if several of the known malicious programs and several of the known non-malicious programs have candidate feature F1, then those known malicious programs Pm and those known non-malicious programs Pb all interact with the file system in the same way.

Malicious and non-malicious programs use dynamic link libraries (DLLs) differently. In this embodiment, the feature mining unit 112 therefore extracts, as candidate features F1 to FN, the paths of the DLLs used by each known malicious program Pm and each known non-malicious program Pb, together with the application program interfaces (APIs) each program uses.

In this embodiment, the candidate features that the feature mining unit 112 extracts from a program (a known malicious program or a known non-malicious program) fall into four types. The first type is the first-layer DLLs that the program uses directly. The second type is the paths from the first-layer DLLs used by the program down to the last-layer DLLs. The third type is the APIs in the first-layer DLLs that are used by the program. The fourth type is the APIs in the first-layer DLLs that are used by other DLLs.

Take as an example the extraction of candidate features from the interactions between a program, Filemon.exe, and the file system of the Windows operating system.

The first-layer DLLs used by Filemon.exe include COMCTL32.DLL, KERNEL32.DLL, and USER32.DLL. The feature mining unit 112 therefore extracts these first-layer DLLs as candidate features of Filemon.exe.

The first-layer DLLs may use second-layer DLLs, which in turn may use third-layer DLLs, and so on down to the last-layer DLLs. The feature mining unit 112 extracts the path from each first-layer DLL to each last-layer DLL as a candidate feature of the program.

For example, USER32.DLL in the first layer uses the second-layer DLLs GDI32.DLL, KERNEL32.DLL, and MSIMG32.DLL, and KERNEL32.DLL in the second layer uses the last-layer DLL NTDLL.DLL. The feature mining unit 112 therefore extracts the path formed by USER32.DLL in the first layer, KERNEL32.DLL in the second layer, and NTDLL.DLL in the last layer as a candidate feature of Filemon.exe.

The above takes the DLL paths used by USER32.DLL as an example. The DLL paths used by the other first-layer DLLs, such as COMCTL32.DLL, are extracted in the same way.

The feature mining unit 112 also extracts, as candidate features of Filemon.exe, the APIs in the first-layer DLLs that Filemon.exe uses, such as RtlFreeHeap, RtlAllocateHeap, and RtlGetLastWin32Error.
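The path extraction described above can be sketched as a depth-first walk over a DLL dependency graph. This is a minimal illustration, not the patent's implementation: the `IMPORTS` dictionary and the `dll_paths` helper are hypothetical, and the graph contains only the edges named in the text rather than the real import tables of these files.

```python
from typing import Dict, List

# Hypothetical dependency graph: each module maps to the DLLs it uses directly.
# Only the edges mentioned in the text are listed; real graphs are much larger.
IMPORTS: Dict[str, List[str]] = {
    "Filemon.exe": ["COMCTL32.DLL", "KERNEL32.DLL", "USER32.DLL"],
    "USER32.DLL": ["GDI32.DLL", "KERNEL32.DLL", "MSIMG32.DLL"],
    "KERNEL32.DLL": ["NTDLL.DLL"],
}


def dll_paths(program: str, imports: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from a first-layer DLL down to a last-layer DLL."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        # Skip DLLs already on the current path so that cycles terminate.
        children = [c for c in imports.get(node, []) if c not in path]
        if not children:
            paths.append(path)  # no further dependencies: last layer reached
            return
        for child in children:
            walk(child, path + [child])

    for first_layer in imports.get(program, []):
        walk(first_layer, [first_layer])
    return paths


paths = dll_paths("Filemon.exe", IMPORTS)
for p in paths:
    print(" -> ".join(p))
```

With this toy graph, one of the printed paths is USER32.DLL -> KERNEL32.DLL -> NTDLL.DLL, which corresponds to the path feature in the Filemon.exe example.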

The feature mining unit 112 further extracts, as candidate features of Filemon.exe, the APIs in the first-layer DLLs that are used by other DLLs, such as CsrAllocateCaptureBuffer, CsrAllocateMessagePointer, and RtlSizeHeap.

After the feature mining unit 112 has extracted the candidate features F1 to FN from each known malicious program Pm and each known non-malicious program Pb, the first feature screening unit 113 screens a plurality of valid features Fe from the candidate features F1 to FN.

The operation of the first feature screening unit 113 is now described in detail. In this embodiment there are many candidate features, and many of them may be possessed by known malicious programs and known non-malicious programs alike. The first feature screening unit 113 therefore decides, one by one, whether each candidate feature F1 to FN is a valid feature. A valid feature Fe is substantially a feature mainly possessed by one class of programs, either the known malicious programs or the known non-malicious programs. That is, a valid feature Fe satisfies exactly one of two cases. In the first case, Fe is substantially a feature mainly of known malicious programs. In the second case, Fe is substantially a feature mainly of known non-malicious programs.

For example, among 1000 known malicious programs Pm and 1050 known non-malicious programs Pb, 300 known malicious programs Pm have candidate feature F1, and 320 known non-malicious programs Pb also have candidate feature F1.

The probability that a known malicious program has candidate feature F1 is then about the same as the probability that a known non-malicious program has it. The certainty that F1 is a feature of known malicious programs is low, and the certainty that F1 is a feature of known non-malicious programs is also low. That is, F1 is neither substantially a feature mainly of known malicious programs nor substantially a feature mainly of known non-malicious programs. The first feature screening unit 113 therefore discards candidate feature F1 and does not treat it as a valid feature Fe.

As another example, 500 known malicious programs Pm have candidate feature F2, while only 20 known non-malicious programs Pb have it. The probability that a known malicious program has F2 is then substantially much greater than the probability that a known non-malicious program has it. The certainty that F2 is a feature of known malicious programs is high; that is, F2 is substantially a feature mainly of known malicious programs. The first feature screening unit 113 therefore determines that candidate feature F2 is a valid feature Fe.

Similarly, suppose only 50 known malicious programs have candidate feature F3, whereas [...] known non-malicious programs have it. The probability that a known non-malicious program has F3 is then substantially much greater than the probability that a known malicious program has it. The certainty that F3 is a feature of known non-malicious programs is high; that is, F3 is substantially a feature mainly of known non-malicious programs. The first feature screening unit 113 therefore also determines that candidate feature F3 is a valid feature Fe.

In this way, the first feature screening unit 113 screens, from the N candidate features F1 to FN, the valid features Fe that distinguish known malicious programs from known non-malicious programs.

In this embodiment, the first feature screening unit 113 decides whether each candidate feature is a valid feature according to a screening parameter corresponding to that candidate feature. The screening parameter of a candidate feature is related to the degree of certainty that the candidate feature is a feature of one class of programs, either the known malicious programs or the known non-malicious programs.

For example, to decide whether candidate feature Fi among F1 to FN is a valid feature, the first feature screening unit 113 generates a screening parameter Pi (not shown) for Fi from the number of known malicious programs that have Fi and the number of non-malicious programs that have Fi. The screening parameter Pi is related to the degree of certainty that Fi is a feature of one class of programs, either the known malicious programs or the known non-malicious programs. Here i is a positive integer.

If the screening parameter Pi is higher than a threshold, the certainty that Fi is a feature of one of the two classes is high enough, and Fi is substantially a feature mainly of that class. The feature screening unit 113 then determines that candidate feature Fi is a valid feature Fe.

In this embodiment, the first feature screening unit 113 computes the information gain of each candidate feature and uses it as the screening parameter of that candidate feature:

$$\mathrm{Gain}(S, F_i) = \mathrm{Info}(S) - \mathrm{Info}_{F_i}(S) \qquad \text{(Equation 1)}$$

Equation 1 gives the information gain of candidate feature Fi in this embodiment. S is the set of all known malicious programs Pm and all known non-malicious programs Pb. Info(S) in Equation 1 is the entropy of the set S, given by Equation 2:

$$\mathrm{Info}(S) = -\sum_{j=1}^{2} p_j \log_2 (p_j) \qquad \text{(Equation 2)}$$

In Equation 2, j equals 1 or 2. p1 is the proportion of known malicious programs among all known malicious and known non-malicious programs, and p2 is the proportion of known non-malicious programs among them.

The term Info_Fi(S) in Equation 1 is the entropy of candidate feature Fi, given by Equation 3:

$$\mathrm{Info}_{F_i}(S) = \sum_{k=0}^{1} \frac{|S_k|}{|S|}\, \mathrm{Info}(S_k) \qquad \text{(Equation 3)}$$

In Equation 3, k equals 0 or 1. S1 is the subset of S made up of the known malicious programs and known non-malicious programs that have candidate feature Fi, and S0 is the subset made up of those that do not have Fi. Thus |S1|/|S| is the proportion of programs having Fi among all known malicious and known non-malicious programs, and |S0|/|S| is the proportion of programs not having Fi.

Info(S_k) in Equation 3 is the entropy of the subset S_k, given by Equation 4. Info(S1) is the entropy of the set of programs that have Fi, and Info(S0) is the entropy of the set of programs that do not have Fi.

$$\mathrm{Info}(S_k) = -\sum_{m=0}^{1} \frac{|S_{k,m}|}{|S_k|} \log_2 \left( \frac{|S_{k,m}|}{|S_k|} \right) \qquad \text{(Equation 4)}$$

In Equation 4, m equals 0 or 1 and indexes the two classes of programs within S_k. For Info(S1), the two ratios are the proportion of known malicious programs among all programs having Fi and the proportion of known non-malicious programs among all programs having Fi. Info(S0) is obtained in the same way.

For example, among 1000 known malicious programs Pm and 1050 known non-malicious programs Pb, 300 known malicious programs Pm have candidate feature F1, and 320 known non-malicious programs Pb also have candidate feature F1. Of the 2050 programs, 620 have F1 and 1430 do not. Of the 1000 known malicious programs, 700 do not have F1, and of the 1050 known non-malicious programs, 730 do not have F1. Therefore

$$\mathrm{Info}(S) = -\left( \frac{1000}{2050} \log_2 \frac{1000}{2050} + \frac{1050}{2050} \log_2 \frac{1050}{2050} \right)$$

and

$$\mathrm{Info}_{F_1}(S) = -\frac{620}{2050} \left( \frac{300}{620} \log_2 \frac{300}{620} + \frac{320}{620} \log_2 \frac{320}{620} \right) - \frac{1430}{2050} \left( \frac{700}{1430} \log_2 \frac{700}{1430} + \frac{730}{1430} \log_2 \frac{730}{1430} \right)$$

Substituting these into Equation 1 gives the information gain of candidate feature F1, which serves as its screening parameter P1.
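The worked example can be checked numerically. The following is a minimal sketch of Equations 1 to 4 under the stated counts; the function names `entropy` and `information_gain` are illustrative and do not come from the patent. The counts for F2 (500 and 20) are the ones given earlier in the description.

```python
from math import log2


def entropy(counts):
    """Entropy of a set split into classes with the given sizes (Equations 2 and 4)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(mal_total, ben_total, mal_with, ben_with):
    """Gain(S, Fi) = Info(S) - Info_Fi(S), following Equations 1 and 3."""
    total = mal_total + ben_total
    s1 = (mal_with, ben_with)                          # programs that have Fi
    s0 = (mal_total - mal_with, ben_total - ben_with)  # programs that lack Fi
    info_s = entropy((mal_total, ben_total))
    info_f = sum(sum(part) / total * entropy(part)
                 for part in (s1, s0) if sum(part) > 0)
    return info_s - info_f


# F1: 300 of 1000 malicious and 320 of 1050 non-malicious programs have it.
gain_f1 = information_gain(1000, 1050, 300, 320)
# F2: 500 of 1000 malicious and 20 of 1050 non-malicious programs have it.
gain_f2 = information_gain(1000, 1050, 500, 20)
print(f"Gain(S, F1) = {gain_f1:.5f}")
print(f"Gain(S, F2) = {gain_f2:.5f}")
```

The gain for F1 comes out close to zero, while the gain for F2 is several orders of magnitude larger, which matches the qualitative argument for discarding F1 and keeping F2.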

Info_Fi(S) is the entropy of candidate feature Fi. The larger this entropy, the more disordered the data with respect to Fi; that is, the closer the probability that a known malicious program has Fi is to the probability that a known non-malicious program has Fi. The information gain of Fi is then lower. The certainty that Fi is a feature of known malicious programs is low, and the certainty that Fi is a feature of known non-malicious programs is also low.

Therefore, in this embodiment, when the information gain of candidate feature Fi is below a threshold, the first feature screening unit 113 discards Fi and does not treat it as a valid feature Fe.

Conversely, the smaller the entropy of candidate feature Fi, the larger the gap between the probability that a known malicious program has Fi and the probability that a known non-malicious program has Fi, and the higher the information gain of Fi. For such a candidate feature, exactly one of the following two cases holds. In the first case, the certainty that Fi is a feature of known malicious programs is high; that is, Fi is substantially a feature mainly of known malicious programs. In the second case, the certainty that Fi is a feature of known non-malicious programs is high; that is, Fi is substantially a feature mainly of known non-malicious programs.

Therefore, when the information gain of candidate feature Fi is higher than the threshold, the first feature screening unit 113 determines that Fi is a valid feature Fe.
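The threshold rule above can be sketched as a filter over per-feature counts. This is an illustration under assumptions: the threshold value 0.05 and the count of 600 non-malicious programs for F3 are hypothetical (the source does not give either), and the helper names are not from the patent.

```python
from math import log2

MAL_TOTAL, BEN_TOTAL = 1000, 1050  # totals from the description


def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def gain(mal_with, ben_with):
    """Information gain of a feature held by mal_with malicious and ben_with non-malicious programs."""
    total = MAL_TOTAL + BEN_TOTAL
    parts = [(mal_with, ben_with), (MAL_TOTAL - mal_with, BEN_TOTAL - ben_with)]
    conditional = sum(sum(p) / total * entropy(p) for p in parts if sum(p) > 0)
    return entropy((MAL_TOTAL, BEN_TOTAL)) - conditional


# feature -> (malicious programs with it, non-malicious programs with it)
counts = {
    "F1": (300, 320),
    "F2": (500, 20),
    "F3": (50, 600),  # 600 is a hypothetical value chosen for illustration
}
THRESHOLD = 0.05  # hypothetical threshold

valid = {}
for name, (mal, ben) in counts.items():
    if gain(mal, ben) > THRESHOLD:
        # Record which class mainly possesses the feature (the two cases above).
        mainly = "malicious" if mal / MAL_TOTAL > ben / BEN_TOTAL else "non-malicious"
        valid[name] = mainly

print(valid)
```

With these assumed inputs, F1 is discarded, F2 is kept as mainly a feature of malicious programs, and F3 is kept as mainly a feature of non-malicious programs.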

Claims (1)

1358639 三達編號:TW3715PA 十、申請專利範圍: 1. 一種資料採礦模組,用以依據複數個已知惡意程 式(Malware)與複數個已知非惡意程式,輸出一分類 (Classification)模型(Model),一待測程式係依據該 分類模型被分類為惡意程式與非惡意程式其中之一,該資 料採礦模組包括: 一程式資料庫,用以儲存該些已知惡意程式與該些已 知非惡意程式; • 一特徵採礦單元,用以由該些已知惡意程式與該些已 知非惡意程式中萃取出N個待篩選特徵(feature ),一第 i個待篩選特徵係為該些已知惡意程式和該些已知非惡意 程式中之至少其一與一檔案系統之互動行為,該些已知惡 意程式與該些已知非惡意程式中之至少其一係具有該第i 個待篩選特徵,i為一小於或等於N之正整數; 一特徵篩選單元,由該N個待篩選特徵篩選出複數個 有效特徵,每該有效特徵係實質上主要為該些已知惡意程 * 式與該些已知非惡意程式其中之一類程式所具有的特 徵;以及 一分類模型訓練單元,用以依據該些有效特徵訓練得 到該分類模型。 2. 如申請專利範圍第1項所述之資料採礦模組,其 中,對於該第i個待篩選特徵,該特徵篩選單元係依據具 有該第i個待篩選特徵的該些已知惡意程式的個數與具有 該第i個待篩選特徵的非惡意程式的個數產生對應該第i 26 13^8639 三達編號:TW3715PA 個待篩選特徵之-第i個篩選參數 相關於該第i個待_選特徵為該广個師選參數係 知非惡意程式其中之一類程式所具有的;=該些已 度1該第i個筛選參數高於—門檻值,該 H 係決定該第i個待篩選特徵為該 s師選單元 3. 如申請專利範圍第2項所述之^料 中,該特徵筛選單元係更依據該些已知亞組,其 知非惡意程式之個數,產生對舞哕 心心柽式與該些已 訊增益(InfomaUcKi gain)作為固待:選特徵之資 4. 如申請專利範圍第1項所述二數。 中,該分類模型訓練單元俜為_ 貝枓才木廣杈組,其 —lhlne,讓)訓機^_n 元係依據魅有則植麟得 4&向|機訓練單 Plane),作為該分類模型。 。平面(Hyper 5·如申請專利範圍第丨項 中,每該有效特徵係為一向量二之貧料採礦模組,其 ^ 該特徵篩選單元更對該些 有效特徵執盯一維度降低運算(Dimensi〇n Reducti〇n), 以降低每該有效特徵之向量維度。 6. 如申請專利範圍第5項所述之資料採礦模組,其 中’該特徵I帛選單元係依據—主成分分析 運算(Principle Component Analysis ’ pCA)分析該些有效特徵之主成分, 並依據該主成分分析運算之結果,降低該些有效特徵之向 量維度。 7. 如申σ月專利範圍第1項所述之資料採礦模組,其 27 : TW3715PA 動關係; —特徵篩選單元,田 + 中夕扣 "、思4王式與複數個已知非亞立和々甘 5式所具有的特徵;以及 ^1358639 Sanda number: TW3715PA X. Patent application scope: 1. A data mining module for outputting a classification model based on a plurality of known malicious programs (Malware) and a plurality of known non-malicious programs. The data to be tested is classified into one of a malicious program and a non-malicious program according to the classification model. The data mining module includes: a program database for storing the known malicious programs and the known ones. 
a non-malicious program; a feature mining unit for extracting N features to be selected from the known malicious programs and the known non-malicious programs, and an i-th feature to be screened is Knowing at least one of the malware and the known non-malicious programs interacting with a file system, the known malware and the at least one of the known non-malicious programs having the i-th For the feature to be filtered, i is a positive integer less than or equal to N; a feature screening unit that filters a plurality of valid features from the N features to be selected, each of the effective features being substantially A feature to be known for one of the known malicious programs and one of the known non-malicious programs; and a classification model training unit for training the classification model based on the effective features. 2. The data mining module of claim 1, wherein, for the i-th feature to be screened, the feature screening unit is based on the known malicious programs having the i-th to-be-screened feature The number and the number of non-malicious programs having the i-th feature to be filtered are generated corresponding to the i-th 13 13^8639 three-number: TW3715PA to be filtered features - the i-th filter parameter is related to the i-th The _ selection feature is that the general teacher selection parameter is known to be one of the non-malware programs; = the degree 1 has the ith filter parameter higher than the threshold value, and the H system determines the ith number The feature to be screened is the s teacher selection unit. 3. In the material described in claim 2, the feature screening unit is further based on the known subgroups, and the number of non-malicious programs is generated. InfomaUcKi gain is used as a solidification: the characteristics of the selection feature. 4. The second number mentioned in item 1 of the patent application. 
In the middle, the classification model training unit is _ 枓 枓 木 木 木 , , 其 , , , , , , , , , , , , , , , , , , , , , , , l l l l l ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ model. . Plane (Hyper 5), as in the scope of the patent application, each of the effective features is a vector two lean mining module, and the feature screening unit performs a one-dimensional reduction operation on the effective features (Dimensi 〇n Reducti〇n), to reduce the vector dimension of each valid feature. 6. The data mining module described in claim 5, wherein 'the feature I is selected based on the principal component analysis operation ( Principle Component Analysis 'pCA) analyzes the principal components of the effective features and reduces the vector dimension of the effective features according to the results of the principal component analysis operation. 7. Data mining as described in item 1 of the patent scope Module, 27: TW3715PA dynamic relationship; - feature screening unit, Tian + Zhong Xi buckle ", think 4 king style and a plurality of known non-Ali and 々甘5 style features; and ^ 據該些參考特,用以參考—分類模型,依 程式复中測程式分類為惡意程式與非亞音 得/、 該分類模㈣依據該些有效特徵娜:; 組,Γ中如範圍第12項所述之惡意程式_莫 依據該些有效:;:徵係為一向量,該特徵筛選單元更 14 ^徵’降低每該參考特徵之向量維度。According to these reference features, the reference-classification model is classified into a malicious program and a non-sub-sound according to the program-resolved program, and the classification module (4) is based on the effective features:; The malware described in the item is valid according to the:: the eigen is a vector, and the feature screening unit further reduces the vector dimension of each of the reference features. 組,其中請專利範圍第13項所述之惡隸式债測模 分析結果寺徵篩選單元更依據該些有效特徵之主成分 15降低每該參考特徵之向量維度。 組,其巾如申請專利範圍帛12項所述之惡意程式偵測模 該刀類裔係為一支援向量機分類器(SVM aSSlfler),該分類模型係為一超平面。 16. 如申請專利範圍第12項所述之惡意程式偵測模 組,复φ 、 —、τ ’該些已知惡意程式與該已知非惡意程式係存於 、程式資料庫中,該惡意程式偵測模組更包括一惡意程式 通知显- . 。、早疋’當該待測程式被判定為惡意程式時’將該待測 私式存入該程式資料庫中,作為一新的已知惡意程式。 17. 
14. The malware detection module of claim 13, wherein the feature screening unit further reduces the vector dimension of each of the reference features according to a principal component analysis result of the effective features.
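The dimension reduction of claims 6 and 14 relies on principal component analysis. As a minimal sketch, assuming the effective features are available as equal-length numeric vectors, the Python below estimates the leading principal component by power iteration and projects a vector onto it. Reducing to a single dimension, the iteration count, and all names are assumptions made for brevity; a production system would normally keep several components and use a numerical library.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mean):
    """Sample covariance matrix of the vectors (requires at least two rows)."""
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        centered = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov


def leading_component(cov, iterations=200):
    """Power iteration toward the eigenvector with the largest eigenvalue.

    Caveat: a start vector exactly orthogonal to that eigenvector would not converge.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(row, mean, component):
    """Reduce one vector to a single coordinate along the component."""
    return sum((x - m) * c for x, m, c in zip(row, mean, component))


# Hypothetical feature vectors derived from the effective features.
training_vectors = [[1.0, 1.0, 5.0], [2.0, 2.0, 5.0], [3.0, 3.0, 5.0], [4.0, 4.0, 5.0]]
mean = mean_vector(training_vectors)
component = leading_component(covariance_matrix(training_vectors, mean))

# The same mean and component, fitted on the training side, are reused for a program to be tested.
reduced = project([4.0, 4.0, 5.0], mean, component)
```

Reusing the training-side `mean` and `component` for the program to be tested mirrors the wording of claim 14, where the reference features are reduced according to the analysis result of the effective features.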
15. The malware detection module of claim 12, wherein the classifier is a support vector machine (SVM) classifier, and the classification model is a hyperplane.

16. The malware detection module of claim 12, wherein the known malicious programs and the known non-malicious programs are stored in a program database, and the malware detection module further comprises a malware notification unit that, when the program to be tested is determined to be a malicious program, stores the program to be tested in the program database as a new known malicious program.

17. The malware detection module of claim 12, wherein the feature analysis unit extracts the preliminary features from the program to be tested by static analysis.

18. The malware detection module of claim 12, wherein each of the preliminary features is a behavior of the program to be tested using a dynamic link library and an application programming interface of the file system.

19. The malware detection module of claim 12, wherein the malicious program is one of a virus program, a worm program, a Trojan horse program, and a backdoor program.

20. A malware detection system, comprising: a data mining module, comprising: a program database for storing a plurality of known malicious programs and a plurality of known non-malicious programs; a feature mining unit for extracting N features to be screened from the known malicious programs and the known non-malicious programs, each of the features to be screened being an interaction behavior with a file system, at least one of the known malicious programs and the known non-malicious programs having one of the N features to be screened, N being a positive integer; a first feature screening unit for selecting a plurality of effective features from the N features to be screened, each of the effective features being substantially a feature possessed mainly by one class of programs among the known malicious programs and the known non-malicious programs; and a classification model training unit for training a classification model based on the effective features; and [...]

[...] the classification model training unit trains on the effective features to obtain [...] the classification model, and the classifier is a support vector machine [...] classifying the program to be tested as a malicious program [...]

24. The malware detection system of claim 20, wherein each of the effective features is a vector, and the first feature screening unit performs a dimension reduction operation on the effective features to reduce the vector dimension of each effective feature.

25. The malware detection system of claim 24, wherein the second feature screening unit further reduces the vector dimension of each of the reference features according to the effective features.

26. The malware detection system of claim 20, wherein the malware detection module further comprises a malware notification unit that, when the program to be tested is determined to be a malicious program, stores the program to be tested in the program database as a new known malicious program.

27. The malware detection system of claim 20, wherein each of the preliminary features is a behavior of the program to be tested using a dynamic link library and an application programming interface of the file system, and each of the features to be screened is a behavior of at least one of the known malicious programs and the known non-malicious programs using a dynamic link library and an application programming interface of the file system.

28. The malware detection system of claim 20, wherein each of the known malicious programs is one of a virus program, a worm program, a Trojan horse program, and a backdoor program.

29. A data mining method for outputting a classification model based on a plurality of known malicious programs and a plurality of known non-malicious programs, a program to be tested being classified as one of a malicious program and a non-malicious program according to the classification model, the data mining method comprising: (a) extracting N features to be screened from the known malicious programs and the known non-malicious programs, an i-th feature to be screened being an interaction behavior between a file system and at least one of the known malicious programs and the known non-malicious programs, i being a positive integer less than or equal to N; (b) selecting a plurality of effective features from the N features to be screened, each of the effective features being substantially a feature possessed mainly by one class of programs among the known malicious programs and the known non-malicious programs; and (c) training the classification model based on the effective features.

30. The data mining method of claim 29, wherein step (b) comprises: (b1) generating N screening parameters corresponding to the N features to be screened, wherein the i-th screening parameter, corresponding to the i-th feature to be screened, is generated based on the number of the known malicious programs having the i-th feature to be screened and the number of the known non-malicious programs having the i-th feature to be screened, the i-th screening parameter being related to a degree of certainty that the i-th feature to be screened is a feature possessed by one class of programs among the known malicious programs and the known non-malicious programs; and (b2) determining, according to the N screening parameters, whether each feature to be screened is an effective feature, wherein when the i-th screening parameter is higher than a threshold value, the i-th feature to be screened is determined to be an effective feature.

31. The data mining method of claim 30, wherein in step (b1), the i-th screening parameter is the information gain of the i-th feature to be screened.

32. The data mining method of claim 29, wherein in step (c), a support vector machine is trained based on the effective features to obtain a hyperplane as the classification model.

33. The data mining method of claim [number illegible in the source], wherein after step (b), the method further comprises: (b') performing a dimension reduction operation on the effective features to reduce the vector dimension of each effective feature.

34. The data mining method of claim 33, wherein each of the features to be screened is a behavior of at least one of the known malicious programs and the known non-malicious programs using a dynamic link library and an application programming interface of the file system.
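Claims 15 and 32 describe a support vector machine whose trained model is a hyperplane. The claims do not name a solver, so the sketch below is an assumption: it fits a linear hyperplane (weights `w` and offset `b`) by plain subgradient descent on the regularised hinge loss, with label +1 for malicious and -1 for non-malicious. The learning rate, regularisation strength, epoch count, and toy vectors are hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, lam=0.01, lr=0.1):
    """Fit a separating hyperplane (w, b) by subgradient descent on the hinge loss."""
    dimension = len(samples[0])
    w, b = [0.0] * dimension, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularisation term shrinks the weights slightly on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The sample is misclassified or inside the margin: move the hyperplane.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Return +1 (malicious) or -1 (non-malicious) by the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1


# Hypothetical binary vectors: one column per effective feature.
samples = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(samples, labels)
predictions = [classify(w, b, x) for x in samples]
```

The pair `(w, b)` plays the role of the classification model handed from the data mining side to the detection side; a kernel SVM from an established library would be the more usual choice in practice.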
35. A malware detection method for determining whether a program to be tested is a malicious program, the malware detection method comprising: (a) extracting a plurality of preliminary features from the program to be tested, each of the preliminary features being an interaction relationship between the program to be tested and a file system; (b) selecting a plurality of reference features from the preliminary features according to a plurality of effective features, the effective features being substantially features possessed mainly by one class of programs among a plurality of known malicious programs and a plurality of known non-malicious programs; and (c) referring to a classification model and classifying the program to be tested, according to the reference features, as one of a malicious program and a non-malicious program, the classification model being trained based on the effective features.

36. The malware detection method of claim 35, wherein after step (b), the method further comprises: reducing the vector dimension of each of the reference features according to the effective features.

37. The malware detection method of claim [number illegible in the source], wherein the classification model is a hyperplane, and in step (c), the program to be tested is classified as one of a malicious program and a non-malicious program by a support vector machine classifier according to the reference features.

38. The malware detection method of claim [number illegible in the source], wherein after step (c), the method further comprises: when the program to be tested is determined to be a malicious program, treating the program to be tested as a new known malicious program.

39. The malware detection method of claim [number illegible in the source], wherein [...] the program to be tested using a dynamic link library and an application programming interface of the file system.
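Steps (a) to (c) of the detection method, together with the feedback of a detected program into the set of known malicious programs, can be sketched as follows. This is a hedged illustration only: the preliminary features are assumed to have been extracted already (for example by static analysis) and are given as a set of "library:function" strings, the reference features are encoded as a 0/1 vector over the effective features, and the hyperplane weights, offset, file names, and class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramDatabase:
    """Hypothetical stand-in for the program database."""
    known_malicious: list = field(default_factory=list)
    known_non_malicious: list = field(default_factory=list)


def select_reference_features(preliminary, effective):
    """Step (b): keep only preliminary features that are effective features.

    The result is a 0/1 vector ordered like the list of effective features.
    """
    return [1.0 if feature in preliminary else 0.0 for feature in effective]


def detect(name, preliminary, effective, weights, offset, database):
    """Step (c): classify with the hyperplane; store new malware in the database."""
    vector = select_reference_features(preliminary, effective)
    score = sum(w * x for w, x in zip(weights, vector)) + offset
    is_malicious = score >= 0.0
    if is_malicious:
        database.known_malicious.append(name)
    return is_malicious


# Hypothetical effective features and a hypothetical trained hyperplane.
effective = ["kernel32.dll:WriteFile", "advapi32.dll:RegSetValueExA", "user32.dll:MessageBoxA"]
weights, offset = [0.2, 1.5, -1.0], -1.0
database = ProgramDatabase()

# Step (a) is assumed done: these sets stand for the extracted preliminary features.
result_a = detect(
    "sample_a.exe",
    {"kernel32.dll:WriteFile", "advapi32.dll:RegSetValueExA", "ws2_32.dll:connect"},
    effective, weights, offset, database,
)
result_b = detect(
    "sample_b.exe",
    {"kernel32.dll:WriteFile", "user32.dll:MessageBoxA"},
    effective, weights, offset, database,
)
```

Features of the tested program that are not among the effective features (here the `ws2_32.dll:connect` entry) are simply ignored in step (b), so the classifier only ever sees the dimensions it was trained on.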

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW96138249A TWI358639B (en) 2007-10-12 2007-10-12 Malware detection system, data mining module, malw

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW96138249A TWI358639B (en) 2007-10-12 2007-10-12 Malware detection system, data mining module, malw

Publications (2)

Publication Number Publication Date
TW200917020A TW200917020A (en) 2009-04-16
TWI358639B TWI358639B (en) 2012-02-21

Family

ID=44726250

Family Applications (1)

Application Number Title Priority Date Filing Date
TW96138249A TWI358639B (en) 2007-10-12 2007-10-12 Malware detection system, data mining module, malw

Country Status (1)

Country Link
TW (1) TWI358639B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI404374B (en) * 2009-12-11 2013-08-01 Univ Nat Taiwan Science Tech Method for training classifier for detecting web spam
US9158919B2 (en) * 2011-06-13 2015-10-13 Microsoft Technology Licensing, Llc Threat level assessment of applications
TWI461952B (en) * 2012-12-26 2014-11-21 Univ Nat Taiwan Science Tech Method and system for detecting malware applications
TWI515598B (en) * 2013-08-23 2016-01-01 國立交通大學 Method of generating distillation malware program, method of detecting malware program and system thereof

Also Published As

Publication number Publication date
TW200917020A (en) 2009-04-16

Similar Documents

Publication Publication Date Title
CN103839003B (en) Malicious file detection method and device
Ye et al. CIMDS: adapting postprocessing techniques of associative classification for malware detection
US9348998B2 (en) System and methods for detecting harmful files of different formats in virtual environments
US9419996B2 (en) Detection and prevention for malicious threats
CN101986324B (en) Asynchronous processing of events for malware detection
JP5281717B2 (en) Using file prevalence in behavioral heuristic notification of aggression
US8108931B1 (en) Method and apparatus for identifying invariants to detect software tampering
CN102841999B (en) A kind of file method and a device for detecting macro virus
US20150172303A1 (en) Malware Detection and Identification
US20120174227A1 (en) System and Method for Detecting Unknown Malware
CN107679403B (en) Lesso software variety detection method based on sequence comparison algorithm
CN102034043A (en) Novel file-static-structure-attribute-based malware detection method
US20190188381A9 (en) Machine learning model for malware dynamic analysis
KR101851233B1 (en) Apparatus and method for detection of malicious threats included in file, recording medium thereof
CN109271780A (en) Method, system and the computer-readable medium of machine learning malware detection model
TW201712586A (en) Method and system for analyzing malicious code, data processing apparatus and electronic apparatus
JP6711000B2 (en) Information processing apparatus, virus detection method, and program
KR101132197B1 (en) Apparatus and Method for Automatically Discriminating Malicious Code
US9152791B1 (en) Removal of fake anti-virus software
TWI358639B (en) Malware detection system, data mining module, malw
WO2008098519A1 (en) A computer protection method based on a program behavior analysis
Darshan et al. Windows malware detection based on cuckoo sandbox generated report using machine learning algorithm
Jang et al. Mal-netminer: malware classification based on social network analysis of call graph
CN104504334B (en) System and method for assessing classifying rules selectivity
CN108959930A (en) Malice PDF detection method, system, data storage device and detection program

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees