TWI658372B - Abnormal behavior detection model building apparatus and abnormal behavior detection model building method thereof - Google Patents


Info

Publication number
TWI658372B
Authority
TW
Taiwan
Prior art keywords
program operation
operation sequence
detection model
algorithm
abnormal behavior
Prior art date
Application number
TW106143548A
Other languages
Chinese (zh)
Other versions
TW201928744A (en)
Inventor
魏得恩
謝志宏
孔祥重
Original Assignee
財團法人資訊工業策進會
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 財團法人資訊工業策進會
Priority to TW106143548A
Application granted
Publication of TWI658372B
Publication of TW201928744A


Abstract

An abnormal behavior detection model generating device and an abnormal behavior detection model generating method thereof. The device performs part-of-speech analysis on a plurality of program operation sequences recorded in a plurality of program operation sequence data associated with abnormal behaviors, so as to generate a plurality of word vectors, and clusters the word vectors. Based on the clustering result, the device obtains a feature vector for each program operation sequence data and uses the feature vectors to perform supervised learning of a classification algorithm, thereby generating an abnormal behavior detection model.

Description

Abnormal behavior detection model generating device and abnormal behavior detection model generating method thereof

The present invention relates to an abnormal behavior detection model generating device and an abnormal behavior detection model generating method thereof. Specifically, the device of the present invention generates an abnormal behavior detection model based on a plurality of program operation sequences recorded in a plurality of program operation sequence data associated with abnormal behaviors.

With the rapid development of technology, people increasingly depend on computers and networks. For a variety of purposes, attackers exploit system vulnerabilities or use malicious programs to break into servers and computers on a network in order to steal data or paralyze systems.

To counter such intrusions, the prior art relies on signature-based (expert rule) or static feature detection mechanisms. However, these mechanisms judge abnormal program operation behavior against predetermined expert rules or static features, so detection is confined to fixed patterns and has difficulty withstanding malicious programs that obfuscate their features. In addition, dynamic behavior sequence analysis (dynamic analysis) is often constrained by differences in sandbox environment settings. When the behavior sequences of malicious programs vary in length and contain substantial noise, it is difficult to obtain a general-purpose feature representation that can serve as the basis for judging abnormal program operation behavior.

In view of this, how to build an abnormal behavior detection model that neither relies on predetermined expert rules or static features nor is affected by differences in sandbox environment settings is a problem the industry urgently needs to solve.

An object of the present invention is to provide an abnormal behavior detection model. The present invention performs part-of-speech analysis on a plurality of program operation sequences recorded in a plurality of program operation sequence data associated with abnormal behaviors, so as to generate a plurality of word vectors, and clusters the word vectors. Based on the clustering result, the present invention obtains a feature vector for each program operation sequence data and, according to the feature vectors, performs supervised learning of a classification algorithm to generate the abnormal behavior detection model. Unlike the prior art, the abnormal behavior detection model generated by the present invention derives the feature vectors of the program operation sequence data from the part-of-speech clustering result of the program operation sequences. It can therefore effectively detect and withstand malicious programs that obfuscate their features, without relying on predetermined expert rules or static features and without being affected by differences in sandbox environment settings.

To achieve the above object, the present invention discloses an abnormal behavior detection model generating device comprising a memory and a processor. The memory stores a plurality of program operation sequence data and a plurality of behavior labels. Each program operation sequence data records a plurality of program operation sequences and corresponds to one of the behavior labels. The processor is electrically connected to the memory and performs the following operations: computing the program operation sequences of the program operation sequence data through a word embedding model to generate a plurality of word vectors, each word vector corresponding to one of the program operation sequences; clustering the word vectors into a plurality of word vector groups based on a clustering algorithm; comparing the program operation sequences of each program operation sequence data with the at least one program operation sequence corresponding to the at least one word vector included in each word vector group, so as to generate a feature vector for each program operation sequence data; performing supervised learning of a classification algorithm based on the feature vectors and the behavior labels to generate a classifier, the classifier being used to classify the feature vectors so that they correspond to the behavior labels; and generating an abnormal behavior detection model based on the word vector groups and the classifier.

In addition, the present invention further discloses an abnormal behavior detection model generating method for an abnormal behavior detection model generating device. The device comprises a memory and a processor. The memory stores a plurality of program operation sequence data and a plurality of behavior labels. Each program operation sequence data records a plurality of program operation sequences and corresponds to one of the behavior labels. The method is executed by the processor and comprises the following steps: computing the program operation sequences of the program operation sequence data through a word embedding model to generate a plurality of word vectors, each word vector corresponding to one of the program operation sequences; clustering the word vectors into a plurality of word vector groups based on a clustering algorithm; comparing the program operation sequences of each program operation sequence data with the at least one program operation sequence corresponding to the at least one word vector included in each word vector group, so as to generate a feature vector for each program operation sequence data; performing supervised learning of a classification algorithm based on the feature vectors and the behavior labels to generate a classifier, the classifier being used to classify the feature vectors so that they correspond to the behavior labels; and generating an abnormal behavior detection model based on the word vector groups and the classifier.

After reviewing the drawings and the embodiments described below, a person having ordinary skill in the art will understand the other objects of the present invention as well as its technical means and implementations.

1‧‧‧abnormal behavior detection model generating device
11‧‧‧memory
13‧‧‧processor
AL‧‧‧behavior label
POSD‧‧‧program operation sequence data
WVD‧‧‧word vector distribution space
G1-G4‧‧‧word vector groups
V1-V11‧‧‧word vectors
S501-S509‧‧‧steps

FIG. 1 is a schematic diagram of the abnormal behavior detection model generating device 1 of the present invention; FIG. 2A is a schematic diagram of one program operation sequence data; FIG. 2B is a schematic diagram of another program operation sequence data; FIG. 3 depicts the distribution of the word vectors in a two-dimensional space; FIG. 4 depicts the word vector groups after clustering; and FIG. 5 is a flowchart of the abnormal behavior detection model generating method of the present invention.

The present invention is explained below through embodiments. The embodiments are not intended to restrict the present invention to any specific environment, application, or particular manner described in them. The description of the embodiments therefore serves only to illustrate the present invention, not to limit it. Note that in the following embodiments and drawings, elements not directly related to the present invention are omitted and not shown, and the dimensional relationships among the elements in the drawings are provided only for ease of understanding, not to limit the actual proportions.

The first embodiment of the present invention is shown in FIGS. 1-4. FIG. 1 is a schematic diagram of the abnormal behavior detection model generating device 1 of the present invention. The device 1 comprises a memory 11 and a processor 13, and the processor 13 is electrically connected to the memory 11. The memory 11 stores a plurality of program operation sequence data POSD and a plurality of behavior labels AL. Each program operation sequence data POSD records a plurality of program operation sequences. For example, the program operation sequences may be dynamic program operation sequences, such as an Application Programming Interface (API) sequence or a system call sequence, but are not limited thereto. In one embodiment, the dynamic program operation sequences are captured by a tracing program. As another example, the program operation sequences may be static program operation sequences, such as an operation code (opcode) sequence, but are not limited thereto. In one embodiment, the static program operation sequences are obtained by a decompiler.

Each program operation sequence data POSD corresponds to one of the behavior labels AL (for example, a normal behavior label or an abnormal behavior label, but not limited thereto). In one embodiment, the program operation sequence data POSD include a plurality of abnormal program operation sequence data, each of which is associated with a malicious program. In this case, the behavior labels AL may further include an adware label, a worm label, a Trojan label, and the like, but are not limited thereto.

Taking an opcode sequence as an illustration, FIG. 2A shows an example of the program operation sequence data POSD in which the program operation sequences are opcode sequences. Note that, because of space limits, the opcode sequence shown in FIG. 2A is only a part of the program operation sequence data POSD. The processor 13 computes the program operation sequences of the program operation sequence data POSD through a word embedding model, for example a Word2Vec (word to vector) model or a one-hot encoding model, to generate a plurality of word vectors. Each word vector corresponds to one of the program operation sequences.

For example, suppose the program operation sequences include "xor", "sub", "add", "and", "push", "pop", "xchg", "inc", "cmp", "jmp", and "jz". The processor 13 computes these program operation sequences through the word embedding model and generates the word vectors V1-V11 corresponding to them. It is assumed here that word vector V1 corresponds to "xor", V2 to "sub", V3 to "add", V4 to "and", V5 to "push", V6 to "pop", V7 to "xchg", V8 to "inc", V9 to "cmp", V10 to "jmp", and V11 to "jz".
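As a minimal sketch of the simpler of the two embedding options named above, the following shows one-hot encoding applied to the eleven example opcodes. The function name `one_hot_vectors` and the vocabulary ordering are assumptions made for illustration; the source does not specify an implementation.

```python
# Opcode vocabulary from the example, listed in the order V1..V11.
OPCODES = ["xor", "sub", "add", "and", "push", "pop",
           "xchg", "inc", "cmp", "jmp", "jz"]


def one_hot_vectors(vocabulary):
    """Map each distinct token to a one-hot vector of length len(vocabulary).

    Hypothetical helper: the position of the single 1 is the token's index
    in the vocabulary list.
    """
    size = len(vocabulary)
    vectors = {}
    for index, token in enumerate(vocabulary):
        vector = [0] * size
        vector[index] = 1
        vectors[token] = vector
    return vectors


word_vectors = one_hot_vectors(OPCODES)
print(word_vectors["xor"])  # word vector V1
print(word_vectors["jz"])   # word vector V11
```

Note that one-hot vectors are all equidistant from one another, so they carry no notion of similarity by themselves. A trained Word2Vec model would be needed to obtain positions that reflect the contexts in which the operations appear.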

Alternatively, taking an API sequence as an illustration, FIG. 2B shows an example of the program operation sequence data POSD in which the program operation sequences are API sequences. Note that, because of space limits, the API sequence shown in FIG. 2B is only a part of the program operation sequence data POSD. Likewise, the processor 13 computes the program operation sequences of the program operation sequence data POSD through the word embedding model to generate a plurality of word vectors. Each word vector corresponds to one of the program operation sequences.

For example, suppose the program operation sequences include "GetSystemInfo", "GetFileSize", "GetSystemDirectoryW", "GetSystemMetrics", "RegQueryValueExA", "RegOpenKeyExA", "LdrLoadDll", "NtCreateFile", "NtReadFile", "NtClose", and "NtOpenDirectoryObject". The processor 13 computes these program operation sequences through the word embedding model and generates the word vectors V1-V11 corresponding to them. It is assumed here that word vector V1 corresponds to "GetSystemInfo", V2 to "GetFileSize", V3 to "GetSystemDirectoryW", V4 to "GetSystemMetrics", V5 to "RegQueryValueExA", V6 to "RegOpenKeyExA", V7 to "LdrLoadDll", V8 to "NtCreateFile", V9 to "NtReadFile", V10 to "NtClose", and V11 to "NtOpenDirectoryObject".

FIG. 3 shows a word vector distribution space WVD. Note that, to simplify the description, the word vector distribution space WVD in this embodiment represents the distribution of the word vectors in a two-dimensional space. In practice, however, the developer may choose the dimension of the word vector distribution space WVD according to the type of program operation sequence data. Because a person having ordinary skill in the art understands how to set the dimension of the output space, the details are not repeated here.

In the word vector distribution space WVD, word vectors located close to one another have similar parts of speech or semantics. The present invention therefore clusters the word vectors with an unsupervised clustering algorithm, and the clusters serve as the basis for subsequently extracting the features of each program operation sequence data POSD. In the present invention, the clustering algorithm may be one of an Affinity Propagation (AP) clustering algorithm, a spectral clustering algorithm, a Fuzzy C-means (FCM) clustering algorithm, an Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) clustering algorithm, a K-means clustering algorithm, a complete-linkage (CL) clustering algorithm, a single-linkage (SL) clustering algorithm, and a Ward's method clustering algorithm, but is not limited thereto.

For example, based on the AP clustering algorithm, the processor 13 clusters the word vectors into four word vector groups G1-G4, as shown in FIG. 4. Word vector group G1 includes word vectors V1-V4, group G2 includes word vectors V5-V6, group G3 includes word vector V7, and group G4 includes word vectors V8-V11. Note that the number of word vector groups is determined by the clustering algorithm parameters that the developer sets (for example, by directly setting the desired number of groups, or by setting the number of iterations the clustering algorithm executes). Because a person having ordinary skill in the art understands how clustering is performed with a clustering algorithm, the details are not repeated here.
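The grouping step can be illustrated with a short sketch. The example in the text uses AP clustering; the sketch below instead uses single-linkage clustering, another of the listed options, because it is deterministic and fits in a few lines. The two-dimensional coordinates assigned to V1-V11 are invented for illustration and are not taken from FIG. 3.

```python
import math

# Hypothetical 2-D positions for the word vectors V1..V11 (not from FIG. 3).
POINTS = {
    "V1": (1.0, 1.0), "V2": (1.2, 0.9), "V3": (0.9, 1.3), "V4": (1.3, 1.2),
    "V5": (5.0, 1.0), "V6": (5.2, 1.1),
    "V7": (3.0, 4.0),
    "V8": (6.0, 5.0), "V9": (6.2, 5.3), "V10": (5.9, 5.4), "V11": (6.3, 4.9),
}


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    The distance between two clusters is the smallest Euclidean distance
    between any member of one and any member of the other.
    """
    clusters = [[name] for name in points]

    def linkage(a, b):
        return min(math.dist(points[p], points[q]) for p in a for q in b)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distance = linkage(clusters[i], clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


groups = single_linkage(POINTS, n_clusters=4)
for group in groups:
    print(sorted(group, key=lambda name: int(name[1:])))
```

With these assumed coordinates the four resulting clusters coincide with the groups G1-G4 described above. A different algorithm or different coordinates could produce a different grouping.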

After obtaining the word vector groups, the processor 13 compares the program operation sequences of each program operation sequence data POSD with the at least one program operation sequence corresponding to the at least one word vector included in each word vector group, so as to generate a feature vector for each program operation sequence data POSD. For example, suppose one program operation sequence data POSD contains program operation sequences corresponding to word vectors V2, V6, V8, and V11. The feature value of this program operation sequence data POSD is then 1 for word vector group G1, 1 for group G2, 0 for group G3, and 2 for group G4, so its feature vector is (1,1,0,2). As another example, suppose another program operation sequence data POSD contains program operation sequences corresponding to word vectors V1, V2, V4, V5, V7, V9, and V10. Its feature value is then 3 for group G1, 1 for group G2, 1 for group G3, and 2 for group G4, so its feature vector is (3,1,1,2).
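The presence-based comparison in this paragraph can be sketched as follows. Each sample is represented simply as the set of word vectors whose program operation sequences occur in it; the function name `presence_feature_vector` is an assumption made for illustration.

```python
# Word vector groups G1..G4 as given in the example.
GROUPS = [
    {"V1", "V2", "V3", "V4"},        # G1
    {"V5", "V6"},                    # G2
    {"V7"},                          # G3
    {"V8", "V9", "V10", "V11"},      # G4
]


def presence_feature_vector(present_vectors, groups):
    """For each group, count how many of its word vectors appear in the sample.

    Each word vector is counted at most once, no matter how often its
    program operation sequence occurs.
    """
    present = set(present_vectors)
    return tuple(len(members & present) for members in groups)


first = presence_feature_vector({"V2", "V6", "V8", "V11"}, GROUPS)
second = presence_feature_vector(
    {"V1", "V2", "V4", "V5", "V7", "V9", "V10"}, GROUPS)
print(first)   # (1, 1, 0, 2)
print(second)  # (3, 1, 1, 2)
```

The feature vector always has one component per word vector group, so samples of any length map to vectors of the same dimension.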

Note that the comparison described above generates the feature vector according to whether the program operation sequence data POSD contains the at least one program operation sequence corresponding to the at least one word vector included in each word vector group. In other embodiments, however, the comparison may instead be based on the number of occurrences, in the program operation sequence data POSD, of the at least one program operation sequence corresponding to the at least one word vector included in each word vector group. For example, suppose one program operation sequence data POSD contains 5 program operation sequences corresponding to word vector V2, 3 corresponding to word vector V6, 1 corresponding to word vector V8, and 3 corresponding to word vector V11. The feature value of this program operation sequence data POSD is then 5 for word vector group G1, 3 for group G2, 0 for group G3, and 4 for group G4, so its feature vector is (5,3,0,4).
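The occurrence-count variant can be sketched in the same way. Here a sample is a list in which each entry names the word vector of one recorded program operation sequence; the function name `count_feature_vector` is hypothetical.

```python
from collections import Counter

# Word vector groups G1..G4 as given in the example.
GROUPS = [
    {"V1", "V2", "V3", "V4"},        # G1
    {"V5", "V6"},                    # G2
    {"V7"},                          # G3
    {"V8", "V9", "V10", "V11"},      # G4
]


def count_feature_vector(sequence, groups):
    """For each group, sum the occurrences of all of its word vectors."""
    counts = Counter(sequence)  # missing keys count as 0
    return tuple(sum(counts[name] for name in members) for members in groups)


# 5 x V2, 3 x V6, 1 x V8, 3 x V11, as in the example.
sample = ["V2"] * 5 + ["V6"] * 3 + ["V8"] + ["V11"] * 3
print(count_feature_vector(sample, GROUPS))  # (5, 3, 0, 4)
```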

After generating the feature vector of each program operation sequence data POSD, the processor 13 performs supervised learning of a classification algorithm based on the feature vectors and the behavior labels AL to generate a classifier. For example, the classification algorithm may be one of a support vector machine (SVM) algorithm, a decision tree (DT) algorithm, a Bayes algorithm, and a nearest neighbors (NN) algorithm, but is not limited thereto. The purpose of the supervised learning is to ensure that, once processed by the classification algorithm, the feature vectors are reliably assigned to the appropriate classes so that they correspond to the behavior labels AL. For example, the program operation sequence data POSD corresponding to the adware label are reliably assigned to one class, those corresponding to the worm label to one class, those corresponding to the Trojan label to one class, and those corresponding to the normal behavior label to one class. Finally, the processor 13 generates an abnormal behavior detection model based on the word vector groups and the classifier.
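As a minimal sketch of the supervised step, the following uses a 1-nearest-neighbor rule, the simplest member of the nearest neighbors family listed above. The first two training feature vectors come from the earlier examples; the other two, and all four label assignments, are invented for illustration.

```python
import math

# (feature vector, behavior label) pairs. Labels are hypothetical.
TRAINING = [
    ((1, 1, 0, 2), "normal"),
    ((3, 1, 1, 2), "trojan"),
    ((0, 2, 0, 4), "worm"),
    ((4, 0, 1, 1), "adware"),
]


def train_nearest_neighbor(samples):
    """Return a classifier that labels a vector like its closest training vector.

    For 1-NN, "training" amounts to storing the labeled samples.
    """
    stored = list(samples)

    def classify(feature_vector):
        nearest = min(stored, key=lambda s: math.dist(s[0], feature_vector))
        return nearest[1]

    return classify


classifier = train_nearest_neighbor(TRAINING)
print(classifier((3, 1, 1, 3)))  # closest to (3, 1, 1, 2)
```

An SVM, decision tree, or Bayes classifier would replace `train_nearest_neighbor` without changing how the feature vectors are produced.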

In other embodiments, after generating the abnormal behavior detection model, the processor 13 may test the model with a plurality of test program operation sequence data and, according to a detection rate, determine how accurately the model identifies the test program operation sequence data. Based on this accuracy, the developer can adjust the parameter settings of the word embedding model, the clustering algorithm, and the classification algorithm, and repeat the training operations described above to regenerate the abnormal behavior detection model. Through these operations, the present invention can generate different abnormal behavior detection models for different types of program operation sequence data, so as to detect abnormal behaviors in various dynamic or static program operation sequences.
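The source does not define the detection rate. The sketch below assumes one common reading: the fraction of truly abnormal test samples that the model flags as anything other than normal. The function name, the label strings, and the test data are all hypothetical.

```python
def detection_rate(true_labels, predicted_labels, normal_label="normal"):
    """Fraction of abnormal samples that were predicted as abnormal.

    Assumed definition: a sample counts as detected when its predicted label
    is not the normal label, even if the predicted family is wrong.
    """
    abnormal = [(t, p) for t, p in zip(true_labels, predicted_labels)
                if t != normal_label]
    if not abnormal:
        raise ValueError("no abnormal samples in the test set")
    detected = sum(1 for _, p in abnormal if p != normal_label)
    return detected / len(abnormal)


truth = ["trojan", "worm", "normal", "adware", "trojan"]
predicted = ["trojan", "normal", "normal", "adware", "worm"]
print(detection_rate(truth, predicted))  # 0.75
```

If the rate is too low, the embedding, clustering, and classification parameters would be adjusted and the model regenerated, as described above.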

Furthermore, the abnormal behavior detection model generated by the present invention can be compiled into an executable program that runs in an operating system, enabling the operating system to detect abnormal behaviors (for example, detecting malicious programs or illegal operations). In addition, the program operation sequence data POSD used to generate the abnormal behavior detection model may consist entirely of abnormal program operation sequence data (for example, every program operation sequence data is associated with a malicious program), so that the generated model only determines the class of program operation sequence data that has already been identified as abnormal. In other words, the abnormal behavior detection model generated by the present invention can be used together with other abnormal behavior detection programs: when another such program detects an abnormal program, the model further determines the class of that abnormal program from its program operation sequence data. For example, the other abnormal behavior detection program may be an antivirus program; when the antivirus program detects an abnormal program, the abnormal behavior detection model of the present invention can help determine the class of that abnormal program.

For the second embodiment of the present invention, refer to FIG. 5, which is a flowchart of the abnormal behavior detection model generating method of the present invention. The method is applicable to an abnormal behavior detection model generating device (for example, the abnormal behavior detection model generating device 1 of the foregoing embodiment). The device comprises a memory and a processor. The memory stores a plurality of program operation sequence data and a plurality of behavior labels. Each program operation sequence data records a plurality of program operation sequences and corresponds to one of the behavior labels. The method is executed by the processor.

First, in step S501, the program operation sequences of the program operation sequence data are computed through a word embedding model to generate a plurality of word vectors (for example, the word vectors V1-V11 shown in FIG. 3). As described above, each word vector corresponds to one of the program operation sequences. Next, in step S503, the word vectors are clustered into a plurality of word vector groups (for example, the word vector groups G1-G4 shown in FIG. 4) based on a clustering algorithm.

In step S505, the program operation sequences of each program operation sequence data are compared with the at least one program operation sequence corresponding to the at least one word vector included in each word vector group, to generate a feature vector for each program operation sequence data. Then, in step S507, supervised learning of a classification algorithm is performed based on the feature vectors and the behavior labels to generate a classifier. The classifier is used to classify the feature vectors so that they correspond to the behavior labels. Finally, in step S509, an abnormal behavior detection model is generated based on the word vector groups and the classifier.
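Steps S505 to S509 can be sketched as below. The sketch assumes one plausible reading of the comparison step, namely that each feature vector entry counts how many operations of a trace fall into the corresponding word vector group; the text does not fix this encoding. The groups, traces, and behavior labels are hypothetical, and a 1-nearest-neighbor classifier is used as the supervised learner purely for brevity.

```python
# Hypothetical word vector groups, expressed as sets of the operations they contain.
groups = [
    {"OpenFile", "CloseFile"},
    {"ReadFile", "WriteFile"},
    {"Connect", "Disconnect"},
    {"Send", "Recv"},
]

# Hypothetical labeled program operation sequence data.
training_traces = [
    ["OpenFile", "ReadFile", "CloseFile"],
    ["OpenFile", "WriteFile", "CloseFile"],
    ["Connect", "Send", "Disconnect"],
    ["Connect", "Recv", "Disconnect"],
]
behavior_labels = ["file_abuse", "file_abuse", "network_abuse", "network_abuse"]

def feature_vector(trace, groups):
    """S505 (assumed encoding): per group, count the trace operations in that group."""
    return [sum(1 for op in trace if op in group) for group in groups]

def train_nearest_neighbor(traces, labels, groups):
    """S507: 'training' a 1-nearest-neighbor classifier stores labeled feature vectors."""
    return [(feature_vector(t, groups), y) for t, y in zip(traces, labels)]

def build_model(traces, labels, groups):
    """S509: the detection model bundles the word vector groups with the classifier."""
    return {"groups": groups,
            "examples": train_nearest_neighbor(traces, labels, groups)}

def classify(model, trace):
    """Return the behavior label of the closest stored feature vector."""
    x = feature_vector(trace, model["groups"])
    def squared_distance(v):
        return sum((a - b) ** 2 for a, b in zip(x, v))
    _, label = min(model["examples"], key=lambda ex: squared_distance(ex[0]))
    return label

model = build_model(training_traces, behavior_labels, groups)
print(classify(model, ["OpenFile", "WriteFile", "WriteFile", "CloseFile"]))
```

Keeping the groups inside the model matters because a new trace must be converted to a feature vector with the same groups that were used during training.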

In other embodiments, the program operation sequences are either dynamic program operation sequences or static program operation sequences. A dynamic program operation sequence is an Application Programming Interface (API) sequence or a system call sequence. A static program operation sequence is an operation code (opcode) sequence. In one embodiment, the dynamic program operation sequences are captured by a tracing program. In other embodiments, the word embedding model is either a Word2Vec model or a one-hot encoding model.
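Of the two embedding options named above, one-hot encoding is simple enough to show in a few lines. The following is an illustrative sketch only; the system call names are hypothetical examples of a dynamic program operation sequence, and `one_hot` is a helper introduced here, not a name from the patent.

```python
def one_hot(operations):
    """Assign each distinct operation a vector with a single 1 at its own index."""
    vocab = sorted(set(operations))
    return {op: [1 if i == j else 0 for j in range(len(vocab))]
            for i, op in enumerate(vocab)}

# Hypothetical system call sequence captured by a tracing program.
sequence = ["NtOpenFile", "NtClose", "NtOpenFile"]
vectors = one_hot(sequence)
print(vectors)
```

One-hot vectors carry no notion of similarity between operations (every pair of distinct operations is equally far apart), whereas a Word2Vec model learns vectors from the contexts in which operations occur.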

In other embodiments, the clustering algorithm is one of an Affinity Propagation (AP) clustering algorithm, a spectral clustering algorithm, a Fuzzy C-means (FCM) clustering algorithm, an Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) clustering algorithm, a K-means clustering algorithm, a complete-linkage (CL) clustering algorithm, a single-linkage (SL) clustering algorithm, and a Ward's method clustering algorithm.
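As an illustration of one of the listed options, the sketch below implements plain K-means on small two-dimensional points. It is a teaching sketch under stated assumptions: the points are made up, the initial centroids are simply the first k points (real implementations usually use a randomized or k-means++ initialization), and the word vectors in practice would have many more dimensions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Return, for each point, the index of the cluster it is assigned to."""
    centroids = [list(p) for p in points[:k]]  # assumed deterministic initialization
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical 2-D word vectors forming two well-separated groups.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
print(kmeans(points, k=2))
```

K-means requires the number of groups to be chosen in advance, while some of the other listed algorithms, such as Affinity Propagation, determine the number of groups from the data.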

In addition, in other embodiments, the classification algorithm is one of a support vector machine (SVM) algorithm, a decision tree (DT) algorithm, a Bayes algorithm, and a nearest neighbors (NN) algorithm.

In one embodiment, the program operation sequence data includes a plurality of abnormal program operation sequence data, and each abnormal program operation sequence data is associated with a malicious program. In addition to the above steps, the abnormal behavior detection model generation method of this embodiment can also perform all the operations described in the foregoing embodiments and has all the corresponding functions. A person having ordinary skill in the art can directly understand how this embodiment performs these operations and provides these functions based on the foregoing embodiments, so the details are not repeated here.

In addition, the abnormal behavior detection model generation method of the present invention described above can be implemented by a computer program product. The computer program product stores a computer program comprising a plurality of program instructions. After the computer program is loaded and installed on an electronic computing apparatus (for example, the abnormal behavior detection model generating apparatus 1), the processor of the electronic computing apparatus executes the program instructions included in the computer program to carry out the abnormal behavior detection model generation method of the present invention. The computer program product may be, for example, a read-only memory (ROM), a flash memory, a floppy disk, a hard disk, a compact disk (CD), a USB flash drive, a magnetic tape, a database accessible over a network, or any other storage medium with the same function that is known to a person having ordinary skill in the art to which the present invention pertains.

In summary, the present invention performs a word embedding operation on the plurality of program operation sequences in the plurality of program operation sequence data to generate a plurality of word vectors, and clusters the word vectors. After clustering, a feature vector is obtained for each program operation sequence data, and a classification algorithm is trained on these feature vectors to generate the abnormal behavior detection model. Accordingly, because the abnormal behavior detection model of the present invention obtains the feature vectors of the program operation sequence data from the part-of-speech clustering result of the program operation sequences, it can effectively detect malicious programs that use feature obfuscation as well as abnormal program operation behavior, without relying on predetermined expert rules or static features, and without being affected by differences in sandbox environment settings.

The above embodiments are intended only to illustrate implementations of the present invention and to explain its technical features; they are not intended to limit the scope of protection of the present invention. Any change or equivalent arrangement that can be readily accomplished by a person skilled in the art falls within the scope claimed by the present invention, and the scope of protection of the present invention shall be defined by the claims.

Claims (20)

1. An abnormal behavior detection model generating apparatus, comprising: a storage configured to store a plurality of program operation sequence data and a plurality of behavior labels, each program operation sequence data recording a plurality of program operation sequences and corresponding to one of the behavior labels; and a processor electrically connected to the storage and configured to perform the following operations: computing the program operation sequences of the program operation sequence data through a word embedding model to generate a plurality of word vectors, each word vector corresponding to one of the program operation sequences; clustering the word vectors into a plurality of word vector groups based on a clustering algorithm; comparing the program operation sequences of each program operation sequence data with the at least one program operation sequence corresponding to the at least one word vector included in each word vector group, to generate a feature vector for each program operation sequence data; performing supervised learning of a classification algorithm based on the feature vectors and the behavior labels to generate a classifier, the classifier being used to classify the feature vectors so that they correspond to the behavior labels; and generating an abnormal behavior detection model based on the word vector groups and the classifier.

2. The abnormal behavior detection model generating apparatus of claim 1, wherein the program operation sequences are one of a dynamic program operation sequence and a static program operation sequence.

3. The abnormal behavior detection model generating apparatus of claim 2, wherein the dynamic program operation sequence is an Application Programming Interface (API) sequence.

4. The abnormal behavior detection model generating apparatus of claim 2, wherein the dynamic program operation sequence is a system call sequence.

5. The abnormal behavior detection model generating apparatus of claim 2, wherein the static program operation sequence is an operation code (opcode) sequence.

6. The abnormal behavior detection model generating apparatus of claim 2, wherein the dynamic program operation sequence is captured by a tracing program.

7. The abnormal behavior detection model generating apparatus of claim 1, wherein the word embedding model is one of a Word2Vec model and a one-hot encoding model.

8. The abnormal behavior detection model generating apparatus of claim 1, wherein the clustering algorithm is one of an Affinity Propagation (AP) clustering algorithm, a spectral clustering algorithm, a Fuzzy C-means (FCM) clustering algorithm, an Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) clustering algorithm, a K-means clustering algorithm, a complete-linkage (CL) clustering algorithm, a single-linkage (SL) clustering algorithm, and a Ward's method clustering algorithm.

9. The abnormal behavior detection model generating apparatus of claim 1, wherein the classification algorithm is one of a support vector machine (SVM) algorithm, a decision tree (DT) algorithm, a Bayes algorithm, and a nearest neighbors (NN) algorithm.

10. The abnormal behavior detection model generating apparatus of claim 1, wherein the program operation sequence data includes a plurality of abnormal program operation sequence data, and each abnormal program operation sequence data is associated with a malicious program.

11. An abnormal behavior detection model generation method for an abnormal behavior detection model generating apparatus, the apparatus comprising a storage and a processor, the storage storing a plurality of program operation sequence data and a plurality of behavior labels, each program operation sequence data recording a plurality of program operation sequences and corresponding to one of the behavior labels, the method being executed by the processor and comprising the following steps: computing the program operation sequences of the program operation sequence data through a word embedding model to generate a plurality of word vectors, each word vector corresponding to one of the program operation sequences; clustering the word vectors into a plurality of word vector groups based on a clustering algorithm; comparing the program operation sequences of each program operation sequence data with the at least one program operation sequence corresponding to the at least one word vector included in each word vector group, to generate a feature vector for each program operation sequence data; performing supervised learning of a classification algorithm based on the feature vectors and the behavior labels to generate a classifier, the classifier being used to classify the feature vectors so that they correspond to the behavior labels; and generating an abnormal behavior detection model based on the word vector groups and the classifier.

12. The abnormal behavior detection model generation method of claim 11, wherein the program operation sequences are one of a dynamic program operation sequence and a static program operation sequence.

13. The abnormal behavior detection model generation method of claim 12, wherein the dynamic program operation sequence is an Application Programming Interface (API) sequence.

14. The abnormal behavior detection model generation method of claim 12, wherein the dynamic program operation sequence is a system call sequence.

15. The abnormal behavior detection model generation method of claim 12, wherein the static program operation sequence is an operation code (opcode) sequence.

16. The abnormal behavior detection model generation method of claim 12, wherein the dynamic program operation sequence is captured by a tracing program.

17. The abnormal behavior detection model generation method of claim 11, wherein the word embedding model is one of a Word2Vec model and a one-hot encoding model.

18. The abnormal behavior detection model generation method of claim 11, wherein the clustering algorithm is one of an Affinity Propagation (AP) clustering algorithm, a spectral clustering algorithm, a Fuzzy C-means (FCM) clustering algorithm, an Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) clustering algorithm, a K-means clustering algorithm, a complete-linkage (CL) clustering algorithm, a single-linkage (SL) clustering algorithm, and a Ward's method clustering algorithm.

19. The abnormal behavior detection model generation method of claim 11, wherein the classification algorithm is one of a support vector machine (SVM) algorithm, a decision tree (DT) algorithm, a Bayes algorithm, and a nearest neighbors (NN) algorithm.

20. The abnormal behavior detection model generation method of claim 11, wherein the program operation sequence data includes a plurality of abnormal program operation sequence data, and each abnormal program operation sequence data is associated with a malicious program.
TW106143548A 2017-12-12 2017-12-12 Abnormal behavior detection model building apparatus and abnormal behavior detection model building method thereof TWI658372B (en)


Publications (2)

Publication Number    Publication Date
TWI658372B (en)    2019-05-01
TW201928744A (en)    2019-07-16



