CN109918905B

CN109918905B - Behavior inference model generation device and behavior inference model generation method thereof

Info

Publication number: CN109918905B
Application number: CN201711320002.9A
Authority: CN
Inventors: 赖家民; 卢嘉昱
Original assignee: Institute for Information Industry
Current assignee: Institute for Information Industry
Priority date: 2017-12-12
Filing date: 2017-12-12
Publication date: 2022-05-10
Anticipated expiration: 2037-12-12
Also published as: CN109918905A

Abstract

A behavior inference model generation device and a behavior inference model generation method thereof are provided. The behavior inference model generation device converts a plurality of program operation sequences of a plurality of program operation sequence data into a plurality of word vectors by using a word embedding model, and inputs the first M word vectors of the word vectors corresponding to each program operation sequence data into a generative adversarial network (GAN) model so as to train and optimize the GAN model. The behavior inference model generation device integrates the word embedding model and the generator of the optimized GAN model to generate a behavior inference model.

Description

Behavior inference model generation device and behavior inference model generation method thereof

[ technical field ]

The invention relates to a behavior inference model generation device and a behavior inference model generation method thereof. Specifically, the behavior inference model generation device of the present invention generates a behavior inference model based on a word embedding model and the generator of an optimized generative adversarial network (GAN) model.

[ background of the invention ]

With the development of technology, applications available to users from networks are becoming more and more diversified, and some applications may damage the computer system of the user when executed, causing files in the computer to be damaged or personal information of the user to be stolen.

The current detection mechanism for malicious programs mainly relies on rule-based feature comparison to judge whether an application program is a malicious program and to defend against attacks by malicious programs. However, rule-based feature comparison detects only on the basis of known sample features, and a certain number of features must be captured during the execution of the application program before there is an opportunity to determine whether the currently executed application program is a malicious program. Consequently, by the time a malicious program is detected, it may already have destroyed files in the computer or stolen personal information of the user.

In view of the above, it is an urgent need in the art to establish a behavior inference model that can accurately infer subsequent program operations at the initial stage of executing an application program so as to prevent files in a computer from being damaged or personal information of a user from being stolen.

[ summary of the invention ]

The present invention provides a behavior inference model that accurately infers subsequent program operations at the initial stage of execution of an application program, so as to prevent files in a computer from being damaged or personal information of a user from being stolen.

To achieve the above objective, the present invention discloses a behavior inference model generation device, which includes a memory and a processor. The memory is used for storing a plurality of program operation sequence data. Each of the program operation sequence data describes a plurality of program operation sequences. The processor is electrically connected to the memory and is used for executing the following steps: (a) converting the program operation sequences of the program operation sequence data into word vectors through a word embedding model; (b) for each program operation sequence data, capturing the first M word vectors of the word vectors as M input vectors of a generative adversarial network (GAN) model, wherein M is a positive integer; (c) for each program operation sequence data, operating on the M input vectors via a generator of the GAN model to generate a plurality of inferred word vectors; (d) for each program operation sequence data, performing an authenticity determination on the word vectors and the inferred word vectors through a discriminator of the GAN model; (e) feeding back a determination result of the authenticity determination to the generator to adjust a parameter setting of the generator; (f) repeating the steps (c) through (e) to train and thereby optimize the GAN model; and (g) integrating the word embedding model and the generator of the optimized GAN model to generate a behavior inference model.

In addition, the invention further discloses a behavior inference model generation method for the behavior inference model generation device. The behavior inference model generation device comprises a memory and a processor. The memory stores a plurality of program operation sequence data. Each of the program operation sequence data describes a plurality of program operation sequences. The behavior inference model generation method is executed by the processor and includes the following steps: (a) converting the program operation sequences of the program operation sequence data into word vectors through a word embedding model; (b) for each program operation sequence data, capturing the first M word vectors of the word vectors as M input vectors of a generative adversarial network (GAN) model, M being a positive integer; (c) for each program operation sequence data, operating on the M input vectors via a generator of the GAN model to generate a plurality of inferred word vectors; (d) for each program operation sequence data, performing an authenticity determination on the word vectors and the inferred word vectors through a discriminator of the GAN model; (e) feeding back a determination result of the authenticity determination to the generator to adjust a parameter setting of the generator; (f) repeating the steps (c) through (e) to train and thereby optimize the GAN model; and (g) integrating the word embedding model and the generator of the optimized GAN model to generate a behavior inference model.

Other objects, technical means and embodiments of the present invention will be apparent to those skilled in the art from the accompanying drawings and the embodiments described later.

[ description of the drawings ]

Fig. 1 is a schematic diagram of a behavior inference model generation apparatus 1 of the present invention;

FIG. 2 is a schematic diagram of a generative adversarial network;

FIG. 3 is a schematic representation of sequence data of a program operation;

FIG. 4 is a diagram depicting the distribution of word vectors in a two-dimensional space;

FIG. 5 depicts groups of word vectors after clustering;

FIG. 6 is a flow chart of a behavior inference model generation method of the present invention; and

FIG. 7 is a flowchart of generating an abnormal behavior detection model in the behavior inference model generation method of the present invention.

[ notation ]

1: behavior inference model generation device

11: memory

13: processor

POSD: program operation sequence data

GM: generative adversarial network model

GR: generator

DR: discriminator

IWV: input vector

PWV: inferred word vector

RT: determination result

WVD: word vector distribution space

G1-G4: word vector group

V1-V11: word vector

S601-S613: steps

S701-S707: steps

[ detailed description ]

The content of the present invention is explained below by way of embodiments, which are not intended to limit the invention to any particular environment, application, or manner of practice described in the embodiments. Therefore, the description of the embodiments is for the purpose of illustration only, and not for the purpose of limitation. It should be noted that in the following embodiments and the accompanying drawings, components which are not directly related to the present invention are omitted and not shown, and the dimensional relationships between the components in the drawings are only for ease of understanding and are not intended to limit the actual scale.

A first embodiment of the invention is shown in FIGs. 1-3. FIG. 1 is a schematic diagram of a behavior inference model generation device 1 according to the present invention. The behavior inference model generation device 1 includes a memory 11 and a processor 13. The processor 13 is electrically connected to the memory 11. The memory 11 is used for storing a plurality of program operation sequence data POSD. Each program operation sequence data POSD records a plurality of program operation sequences. For example, the program operation sequences may be dynamic program operation sequences, such as an Application Programming Interface (API) sequence or a system call sequence, but are not limited thereto.

The processor 13 converts the program operation sequences of the program operation sequence data POSD into a plurality of word vectors WV through a word embedding model. The word embedding model may be, for example, a word-to-vector (Word2Vec) model or a one-hot encoding model. Subsequently, as shown in FIG. 2, for each program operation sequence data POSD, the processor 13 retrieves the first M word vectors of the word vectors WV as M input vectors IWV of a generative adversarial network (GAN) model GM, wherein M is a positive integer. It should be noted that the value of M can be set by the developer based on the type of dynamic program operation sequence, and determines how many word vectors are input to the GAN model GM as the basis for inference.

For example, taking the API sequence as an illustration, the processor 13 may retrieve 100 API sequences of an executing application program through a trace program and store them as one program operation sequence data POSD, as shown in FIG. 3. It should be noted that, owing to space constraints, the API sequences shown in FIG. 3 are only a part of the program operation sequence data POSD. Subsequently, as described above, for each program operation sequence data POSD, the processor 13 converts the 100 API sequences in the program operation sequence data POSD into 100 word vectors WV by the word embedding model. Next, for each program operation sequence data POSD, the processor 13 takes the first 20 word vectors of the 100 word vectors WV as the input vectors IWV of the GAN model GM.
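The conversion and truncation described above can be sketched as follows. This is a minimal illustration that assumes a one-hot encoding model (one of the two options named above); the function names, the five-call trace, and the choice M = 2 are hypothetical and are not taken from the disclosure.

```python
import numpy as np

def build_one_hot_embedding(vocabulary):
    """Map each program operation name to a one-hot word vector."""
    identity = np.eye(len(vocabulary))
    return {name: identity[i] for i, name in enumerate(vocabulary)}

def to_word_vectors(operations, embedding):
    """Convert a list of operation names into a (T, d) array of word vectors."""
    return np.stack([embedding[name] for name in operations])

def first_m_input_vectors(word_vectors, m):
    """Keep only the first M word vectors as the generator's input vectors."""
    if m <= 0:
        raise ValueError("M must be a positive integer")
    return word_vectors[:m]

# Hypothetical trace; a real one would hold e.g. 100 API calls.
trace = ["LdrLoadDll", "GetSystemInfo", "NtCreateFile", "NtReadFile", "NtClose"]
vocabulary = sorted(set(trace))
embedding = build_one_hot_embedding(vocabulary)
wv = to_word_vectors(trace, embedding)    # word vectors WV, shape (5, 5)
iwv = first_m_input_vectors(wv, m=2)      # input vectors IWV, shape (2, 5)
```

A Word2Vec model would replace `build_one_hot_embedding` with a learned lookup table; the truncation step is unchanged.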

Referring to FIG. 2, for each program operation sequence data POSD, the processor 13 operates on the M input vectors IWV through a generator GR of the GAN model GM to generate a plurality of inferred word vectors PWV. The number of inferred word vectors PWV is the same as the number of word vectors WV (e.g., 100); however, this number may likewise be set by the developer based on the type of dynamic program operation sequence. For each program operation sequence data POSD, the processor 13 performs an authenticity determination on the word vectors WV and the inferred word vectors PWV through a discriminator DR of the GAN model GM, and feeds back a determination result RT of the authenticity determination to the generator GR to adjust a parameter setting of the generator GR.

After the parameter setting of the generator GR is adjusted, for each program operation sequence data POSD, the processor 13 operates on the input vectors IWV again to generate new inferred word vectors PWV, performs the authenticity determination again through the discriminator DR, and feeds the determination result RT back to the generator GR again. The processor 13 trains the GAN model GM by repeatedly performing the above steps (i.e., generating the inferred word vectors PWV, performing the authenticity determination, feeding back the determination result RT, and adjusting the parameter settings of the generator GR) to optimize the GAN model GM, and finally integrates the word embedding model and the generator GR of the optimized GAN model GM to generate a behavior inference model.

Based on the foregoing description, those skilled in the art can understand that, in general, the similarity between the inferred word vectors PWV (i.e., the simulated word vectors) generated by the generator GR after its parameter settings are adjusted and the word vectors WV (i.e., the real word vectors) becomes higher and higher, and the discriminator DR likewise adjusts the parameter settings it uses for the authenticity determination according to the inferred word vectors PWV generated by the generator GR and the determination result RT. The generator GR and the discriminator DR thus compete against each other; when the discriminator DR can hardly distinguish the inferred word vectors PWV from the word vectors WV of the program operation sequence data POSD, the optimization training of the GAN model GM is complete.
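The alternating training described above can be sketched with a deliberately small model. The following is only an illustration under stated assumptions: the generator is a single linear layer, the discriminator is a logistic regression, both are updated by plain gradient descent on the standard GAN objectives, and the training data are random stand-ins. None of these choices, nor the names `train_gan`, `Wg` or `wd`, come from the disclosure, which does not specify the network architecture.

```python
import numpy as np

def sigmoid(s):
    # Clip to avoid overflow in exp for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(s, -50.0, 50.0)))

def train_gan(real, m, dim, steps=200, lr=0.05, seed=0):
    """Train a toy generator/discriminator pair.

    real: (n, T * dim) array, each row one flattened sequence of T word vectors.
    The first m word vectors of each row serve as the generator's input vectors.
    """
    rng = np.random.default_rng(seed)
    n, out_dim = real.shape
    z = real[:, : m * dim]                          # M input vectors per sequence
    Wg = rng.normal(0.0, 0.1, (m * dim, out_dim))   # generator parameters
    bg = np.zeros(out_dim)
    wd = rng.normal(0.0, 0.1, out_dim)              # discriminator parameters
    bd = 0.0
    for _ in range(steps):
        fake = z @ Wg + bg                          # inferred word vectors G(z)
        p_real = sigmoid(real @ wd + bd)            # D(X)
        p_fake = sigmoid(fake @ wd + bd)            # D(G(z))
        # Discriminator step: descend -mean(log D(X) + log(1 - D(G(z)))).
        g_real = -(1.0 - p_real) / n
        g_fake = p_fake / n
        wd = wd - lr * (real.T @ g_real + fake.T @ g_fake)
        bd = bd - lr * (g_real.sum() + g_fake.sum())
        # Feed the discriminator's verdict back to the generator:
        # descend mean(log(1 - D(G(z)))) with respect to the generator parameters.
        p_fake = sigmoid(fake @ wd + bd)
        d_fake = np.outer(-p_fake / n, wd)          # gradient w.r.t. generator output
        Wg = Wg - lr * (z.T @ d_fake)
        bg = bg - lr * d_fake.sum(axis=0)
    return lambda inputs: inputs @ Wg + bg          # the trained generator

# Random stand-in data: 8 sequences of T = 5 word vectors of dimension 3, M = 2.
T, DIM, M = 5, 3, 2
real_sequences = np.random.default_rng(1).normal(size=(8, T * DIM))
generator = train_gan(real_sequences, m=M, dim=DIM)
inferred = generator(real_sequences[:, : M * DIM])  # shape (8, 15)
```

A practical implementation would more likely use recurrent or convolutional networks in a deep learning framework; the point here is only the loop structure: generate, discriminate, feed back, adjust, repeat.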

For example, the objective function for optimizing the generator GR may be expressed as the following formula:

where m denotes the total number of program operation sequence data POSD, z denotes the M input vectors IWV, G(z) denotes the inferred word vectors PWV generated by the generator GR, and D(G(z)) denotes the probability that the discriminator DR judges the inferred word vectors PWV to be true.
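The formula itself is not reproduced in this text. One form consistent with the symbol definitions given here, assuming the standard GAN generator objective and writing m for the total number of program operation sequence data POSD (to keep it distinct from the M input vectors), would be:

$$
\min_{G}\; \frac{1}{m}\sum_{i=1}^{m} \log\Bigl(1 - D\bigl(G(z^{(i)})\bigr)\Bigr)
$$

Here $z^{(i)}$ denotes the M input vectors IWV of the i-th program operation sequence data. This is an assumed reconstruction, not the formula as filed.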

Furthermore, the objective function for optimizing the discriminator DR can be expressed as the following formula:

where m denotes the total number of program operation sequence data POSD, X denotes the word vectors WV corresponding to the program operation sequence data POSD, D(X) denotes the probability that the word vectors WV are judged to be true by the discriminator DR, z denotes the M input vectors IWV, G(z) denotes the inferred word vectors PWV generated by the generator GR, and D(G(z)) denotes the probability that the inferred word vectors PWV are judged to be true by the discriminator DR.
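The formula itself is not reproduced in this text. One form consistent with the symbol definitions given here, assuming the standard GAN discriminator objective and writing m for the total number of program operation sequence data POSD (to keep it distinct from the M input vectors), would be:

$$
\max_{D}\; \frac{1}{m}\sum_{i=1}^{m} \Bigl[\log D\bigl(X^{(i)}\bigr) + \log\Bigl(1 - D\bigl(G(z^{(i)})\bigr)\Bigr)\Bigr]
$$

Here $X^{(i)}$ and $z^{(i)}$ denote the word vectors WV and the M input vectors IWV of the i-th program operation sequence data. This is an assumed reconstruction, not the formula as filed.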

Unlike prior-art GAN models, which input randomly generated vectors to the generator, the present invention inputs the first M word vectors of each program operation sequence data POSD to the generator GR of the GAN model, so that the GAN model trained by the present invention can be used for behavior inference, that is, for predicting program operation sequences that have not yet occurred. Since the detailed operation of training the GAN model can be understood by those skilled in the art based on the foregoing description, it is not described herein again.

As mentioned above, the program operation sequences captured by the present invention can be dynamic program operation sequences. Those skilled in the art can therefore appreciate that, whether the program operation sequences are captured by a trace program or recorded by the operating system itself while monitoring an executed program, the present invention can generate a behavior inference model for those specific program operation sequences. In other words, the manner of generating the behavior inference model according to the present invention is applicable to the program operation sequences generated when any terminal device executes a program. For example, the plurality of program operation sequence data POSD may include a plurality of abnormal program operation sequence data, each of which is associated with a malicious program. As another example, the program operation sequence data POSD may be a log file generated by an operating system monitoring an executed program.

In addition, the behavior inference model generated by the present invention can be compiled into an executable program, run in an operating system, and be used in conjunction with an abnormal behavior detection program. Accordingly, the behavior inference model of the present invention can infer a subsequent program operation sequence based on the first program operation sequences at the initial stage of program execution, and provide the inferred program operation sequence for the abnormal behavior detection program to determine whether the program operation sequence is abnormal behavior. For example, the abnormal behavior detection program may be an antivirus program, and the behavior inference model of the present invention may infer a program operation sequence of a program that has just been executed, and provide the program operation sequence to the antivirus program to determine whether the program is a malicious program.
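How the integrated model might be used at run time can be sketched as follows. The decoding rule (mapping each inferred word vector to the nearest known word vector) is an assumption, since the text only states that the inferred sequence is handed to the detection program. The two-dimensional embedding and the stand-in generator are hypothetical placeholders for a trained word embedding model and generator.

```python
import numpy as np

def infer_operations(first_ops, embedding, generator, dim):
    """Infer a full operation sequence from the first M observed operations."""
    names = list(embedding)
    table = np.stack([embedding[n] for n in names])           # (V, dim)
    z = np.concatenate([embedding[op] for op in first_ops])   # flattened input vectors
    inferred = np.asarray(generator(z)).reshape(-1, dim)      # (T, dim) inferred vectors
    # Assumed decoding rule: nearest known word vector by Euclidean distance.
    dists = np.linalg.norm(inferred[:, None, :] - table[None, :, :], axis=2)
    return [names[i] for i in dists.argmin(axis=1)]

# Hypothetical 2-D embedding standing in for a trained word embedding model.
toy_embedding = {
    "NtCreateFile": np.array([1.0, 0.0]),
    "NtReadFile": np.array([0.0, 1.0]),
    "NtClose": np.array([-1.0, 0.0]),
}

# Stand-in for the optimized generator: echoes the input and appends one vector.
def toy_generator(z):
    return np.concatenate([z, [-0.9, 0.1]])

predicted = infer_operations(["NtCreateFile", "NtReadFile"],
                             toy_embedding, toy_generator, dim=2)
# predicted == ["NtCreateFile", "NtReadFile", "NtClose"]
```

The resulting list is what would be passed to the abnormal behavior detection program, for example an antivirus program.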

Please refer to FIGs. 3-5 for a second embodiment of the present invention. The second embodiment is an extension of the first embodiment. In the present embodiment, the memory 11 further stores a plurality of behavior labels (not shown), and each program operation sequence data POSD corresponds to one of the behavior labels. The behavior labels can be, for example, a normal behavior label, an abnormal behavior label, etc., but are not limited thereto. In one embodiment, the program operation sequence data POSD includes a plurality of abnormal program operation sequence data, and each abnormal program operation sequence data is associated with a malicious program. In this case, the behavior labels may further include an adware program, a worm program, a Trojan program, and the like, but are not limited thereto.

As described in the first embodiment, the processor 13 converts the plurality of program operation sequences of the program operation sequence data POSD into a plurality of word vectors WV through the word embedding model. In this embodiment, the processor 13 further groups the word vectors WV of the program operation sequence data POSD into word vector groups based on a clustering algorithm, and compares the program operation sequences of the program operation sequence data POSD with at least one of the program operation sequences corresponding to at least one of the word vectors included in each of the word vector groups, respectively, to generate a feature vector of each of the program operation sequence data POSD.

For example, taking the API sequence as an illustration, the plurality of program operation sequences may include: "GetSystemInfo", "GetFileSize", "GetSystemDirectoryW", "GetSystemMetrics", "RegQueryValueExA", "RegOpenKeyExA", "LdrLoadDll", "NtCreateFile", "NtReadFile", "NtClose", and "NtOpenDirectoryObject". The processor 13 operates on the plurality of program operation sequences through a word embedding model, and generates word vectors V1-V11 corresponding to the respective program operation sequences. It is assumed that word vector V1 corresponds to "GetSystemInfo", word vector V2 corresponds to "GetFileSize", word vector V3 corresponds to "GetSystemDirectoryW", word vector V4 corresponds to "GetSystemMetrics", word vector V5 corresponds to "RegQueryValueExA", word vector V6 corresponds to "RegOpenKeyExA", word vector V7 corresponds to "LdrLoadDll", word vector V8 corresponds to "NtCreateFile", word vector V9 corresponds to "NtReadFile", word vector V10 corresponds to "NtClose", and word vector V11 corresponds to "NtOpenDirectoryObject".

FIG. 4 is a diagram illustrating word vectors V1-V11 in a word vector distribution space WVD. It should be noted that, for simplicity, the word vector distribution space WVD in the present embodiment represents the distribution of the word vectors by a two-dimensional space. In practice, however, the developer can determine the dimension of the word vector distribution space WVD according to the type of the sequence data of the program operation. Since those skilled in the art can understand how to set the spatial dimension of the output, the detailed description is omitted here.

In the word vector distribution space WVD, word vectors located closer to each other have similar parts of speech or semantic meanings. Therefore, the invention groups the word vectors based on an unsupervised-learning clustering algorithm, and uses the groups as the basis for subsequently extracting the features of the program operation sequence data POSD. In the present invention, the clustering algorithm may be one of an Affinity Propagation (AP) clustering algorithm, a Spectral clustering algorithm, a Fuzzy C-means (FCM) clustering algorithm, an Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) clustering algorithm, a K-means clustering algorithm, a Complete-Linkage (CL) clustering algorithm, a Single-Linkage (SL) clustering algorithm, and a Ward's method clustering algorithm, but is not limited thereto.

For example, the processor 13 groups the word vectors into four word vector groups G1-G4 based on the AP clustering algorithm, as shown in fig. 5. Word vector group G1 includes word vectors V1-V4, word vector group G2 includes word vectors V5-V6, word vector group G3 includes word vector V7, and word vector group G4 includes word vectors V8-V11. It should be noted that the number of word vector groups can be determined by the developer setting the parameters of the clustering algorithm (e.g., directly setting the number of groups required, or setting the number of iterations performed by the clustering algorithm). Since the detailed operation of clustering based on the clustering algorithm can be understood by those skilled in the art, it is not described herein again.
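The grouping step can be sketched as follows. Note that this sketch swaps in K-means (one of the listed alternatives) for the AP algorithm used in the example above, because K-means is short enough to write out in full. The farthest-point initialization and the two-dimensional coordinates chosen for V1-V11 are hypothetical and are not the values plotted in FIG. 4.

```python
import numpy as np

def farthest_point_init(vectors, k):
    """Deterministic seeding: start at the first vector, then repeatedly add
    the vector farthest from all centers chosen so far."""
    centers = [vectors[0]]
    while len(centers) < k:
        chosen = np.array(centers)
        d = np.linalg.norm(vectors[:, None, :] - chosen[None, :, :], axis=2)
        centers.append(vectors[d.min(axis=1).argmax()])
    return np.array(centers, dtype=float)

def assign(vectors, centers):
    """Index of the nearest center for every vector."""
    d = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(vectors, k, iterations=20):
    centers = farthest_point_init(vectors, k)
    for _ in range(iterations):
        labels = assign(vectors, centers)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters where they are
                centers[j] = vectors[labels == j].mean(axis=0)
    return assign(vectors, centers), centers

# Hypothetical 2-D word vectors V1..V11 (rows 0..10).
word_vectors = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],   # V1-V4
    [5.0, 0.0], [5.2, 0.1],                           # V5-V6
    [0.0, 5.0],                                       # V7
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.0], [4.9, 5.1],   # V8-V11
])
group_of, group_centers = kmeans(word_vectors, k=4)
```

With these coordinates the four resulting groups contain V1-V4, V5-V6, V7, and V8-V11, matching the grouping G1-G4 described above; the group indices themselves are arbitrary.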

After obtaining the word vector groups, the processor 13 compares the program operation sequences of each program operation sequence data POSD with the program operation sequences corresponding to the word vectors included in each word vector group, so as to generate a feature vector of the program operation sequence data POSD. For example, if a program operation sequence data POSD contains the program operation sequences corresponding to the word vector V2, the word vector V6, the word vector V8, and the word vector V11, then the program operation sequence data POSD has a feature value of 1 for the word vector group G1, a feature value of 1 for the word vector group G2, a feature value of 0 for the word vector group G3, and a feature value of 2 for the word vector group G4, so that the feature vector of the program operation sequence data POSD is (1, 1, 0, 2). For another example, if another program operation sequence data POSD contains the program operation sequences corresponding to the word vector V1, the word vector V2, the word vector V4, the word vector V5, the word vector V7, the word vector V9, and the word vector V10, then its feature value for the word vector group G1 is 3, its feature value for the word vector group G2 is 1, its feature value for the word vector group G3 is 1, and its feature value for the word vector group G4 is 2, so that the feature vector of this other program operation sequence data POSD is (3, 1, 1, 2).

It should be noted that the above comparison for generating the feature vectors is based on whether each program operation sequence corresponding to a word vector included in a word vector group exists in the program operation sequence data POSD; however, in other embodiments, the comparison for generating the feature vectors may also be based on the number of occurrences, in the program operation sequence data POSD, of the program operation sequences corresponding to the word vectors included in each word vector group. For example, if the program operation sequence data POSD contains 5 program operation sequences corresponding to the word vector V2, 3 program operation sequences corresponding to the word vector V6, 1 program operation sequence corresponding to the word vector V8, and 3 program operation sequences corresponding to the word vector V11, then the feature value of the program operation sequence data POSD is 5 for the word vector group G1, 3 for the word vector group G2, 0 for the word vector group G3, and 4 for the word vector group G4, so that the feature vector of the program operation sequence data POSD is (5, 3, 0, 4).
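Both ways of forming the feature vector can be checked against the worked examples with a short sketch. The group membership follows the grouping G1-G4 described earlier; the function name and the `count_occurrences` flag are hypothetical, and each observed program operation sequence is represented by the identifier of its word vector.

```python
from collections import Counter

# Word vector groups as described for FIG. 5.
GROUPS = {
    "G1": ["V1", "V2", "V3", "V4"],
    "G2": ["V5", "V6"],
    "G3": ["V7"],
    "G4": ["V8", "V9", "V10", "V11"],
}

def feature_vector(observed, groups, count_occurrences=False):
    """Build one feature value per word vector group.

    observed: word vector identifiers of the operations in one POSD.
    count_occurrences=False counts how many distinct members of each group
    appear; True sums how often the members of each group appear.
    """
    counts = Counter(observed)
    features = []
    for members in groups.values():
        if count_occurrences:
            features.append(sum(counts[m] for m in members))
        else:
            features.append(sum(1 for m in members if counts[m] > 0))
    return tuple(features)

first = feature_vector(["V2", "V6", "V8", "V11"], GROUPS)
second = feature_vector(["V1", "V2", "V4", "V5", "V7", "V9", "V10"], GROUPS)
third = feature_vector(["V2"] * 5 + ["V6"] * 3 + ["V8"] + ["V11"] * 3,
                       GROUPS, count_occurrences=True)
# first == (1, 1, 0, 2), second == (3, 1, 1, 2), third == (5, 3, 0, 4)
```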

After generating the feature vectors of the program operation sequence data POSD, the processor 13 performs supervised learning of a classification algorithm based on the feature vectors and the behavior labels to generate a classifier. The classifier is used for classifying the feature vectors so that they correspond to the behavior labels. For example, the classification algorithm may be one of a Support Vector Machine (SVM) algorithm, a Decision Tree (DT) algorithm, a Bayes algorithm, and a Nearest Neighbors (NN) algorithm, but is not limited thereto.

The purpose of the supervised learning is that the feature vectors, after being processed by the classification algorithm, are classified into the proper categories corresponding to the behavior labels. For example, the program operation sequence data POSD corresponding to the adware label are reliably classified into the same category, the program operation sequence data POSD corresponding to the worm label are reliably classified into the same category, the program operation sequence data POSD corresponding to the Trojan label are reliably classified into the same category, and the program operation sequence data POSD corresponding to the normal behavior label are reliably classified into the same category.
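The supervised step can be sketched with the simplest of the listed algorithms, a nearest neighbor classifier. The training feature vectors reuse the worked examples, but the labels paired with them here are made up purely for illustration, as is the class name.

```python
import numpy as np

class NearestNeighborClassifier:
    """1-nearest-neighbor classifier over feature vectors."""

    def fit(self, features, labels):
        self.features = np.asarray(features, dtype=float)
        self.labels = list(labels)
        return self

    def predict(self, features):
        queries = np.asarray(features, dtype=float)
        # Euclidean distance from every query to every training vector.
        dists = np.linalg.norm(
            queries[:, None, :] - self.features[None, :, :], axis=2)
        return [self.labels[i] for i in dists.argmin(axis=1)]

# Hypothetical labeled training data (labels invented for the example).
train_features = [(1, 1, 0, 2), (3, 1, 1, 2), (5, 3, 0, 4)]
train_labels = ["trojan", "normal", "worm"]

classifier = NearestNeighborClassifier().fit(train_features, train_labels)
prediction = classifier.predict([(1, 1, 0, 1)])
# prediction == ["trojan"], since (1, 1, 0, 2) is the closest training vector
```

An SVM, decision tree, or Bayes classifier would expose the same fit/predict shape; only the decision rule differs.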

Finally, the processor 13 generates an abnormal behavior detection model based on the word vector groups and the classifier. The processor 13 may then further integrate the abnormal behavior detection model, the word embedding model and the generator GR of the optimized GAN model GM to generate the behavior inference model. Accordingly, at the initial stage of program execution, the behavior inference model generated by the invention can not only infer, from the word vectors of the first program operation sequences, the word vectors of the program operation sequences that have not yet occurred, thereby predicting upcoming program operations, but also detect abnormal behaviors based on the program operation sequences corresponding to the inferred word vectors, so as to prevent files in the computer from being damaged by malicious programs or personal information of users from being stolen.

In other embodiments, after generating the abnormal behavior Detection model, the processor 13 may utilize a plurality of test program operation sequence data to test the abnormal behavior Detection model, and determine the accuracy of the abnormal behavior Detection model identifying the plurality of test program operation sequence data according to a Detection Rate (Detection Rate), so that a developer may adjust the related parameter settings of the word embedding model, the clustering algorithm, and the classification algorithm based on the accuracy, and perform the operation of generating the abnormal behavior Detection model again. Therefore, the invention can generate different abnormal behavior detection models aiming at different types of program operation sequence data through the operation, so as to detect the abnormal behavior of various dynamic program operation sequences. Similarly, the behavior inference model generated by the present invention can be compiled into an executable program running in an operating system to provide the operating system with the ability to detect abnormal behavior (e.g., malicious programs, illegal operations, etc.).
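The evaluation step can be sketched as follows. The text does not define the detection rate, so this sketch assumes it is the fraction of test program operation sequence data whose predicted behavior label matches the expected one; the function name and the sample labels are hypothetical.

```python
def detection_rate(predicted, expected):
    """Fraction of test samples whose predicted label equals the expected label
    (an assumed definition of the detection rate)."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equal in length")
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)

# Hypothetical test run: three of four samples are identified correctly.
rate = detection_rate(
    ["trojan", "normal", "worm", "normal"],
    ["trojan", "normal", "worm", "adware"],
)
# rate == 0.75
```

A developer could rerun the clustering and classification steps with different parameter settings and keep the configuration with the highest rate.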

Referring to fig. 6, a flowchart of a behavior inference model generation method according to a third embodiment of the present invention is shown. The behavior inference model generation method is suitable for a behavior inference model generation apparatus (for example, the behavior inference model generation apparatus 1 of the foregoing embodiment). The behavior inference model generation device comprises a memory and a processor. The memory stores a plurality of program operation sequence data. Each program operation sequence data records a plurality of program operation sequences. The behavior inference model generation method is executed by a processor.

First, in step S601, the program operation sequences of the program operation sequence data are converted into word vectors (e.g., the word vectors WV shown in FIG. 2) by a word embedding model. Then, in step S603, for each program operation sequence data, the first M word vectors of the word vectors are extracted as M input vectors (e.g., the input vectors IWV shown in FIG. 2) of a generative adversarial network (GAN) model, where M is a positive integer.

In step S605, for each program operation sequence data, the M input vectors are computed by a generator of the GAN model to generate a plurality of inference word vectors (e.g., inference word vector PWV shown in fig. 2). Subsequently, in step S607, for each program operation sequence data, an authenticity determination is performed on the word vectors and the inferred word vectors by a discriminator of the GAN model. Then, in step S609, a determination result of the authenticity determination is fed back to the generator to adjust a parameter setting of the generator.

In step S611, the steps S605 to S609 are repeated to train and thereby optimize the GAN model. As described in the first embodiment, the generator and the discriminator each adjust their related parameter settings after receiving the determination result and the regenerated inferred word vectors. Finally, in step S613, the word embedding model and the generator of the optimized GAN model are integrated to generate a behavior inference model.

In other embodiments, the plurality of program operation sequences is a dynamic program operation sequence, which is an application programming interface sequence or a system call sequence. In one embodiment, the dynamic program operation sequence is captured by a trace program. In other embodiments, the word embedding model is one of a word-to-vector model and a one-hot coding model.

In addition to the above steps, the behavior inference model generation method of the present embodiment can also perform all the operations described in the foregoing embodiments and have all the corresponding functions. Those skilled in the art can directly understand how to perform these operations and have these functions based on the foregoing embodiments, and therefore, the detailed description is omitted here.

Referring to fig. 7, a fourth embodiment of the present invention is an extension of the third embodiment. In this embodiment, step S613 further includes: integrating an abnormal behavior detection model, the word embedding model and the optimized generator of the GAN model to generate a behavior inference model. FIG. 7 is a flowchart of the method for generating abnormal behavior detection models according to the present invention.

In step S701, the word vectors of the program operation sequence data are grouped into word vector groups based on a clustering algorithm. Next, in step S703, the program operation sequences of each program operation sequence data are compared with at least one of the program operation sequences corresponding to at least one of the word vectors included in each word vector group, so as to generate a feature vector of each program operation sequence data.

In step S705, supervised learning with a classification algorithm is performed based on the feature vectors and the behavior labels to generate a classifier. The classifier is used for classifying the feature vectors so that they correspond to the behavior labels. Finally, in step S707, an abnormal behavior detection model is generated based on the word vector groups and the classifier.
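The supervised learning step can be sketched with a nearest neighbor classifier, one of the classification algorithms the description lists. This is a minimal sketch and not the implementation of the invention; the function name, the feature vectors, and the label strings are hypothetical.

```python
def train_nearest_neighbor(train_features, train_labels):
    """Return a classifier that labels a feature vector like its closest training example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def classify(features):
        best = min(
            range(len(train_features)),
            key=lambda i: squared_distance(train_features[i], features),
        )
        return train_labels[best]

    return classify

# Hypothetical labelled feature vectors, one per program operation sequence data.
classifier = train_nearest_neighbor(
    [[1, 0, 1], [0, 1, 0]],
    ["malicious", "normal"],
)
label = classifier([1, 0, 0])
```

Under this reading, the abnormal behavior detection model of step S707 would pair the word vector groups, which are needed to build feature vectors for new traces, with a classifier of this kind.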

In other embodiments, the clustering algorithm is one of an Affinity Propagation (AP) clustering algorithm, a Spectral clustering algorithm, a Fuzzy C-means (FCM) clustering algorithm, an Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) clustering algorithm, a K-means clustering algorithm, a Complete-Linkage (CL) clustering algorithm, a Single-Linkage (SL) clustering algorithm, and a Ward's method clustering algorithm, and the classification algorithm is one of a Support Vector Machine (SVM) algorithm, a Decision Tree (DT) algorithm, a Bayesian algorithm, and a Nearest Neighbor (NN) algorithm.

In addition, the behavior inference model generation method of the present invention can be implemented by a computer storage medium. The computer storage medium stores a computer program comprising a plurality of program instructions. After the computer program is loaded and installed on an electronic computing device (for example, the behavior inference model generation device 1), a processor of the electronic computing device executes the program instructions included in the computer program to perform the behavior inference model generation method of the present invention. The computer storage medium may be, for example, a Read Only Memory (ROM), a flash memory, a floppy disk, a hard disk, a Compact Disc (CD), a USB drive, a magnetic tape, a database accessible via a network, or any other storage medium known to those skilled in the art and having the same function.

In summary, after the program operation sequence data is converted into a plurality of word vectors by the word embedding model, the first M word vectors are input to the generator of the generative adversarial network (GAN) model to generate a plurality of inferred word vectors. The discriminator of the GAN model judges their authenticity, and the discrimination result is fed back to the generator so that the generator can adjust its parameter settings accordingly. The discriminator thus repeatedly distinguishes the inferred word vectors from the real word vectors and feeds the discrimination result back to the generator, so that the generator adjusts its parameter settings and generates inferred word vectors that are increasingly similar to the real word vectors.

The above-mentioned embodiments are only used to illustrate implementations of the present invention and to explain its technical features, and are not intended to limit the scope of protection of the present invention. Any modification or equivalent arrangement that can be readily accomplished by a person skilled in the art falls within the scope of the present invention, and the scope of the present invention is defined by the appended claims.

Claims

1. A behavior inference model generation apparatus, comprising:

a memory for storing a plurality of program operation sequence data, each of the program operation sequence data describing a plurality of program operation sequences; and

a processor electrically connected to the memory and configured to perform the following steps:

(a) converting the program operation sequences of each program operation sequence data into a plurality of word vectors through a word embedding model;

(b) taking the first M word vectors of the word vectors as M input vectors of a Generative Adversarial Network (GAN) model for each program operation sequence data, wherein M is a positive integer;

(c) for each of the program operation sequence data, operating the M input vectors via a generator of the GAN model to generate a plurality of inferred word vectors;

(d) for each program operation sequence data, performing an authenticity judgment on the word vectors and the inferred word vectors through a discriminator (discriminator) of the GAN model;

(e) feeding back a discrimination result of the authenticity discrimination to the generator to adjust a parameter setting of the generator;

(f) repeating the steps (c) through (e), training the GAN model to optimize the GAN model; and

(g) integrating the word embedding model and the optimized generator of the GAN model to generate a behavior inference model;

wherein the processor further integrates an abnormal behavior detection model, the word embedding model, and the optimized generator of the GAN model to generate the behavior inference model;

the memory further stores a plurality of behavior tags, each of the program operation sequence data corresponds to one of the plurality of behavior tags, and the processor further performs the following steps:

grouping the word vectors of the program operation sequence data into word vector groups based on a clustering algorithm;

comparing the program operation sequences of the program operation sequence data with at least one of the program operation sequences corresponding to at least one of the word vectors included in the word vector group to generate a feature vector of the program operation sequence data;

performing supervised learning of a classification algorithm based on a plurality of feature vectors and the plurality of behavior labels to generate a classifier, wherein the classifier is used for classifying the plurality of feature vectors to correspond to the plurality of behavior labels; and

generating the abnormal behavior detection model based on the word vector groups and the classifier.

2. The apparatus of claim 1, wherein the plurality of program operation sequences are dynamic program operation sequences.

3. The apparatus of claim 2, wherein the dynamic program operation sequence is an Application Programming Interface (API) sequence.

4. The apparatus of claim 2, wherein the dynamic program operation sequence is a System Call sequence.

5. The apparatus of claim 2, wherein the dynamic program operation sequence is captured by a tracking program.

6. The apparatus of claim 1, wherein the word embedding model is one of a Word-to-Vector (Word2Vec) model and a One-Hot Encoding model.

7. The apparatus of claim 1, wherein the program operation sequence data comprises abnormal program operation sequence data, and each abnormal program operation sequence data is associated with a malicious program.

8. The apparatus of claim 1, wherein the clustering algorithm is one of an Affinity Propagation (AP) clustering algorithm, a Spectral clustering algorithm, a Fuzzy C-means (FCM) clustering algorithm, an Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) clustering algorithm, a K-means clustering algorithm, a Complete-Linkage (CL) clustering algorithm, a Single-Linkage (SL) clustering algorithm, and a Ward's method clustering algorithm, and the classification algorithm is one of a Support Vector Machine (SVM) algorithm, a Decision Tree (DT) algorithm, a Bayesian algorithm, and a Nearest Neighbor (NN) algorithm.

9. A behavior inference model generation method for a behavior inference model generation apparatus, the behavior inference model generation apparatus comprising a memory and a processor, the memory storing a plurality of program operation sequence data, each of the program operation sequence data recording a plurality of program operation sequences, the behavior inference model generation method being executed by the processor and comprising the steps of:

wherein the step (g) further comprises the steps of:

integrating an abnormal behavior detection model, the word embedding model and the optimized generator of the GAN model to generate the behavior inference model;

the memory further stores a plurality of behavior tags, each of the program operation sequence data corresponds to one of the plurality of behavior tags, and the behavior inference model generating method further comprises the following steps:

10. The method of claim 9, wherein the plurality of program operation sequences are dynamic program operation sequences.

11. The method of claim 10, wherein the dynamic program operation sequence is an Application Programming Interface (API) sequence.

12. The method of claim 10, wherein the dynamic program operation sequence is a System Call (System Call) sequence.

13. The method of claim 10, wherein the dynamic program operation sequence is captured by a tracking program.

14. The method of claim 9, wherein the word embedding model is one of a Word-to-Vector (Word2Vec) model and a One-Hot Encoding model.

15. The method of claim 9, wherein the program operation sequence data comprises abnormal program operation sequence data, and each abnormal program operation sequence data is associated with a malicious program.

16. The method of claim 9, wherein the clustering algorithm is one of an Affinity Propagation (AP) clustering algorithm, a Spectral clustering algorithm, a Fuzzy C-means (FCM) clustering algorithm, an Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) clustering algorithm, a K-means clustering algorithm, a Complete-Linkage (CL) clustering algorithm, a Single-Linkage (SL) clustering algorithm, and a Ward's method clustering algorithm, and the classification algorithm is one of a Support Vector Machine (SVM) algorithm, a Decision Tree (DT) algorithm, a Bayesian algorithm, and a Nearest Neighbor (NN) algorithm.