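WO2017084586A1 - Method, system, and device for inferring malicious code rules based on a deep learning method - Google Patents
Method, system, and device for inferring malicious code rules based on a deep learning method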
- Publication number
- WO2017084586A1 (application PCT/CN2016/106128)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- string
- vector
- malicious code
- malicious
- attribute data
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- the invention proposes a method for inferring malicious code rules based on a deep learning method, comprising:
- a system for inferring malicious code rules based on a deep learning method including:
- a retraining module configured to extract strings within a preset range from the key string according to the malicious sample feature vector, and to train those strings again using the word2vec idea to obtain a second training result
- the training module is specifically configured to: perform feature extraction on the key string and infer, from the extracted features, the string with the greatest relevance, where that string serves as an explanatory string for the key string.
- the parsing module is specifically configured to: select control samples by using a Monte Carlo method, and calculate the Euclidean distance between the strings in the control samples and the strings of the dex file with malicious code; and, according to the Euclidean distance and a preset Euclidean distance threshold, extract the key string from the strings of the dex file with malicious code.
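The selection-and-filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the text does not say how a string becomes a vector before the Euclidean distance is taken, so `char_histogram` is a hypothetical embedding; uniform random sampling stands in for the Monte Carlo selection; and keeping the strings whose nearest control string lies farther away than the threshold is an assumed reading of the threshold rule.

```python
import math
import random


def char_histogram(s, dims=16):
    """Hypothetical embedding (an assumption): bucketed character counts."""
    vec = [0.0] * dims
    for ch in s:
        vec[ord(ch) % dims] += 1.0
    return vec


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_controls(pool, k, seed=0):
    """Draw k control samples uniformly at random from the pool."""
    rng = random.Random(seed)
    return rng.sample(pool, k)


def filter_by_distance(malicious_strings, control_samples, threshold):
    """Keep strings whose nearest control string is farther than threshold."""
    control_vecs = [char_histogram(s) for sample in control_samples for s in sample]
    keys = []
    for s in malicious_strings:
        v = char_histogram(s)
        nearest = min(euclidean(v, c) for c in control_vecs)
        if nearest > threshold:
            keys.append(s)
    return keys
```

A string that also occurs in a control sample has distance 0 to it and is dropped, so only strings unlike anything in the control samples survive as key strings.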
- constructing the malicious sample feature vector in the building module means: based on the sample attribute data, summarizing n sample attribute data; calculating vectors of the attribute data from the n sample attribute data; calculating a pairwise similarity matrix of those vectors; retaining the main vectors; and accumulating the dimensional components of all of the main vectors.
- retaining the main vectors and accumulating their dimensional components in the building module means: retaining the vectors of the attribute data for which the difference between the pairwise similarity matrices is greater than a preset threshold, and using the retained vectors as the malicious sample feature vector, where, among the retained vectors, the dimensional components of vectors of the same attribute data are accumulated.
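One possible reading of this step is sketched below. The text leaves several details open, so the following are assumptions made only for illustration: the pairwise matrices hold Euclidean distances, the "difference" for a string is the sum of absolute differences between its row in the malicious matrix and its row in the control matrix, and vectors are accumulated when the same attribute name occurs more than once. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_matrix(vectors):
    """Pairwise distance matrix of a list of vectors."""
    return [[euclidean(u, v) for v in vectors] for u in vectors]


def retain_main_vectors(names, malicious_vecs, control_vecs, threshold):
    """Retain vectors whose pairwise-matrix rows differ by more than threshold.

    names[i] is the attribute (string) behind malicious_vecs[i] and
    control_vecs[i]. Retained vectors with the same name are summed
    component by component.
    """
    d_mal = pairwise_matrix(malicious_vecs)
    d_ctl = pairwise_matrix(control_vecs)
    retained = {}
    for i, name in enumerate(names):
        difference = sum(abs(m - c) for m, c in zip(d_mal[i], d_ctl[i]))
        if difference > threshold:
            acc = retained.setdefault(name, [0.0] * len(malicious_vecs[i]))
            for dim, value in enumerate(malicious_vecs[i]):
                acc[dim] += value
    return retained
```

With a higher threshold fewer vectors are retained; the accumulated result for each retained name plays the role of the malicious sample feature vector.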
- the obtaining module is specifically configured to: obtain the strings with the greatest relevance, find the string features between those associated strings, and finally obtain the malicious code string rule according to those string features.
- the invention provides a terminal device comprising: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and, when executed by the one or more processors, perform the following operations:
- a malicious code string rule is obtained based on the second training result.
- the invention applies the word2vec idea to the field of malicious code analysis: it trains on the strings of known malicious code, obtains the strings most relevant to the malicious code strings, and derives the string rules of the malicious code.
- the invention can fully utilize the characteristics of the malicious sample to infer malicious code rules with low false positive rate and high coverage rate, so as to optimize the existing virus detection engine and improve the detection efficiency of the malicious code.
- the invention can also be applied to the field of malicious code detection and malicious code analysis.
- FIG. 1 is a flow chart of an embodiment of a method for inferring malicious code rules based on a deep learning method provided by the present invention
- FIG. 3 is a flow chart of a method for inferring malicious code rules based on a deep learning method, in accordance with an embodiment of the present invention
- FIG. 4 is a structural diagram of a system embodiment for inferring malicious code rules based on a deep learning method provided by the present invention.
- the present invention provides a method and system for inferring malicious code rules based on a deep learning method.
- deep learning methods were originally applied to image recognition; text-based context feature extraction has achieved results only in recent years.
- the traditional text feature extraction model is the N-gram, but it has a problem: if some n-tuples do not appear in the training corpus, the corresponding conditional probability is 0, which makes the computed probability of the whole sentence 0. With an insufficient corpus, higher-order language models cannot be trained. In addition, this model cannot capture the similarity between words.
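The zero-probability problem can be seen in a small worked example. The sketch below is an unsmoothed bigram model written only for illustration; the corpus, the `<s>` start token, and the function names are assumptions, not part of the patent.

```python
from collections import Counter


def train_bigrams(corpus):
    """Count bigrams and how often each token occurs as a bigram history."""
    history, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        history.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return history, bigrams


def sentence_probability(sentence, history, bigrams):
    """Product of maximum-likelihood bigram probabilities, with no smoothing."""
    tokens = ["<s>"] + sentence.split()
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if history[prev] == 0:
            return 0.0  # unseen history: the conditional probability is taken as 0
        p *= bigrams[(prev, word)] / history[prev]
    return p


history, bigrams = train_bigrams(["the cat sat", "the dog sat"])
seen = sentence_probability("the cat sat", history, bigrams)    # every bigram was observed
unseen = sentence_probability("the cat ran", history, bigrams)  # "cat ran" was never observed
```

Here `seen` is 0.5, while `unseen` is 0.0 because a single unobserved bigram zeroes the whole product, which is the weakness described above.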
- neural network models have also attracted wide attention.
- neural-network-based language models perform very well, but their training and prediction times are long, which limits practical application.
- a deep learning text method represented by word2vec, such as the word2vec text feature extraction model
- traditional word2vec segments an article into words and then trains on their contextual relationships, which cannot be applied directly to the field of malicious code analysis. The present invention therefore carries out creative research on this problem.
- the preset rule may be set according to actual needs; for example, the preset rule may include: URL addresses, code with functions such as jumps or calls, package names, class names, and so on.
- the function call relationships in the dex file with malicious code, the code structure context, and similar content that describes the string relationships of the dex file can be extracted as key strings. A corresponding number of control samples (white samples, that is, officially released normal samples without malicious code) also needs to be selected; the present invention uses the Monte Carlo method (also known as the statistical simulation method) to select control samples for the malicious code sample (the sample to be extracted), and several groups with equal numbers of control samples are prepared for use in the next step.
- in this example, the preset rule includes a URL address, a package name, and a class name.
- a dex file with malicious code can be parsed into strings, and, according to the preset rule, the strings containing a URL address, package name, or class name are extracted and used as the key strings. After extraction, the key strings can be as follows:
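The patent's own example strings are not reproduced in this text. As a purely hypothetical illustration of rule-based extraction, the sketch below filters a list of already-parsed dex strings with regular expressions for URLs, class descriptors, and package names. The patterns, the sample strings, and the function name are all assumptions made for this sketch.

```python
import re

# Hypothetical patterns (assumptions, not taken from the patent).
URL_RE = re.compile(r"https?://[^\s\"']+")
# Dalvik-style class descriptor, e.g. Lcom/example/app/MainActivity;
CLASS_RE = re.compile(r"^L(?:[A-Za-z_$][\w$]*/)+[A-Za-z_$][\w$]*;$")
# Dotted lowercase package name, e.g. com.example.app
PACKAGE_RE = re.compile(r"^[a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)+$")


def extract_key_strings(strings):
    """Return the strings that match any of the preset-rule patterns."""
    keys = []
    for s in strings:
        if URL_RE.search(s) or CLASS_RE.match(s) or PACKAGE_RE.match(s):
            keys.append(s)
    return keys


parsed = [
    "http://example.invalid/cmd",
    "Lcom/example/app/MainActivity;",
    "com.example.app",
    "Hello, world",
    "onCreate",
]
key_strings = extract_key_strings(parsed)
```

In this made-up input the first three strings are kept and the last two are discarded, since they match none of the patterns.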
- the key strings extracted from the dex file with malicious code can be trained by using the deep learning text idea represented by word2vec to obtain the training result of the dex file with malicious code.
- likewise, the control samples from S101 are trained with word2vec to obtain multiple sets of control-sample training results.
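To make the training input concrete: the skip-gram variant of word2vec learns from (center, context) pairs taken from a sliding window over a token sequence. The sketch below generates such pairs only; it does not train a model. Treating the key strings of one sample as an ordered token sequence is an assumption of this sketch, and the function name and window size are illustrative.

```python
def skipgram_pairs(sequence, window=2):
    """List (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


# Hypothetical call sequence extracted from one sample.
calls = ["getDeviceId", "openConnection", "sendTextMessage"]
pairs = skipgram_pairs(calls, window=1)
```

Strings that repeatedly appear in each other's windows end up with nearby vectors after training, which is what later allows a "most relevant" string to be looked up by distance.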
- S103: construct a malicious sample feature vector by using the first training result.
- n sample attribute data are summarized based on the sample attribute data; vectors of the attribute data are calculated from the n sample attribute data, along with a pairwise similarity matrix of those vectors; the main vectors are retained and the dimensional components of all the main vectors are accumulated. The sample attribute data include control sample attribute data and malicious sample attribute data:
- the malicious sample attribute data are the key strings extracted from the above dex file with malicious code;
- the control sample attribute data are the key strings extracted from the legitimate file.
- the control sample attribute data may be obtained as follows: obtain the officially released dex file corresponding to the dex file with malicious code (the officially released dex file may be regarded as a legitimate file), parse the legitimate file into strings, extract key strings from those strings according to the preset rule, and use the key strings extracted from the legitimate file as the control sample attribute data.
- the malicious code string rule is obtained according to the string feature between the most relevant strings.
- the string with the greatest relevance is the one whose vector is closer, in Euclidean distance, to the sample vector than the other n-1 sample vectors.
- by obtaining the strings most relevant to a malicious code sample, the string features between the associated strings are found, and the malicious code string rule is finally obtained based on the string features between the associated strings.
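Looking up the most relevant string by Euclidean distance amounts to a nearest-neighbour search. The sketch below assumes each string already has a trained vector; the vectors and names are made up for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_relevant(query, vectors):
    """Return the other string whose vector is closest to the query's vector."""
    candidates = [name for name in vectors if name != query]
    return min(candidates, key=lambda name: euclidean(vectors[query], vectors[name]))


# Hypothetical trained vectors (two dimensions for readability).
vectors = {
    "sendTextMessage": [1.0, 0.0],
    "getDeviceId": [0.9, 0.1],
    "onCreate": [0.0, 1.0],
}
nearest = most_relevant("sendTextMessage", vectors)
```

Here `nearest` is `"getDeviceId"`, because its vector lies closer to the query than the vector for `"onCreate"`.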
- FIG. 3 is a flow chart of a method for inferring malicious code rules based on a deep learning method, in accordance with an embodiment of the present invention. It is assumed that there are N dex files with malicious code, each of which is called a malicious sample. Each malicious sample should have a corresponding control sample, which is an officially released normal sample without malicious code.
- the method for inferring malicious code rules based on the deep learning method may include:
- S303: perform word2vec training on the function call relationships and code structure context (that is, the key strings in the malicious samples) of each malicious sample cluster, and extract the feature vectors of each sample cluster according to a first preset rule, generating feature vectors for the T1 (usually 10000+) strings of the malicious sample cluster (the number of dimensions of a feature vector can be limited to 20). That is, the entire malicious sample cluster is trained to obtain the feature vectors of T1 strings.
- the feature vectors of the same string across the M control sample clusters may be averaged, and the average feature vector used to represent that string in the control samples.
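The averaging step can be sketched as below, assuming each control sample cluster yields a dictionary that maps a string to its feature vector. The data layout and function name are assumptions for illustration.

```python
def average_vectors(cluster_vectors):
    """Average each string's vectors over the clusters that contain it.

    cluster_vectors is a list with one dict per control sample cluster,
    each mapping a string to its feature vector.
    """
    sums, counts = {}, {}
    for cluster in cluster_vectors:
        for name, vec in cluster.items():
            acc = sums.setdefault(name, [0.0] * len(vec))
            for dim, value in enumerate(vec):
                acc[dim] += value
            counts[name] = counts.get(name, 0) + 1
    return {name: [x / counts[name] for x in acc] for name, acc in sums.items()}


clusters = [
    {"onCreate": [1.0, 2.0]},
    {"onCreate": [3.0, 4.0], "getDeviceId": [5.0, 5.0]},
]
averaged = average_vectors(clusters)
```

A string that appears in only some clusters is averaged over just those clusters; whether the patent intends that, or division by M, is not stated.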
- the method for calculating the similarity may include, but is not limited to, Euclidean distance, Manhattan distance, and the like.
- weights can be calculated for the M groups respectively, and all similarity values are accumulated to obtain the similarity of each feature vector in the malicious sample cluster.
- low similarity may be determined by setting a threshold.
- for example, a threshold may be preset and the Euclidean distance calculated; if the similarity between a feature vector in the malicious sample cluster and the control sample cluster is less than the threshold, the feature vector can be determined to have low similarity.
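The distance, weighting, and threshold steps can be sketched together. The text speaks of a similarity being "less than the threshold" while naming distance metrics, so this sketch converts a distance d into a similarity 1 / (1 + d); that conversion, the weights, and the function names are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity(a, b, metric=euclidean):
    """Assumed conversion: distance 0 gives similarity 1, larger distances give less."""
    return 1.0 / (1.0 + metric(a, b))


def accumulated_similarity(malicious_vec, control_vecs, weights, metric=euclidean):
    """Weighted sum of similarities against the M control-group vectors."""
    return sum(w * similarity(malicious_vec, c, metric)
               for w, c in zip(weights, control_vecs))


def is_low_similarity(malicious_vec, control_vecs, weights, threshold, metric=euclidean):
    """True when the accumulated similarity falls below the preset threshold."""
    return accumulated_similarity(malicious_vec, control_vecs, weights, metric) < threshold
```

Strings flagged by `is_low_similarity` are the ones that behave unlike the control samples and are therefore candidates for the sensitive strings used in the later filtering step.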
- the N samples belonging to the malicious sample cluster are each trained with word2vec (that is, each sample is trained separately), and the trained string results are filtered against the strings from step S305; that is, the feature vectors in the training results that belong to the sensitive strings from step S305 are found. Feature extraction is then performed on the found feature vectors, and the malicious code rule is finally obtained.
- the specific implementation of feature extraction on the found feature vectors may be as follows: the found feature vectors can be regarded as normal documents, and LDA (Latent Dirichlet Allocation, a document topic generation model also known as a three-layer Bayesian probability model) performs the feature extraction.
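For readers unfamiliar with LDA, the sketch below fits a small topic model with collapsed Gibbs sampling in pure Python. It is an illustrative stand-in, not the patent's implementation: treating each sample's sensitive strings as the tokens of one document is an assumption, and the hyperparameters `alpha` and `beta`, the iteration count, and the function names are arbitrary choices.

```python
import random


def lda_gibbs(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns count tables.

    docs is a list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] counts tokens of document d assigned to topic k and
    topic_word[k][w] counts assignments of word w to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, topic, n=3):
    """The n words most often assigned to the given topic."""
    counts = topic_word[topic]
    return sorted(counts, key=lambda w: counts[w], reverse=True)[:n]
```

The top words of each topic group strings that tend to occur together, which is one way such groups could feed a string rule; how the patent turns LDA output into a rule is not specified in this text.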
- an embodiment of the present invention further provides a system for inferring malicious code rules based on the deep learning method.
- this system corresponds to the method for inferring malicious code rules based on the deep learning method provided by the above embodiments; the implementation of the foregoing method therefore also applies to the system provided in this embodiment and is not described again in detail here.
- FIG. 4 is a structural diagram of a system embodiment for inferring malicious code rules based on a deep learning method provided by the present invention. As shown in FIG. 4, the system for inferring malicious code rules based on the deep learning method includes:
- the parsing module 401 is configured to parse the dex file with malicious code into strings and extract the key strings, where the key strings include content from the function call relationships and code structure context that describes the string relationships of the dex file content.
- the parsing module 401 may extract the key string as follows: select control samples using the Monte Carlo method; calculate the Euclidean distance between the strings in the control samples and the strings of the dex file with malicious code; and, according to the Euclidean distance and the preset Euclidean distance threshold, extract the key string from the strings of the dex file with malicious code.
- the training module 402 is configured to train the key string by using the word2vec idea to obtain a first training result.
- the training module 402 may perform feature extraction on the key string and infer the most relevant string according to the features, where the most relevant string serves as an explanatory string for the key string.
- retaining the main vectors and accumulating their dimensional components in the building module 403 means: retaining the vectors of the attribute data for which the difference between the pairwise similarity matrices is greater than the preset threshold, and using the retained vectors as the malicious sample feature vector, where, among the retained vectors, the dimensional components of vectors of the same attribute data are accumulated.
- the retraining module 404 is configured to extract strings within a preset range from the key string according to the malicious sample feature vector, and to train those strings again using the word2vec idea to obtain a second training result.
- the obtaining module 405 may use the document topic generation model LDA to perform feature extraction on the string features between the associated strings to obtain the malicious code string rule.
- the present invention also provides a terminal device comprising: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and, when executed by the one or more processors, perform the following operations:
- S101' parses the dex file with malicious code into a string, and extracts the key string from the string according to a preset rule.
- the present invention also provides a storage medium for storing an application for performing the method for inferring malicious code rules based on the deep learning method according to the above embodiment of the present invention.
- the present invention relates to a method for inferring malicious code rules based on a deep learning method.
- the core of the method is based on the word2vec idea: a deep learning method is used to train on the strings of known malicious code twice to obtain the strings most relevant to the malicious code strings, and the string rules of the malicious code are then obtained from that relevance.
- the invention can fully utilize the characteristics of the malicious sample to infer malicious code rules with low false positive rate and high coverage rate, optimize the existing virus detection engine, improve the efficiency of malicious code detection, and can also be applied to malicious code analysis.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Virology (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Claims (17)
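- A method for inferring malicious code rules based on a deep learning method, characterized by comprising: parsing a dex file with malicious code into strings, and extracting key strings from the strings according to a preset rule; training the key strings using the word2vec idea to obtain a first training result; constructing a malicious sample feature vector from the first training result; extracting strings within a preset range from the key strings according to the malicious sample feature vector, and training the strings within the preset range again using the word2vec idea to obtain a second training result; and obtaining a malicious code string rule based on the second training result.
- The method according to claim 1, characterized in that training the key strings using the word2vec idea specifically comprises: performing feature extraction on the key strings, and inferring the string with the greatest relevance according to the features, wherein the string with the greatest relevance is used to indicate an explanatory string for the key strings.
- The method according to claim 1, characterized in that extracting key strings from the strings according to a preset rule comprises: selecting control samples, and calculating the distance between the strings in the control samples and the strings of the dex file with malicious code; and extracting the key strings from the strings of the dex file with malicious code according to the distance and a preset distance threshold.
- The method according to claim 1, characterized in that the key strings comprise content, within the function call relationships and code structure context, that describes the string relationships of the dex file content.
- The method according to claim 1, characterized in that constructing the malicious sample feature vector comprises: based on sample attribute data, summarizing n sample attribute data; calculating vectors of the attribute data based on the n sample attribute data; calculating a pairwise similarity matrix of the vectors of the attribute data; retaining the main vectors; and accumulating the dimensional components of all the main vectors.
- The method according to claim 5, characterized in that retaining the main vectors and accumulating the dimensional components of all the main vectors comprises: retaining the vectors of the attribute data for which the difference between the pairwise similarity matrices is greater than a preset threshold; and using the retained vectors of the attribute data as the malicious sample feature vector, wherein, for the retained vectors of the attribute data, the dimensional components of vectors of the same attribute data are accumulated.
- The method according to claim 1, characterized in that obtaining the malicious code string rule based on the training result specifically comprises: obtaining the strings with the greatest relevance, finding the string features between the associated strings, and finally obtaining the malicious code string rule according to the string features between the associated strings.
- The method according to claim 7, characterized in that feature extraction is performed on the string features between the associated strings by the document topic generation model LDA to obtain the malicious code string rule.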
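- A system for inferring malicious code rules based on a deep learning method, characterized by comprising: a parsing module, configured to parse a dex file with malicious code into strings and extract key strings from the strings according to a preset rule; a training module, configured to train the key strings using the word2vec idea to obtain a first training result; a building module, configured to construct a malicious sample feature vector from the first training result; a retraining module, configured to extract strings within a preset range from the key strings according to the malicious sample feature vector, and to train the strings within the preset range again using the word2vec idea to obtain a second training result; and an obtaining module, configured to obtain a malicious code string rule based on the second training result.
- The system according to claim 9, characterized in that the training module is specifically configured to: perform feature extraction on the key strings, and infer the string with the greatest relevance according to the features, wherein the string with the greatest relevance is used to indicate an explanatory string for the key strings.
- The system according to claim 9, characterized in that the parsing module is specifically configured to: select control samples using the Monte Carlo method, and calculate the Euclidean distance between the strings in the control samples and the strings of the dex file with malicious code; and extract the key strings from the strings of the dex file with malicious code according to the Euclidean distance and a preset Euclidean distance threshold.
- The system according to claim 9, characterized in that constructing the malicious sample feature vector in the building module comprises: based on sample attribute data, summarizing n sample attribute data; calculating vectors of the attribute data based on the n sample attribute data; calculating a pairwise similarity matrix of the vectors of the attribute data; retaining the main vectors; and accumulating the dimensional components of all the main vectors.
- The system according to claim 12, characterized in that retaining the main vectors and accumulating the dimensional components of all the main vectors in the building module specifically comprises: retaining the vectors of the attribute data for which the difference between the pairwise similarity matrices is greater than a preset threshold, and using the retained vectors of the attribute data as the malicious sample feature vector, wherein, for the retained vectors of the attribute data, the dimensional components of vectors of the same attribute data are accumulated.
- The system according to claim 9, characterized in that the obtaining module is specifically configured to: obtain the strings with the greatest relevance, find the string features between the associated strings, and finally obtain the malicious code string rule according to the string features between the associated strings.
- The system according to claim 14, characterized in that the obtaining module performs feature extraction on the string features between the associated strings by the document topic generation model LDA to obtain the malicious code string rule.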
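- A terminal device, characterized by comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and, when executed by the one or more processors, perform the following operations: parsing a dex file with malicious code into strings, and extracting key strings from the strings according to a preset rule; training the key strings using the word2vec idea to obtain a first training result; constructing a malicious sample feature vector from the first training result; extracting strings within a preset range from the key strings according to the malicious sample feature vector, and training the strings within the preset range again using the word2vec idea to obtain a second training result; and obtaining a malicious code string rule based on the second training result.
- A storage medium, characterized in that it is used to store an application program, the application program being used to execute the method for inferring malicious code rules based on a deep learning method according to any one of claims 1 to 8.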
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/572,082 US10503903B2 (en) | 2015-11-17 | 2016-11-16 | Method, system, and device for inferring malicious code rule based on deep learning method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510787438.3 | 2015-11-17 | ||
CN201510787438.3A CN105975857A (zh) | 2015-11-17 | 2015-11-17 | Method and system for inferring malicious code rules based on deep learning method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017084586A1 (zh) | 2017-05-26 |
Family
ID=56988279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/106128 WO2017084586A1 (zh) | 2015-11-17 | 2016-11-16 | Method, system, and device for inferring malicious code rules based on a deep learning method |
Country Status (3)
Country | Link |
---|---|
US (1) | US10503903B2 (zh) |
CN (2) | CN105975857A (zh) |
WO (1) | WO2017084586A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316024A (zh) * | 2017-06-28 | 2017-11-03 | 北京博睿视科技有限责任公司 | Perimeter alarm algorithm based on deep learning |
CN107392019A (zh) * | 2017-07-05 | 2017-11-24 | 北京金睛云华科技有限公司 | Training and detection method and apparatus for malicious code families |
CN107889068A (zh) * | 2017-12-11 | 2018-04-06 | 成都欧督系统科技有限公司 | Message broadcast control method based on wireless communication |
CN113312622A (zh) * | 2021-06-09 | 2021-08-27 | 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) | Method and apparatus for detecting URLs |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8977858B1 (en) | 2014-05-27 | 2015-03-10 | Support Intelligence, Inc. | Using space-filling curves to fingerprint data |
CN105975857A (zh) * | 2015-11-17 | 2016-09-28 | 武汉安天信息技术有限责任公司 | Method and system for inferring malicious code rules based on deep learning method |
US10204226B2 (en) * | 2016-12-07 | 2019-02-12 | General Electric Company | Feature and boundary tuning for threat detection in industrial asset control system |
US10546143B1 (en) | 2017-08-10 | 2020-01-28 | Support Intelligence, Inc. | System and method for clustering files and assigning a maliciousness property based on clustering |
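CN109726554B (zh) * | 2017-10-30 | 2021-05-18 | 武汉安天信息技术有限责任公司 | Method and apparatus for detecting malicious programs |
CN107807987B (zh) * | 2017-10-31 | 2021-07-02 | 广东工业大学 | String classification method and system, and string classification device |
KR102016226B1 (ko) * | 2017-11-22 | 2019-08-29 | 숭실대학교산학협력단 | Adaptive dynamic analysis method, adaptive dynamic analysis platform, and device equipped with the same |
TWI658372B (zh) * | 2017-12-12 | 2019-05-01 | 財團法人資訊工業策進會 | Abnormal behavior detection model generation apparatus and abnormal behavior detection model generation method thereof |
CN110198291B (zh) * | 2018-03-15 | 2022-02-18 | 腾讯科技(深圳)有限公司 | Web page backdoor detection method, apparatus, terminal, and storage medium |
CN110392081B (zh) * | 2018-04-20 | 2022-08-30 | 武汉安天信息技术有限责任公司 | Virus database push method and apparatus, computer device, and computer storage medium |
CN108959924A (zh) * | 2018-06-12 | 2018-12-07 | 浙江工业大学 | Android malicious code detection method based on word vectors and deep neural networks |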
US10764246B2 (en) * | 2018-08-14 | 2020-09-01 | Didi Research America, Llc | System and method for detecting generated domain |
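CN109614795B (zh) * | 2018-11-30 | 2023-04-28 | 武汉大学 | Event-aware Android malware detection method |
CN111262818B (zh) * | 2018-11-30 | 2023-08-15 | 三六零科技集团有限公司 | Virus detection method, system, apparatus, device, and storage medium |
KR102327026B1 (ko) * | 2019-02-07 | 2021-11-16 | 고려대학교 산학협력단 | GCN-based assembly code learning apparatus and method, and security weakness detection apparatus and method using the same |
CN110008987B (zh) * | 2019-02-20 | 2022-02-22 | 深圳大学 | Classifier robustness testing method, apparatus, terminal, and storage medium |
KR102255600B1 (ko) * | 2019-08-26 | 2021-05-25 | 국민대학교산학협력단 | Apparatus and method for detecting document-type malware using GAN |
CN110659420B (zh) * | 2019-09-25 | 2022-05-20 | 广州西思数字科技有限公司 | Personalized meal planning method based on deep neural network Monte Carlo search tree |
CN110889283B (zh) * | 2019-11-29 | 2023-07-11 | 上海观安信息技术股份有限公司 | Method and system for detecting arbitrariness of system approval reasons |
CN110727944B (zh) * | 2019-12-19 | 2020-06-02 | 江阴市普尔网络信息技术有限公司 | Secure website and intrusion detection method thereof |
CN113127866B (zh) * | 2019-12-31 | 2023-08-18 | 奇安信科技集团股份有限公司 | Malicious code signature extraction method, apparatus, and computer device |
CN113127863A (zh) * | 2019-12-31 | 2021-07-16 | 奇安信科技集团股份有限公司 | Malicious code detection method, apparatus, computer device, and storage medium |
CN111241496B (zh) * | 2020-04-24 | 2021-06-29 | 支付宝(杭州)信息技术有限公司 | Method, apparatus, and electronic device for determining mini program feature vectors |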
US11573785B2 (en) | 2020-05-14 | 2023-02-07 | International Business Machines Corporation | Predicting code vulnerabilities using machine learning classifier models trained on internal analysis states |
US20220198294A1 (en) * | 2020-12-23 | 2022-06-23 | Oracle International Corporation | Generalized production rules - n-gram feature extraction from abstract syntax trees (ast) for code vectorization |
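CN113378165B (zh) * | 2021-06-25 | 2021-11-05 | 中国电子科技集团公司第十五研究所 | Malicious sample similarity determination method based on Jaccard coefficient |
CN113890756B (zh) * | 2021-09-26 | 2024-01-02 | 网易(杭州)网络有限公司 | User account disorder degree detection method, apparatus, medium, and computing device |
CN117556263B (zh) * | 2024-01-10 | 2024-04-23 | 阿里云计算有限公司 | Sample construction method, code generation method, electronic device, and storage medium |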
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411687A (zh) * | 2011-11-22 | 2012-04-11 | 华北电力大学 | Deep learning detection method for unknown malicious code |
US20120159625A1 (en) * | 2010-12-21 | 2012-06-21 | Korea Internet & Security Agency | Malicious code detection and classification system using string comparison and method thereof |
CN102737186A (zh) * | 2012-06-26 | 2012-10-17 | 腾讯科技(深圳)有限公司 | Malicious file identification method, apparatus, and storage medium |
CN103473506A (zh) * | 2013-08-30 | 2013-12-25 | 北京奇虎科技有限公司 | Method and apparatus for identifying malicious APK files |
CN104123500A (zh) * | 2014-07-22 | 2014-10-29 | 卢永强 | Deep learning based method and apparatus for detecting malicious applications on the Android platform |
CN104486461A (zh) * | 2014-12-29 | 2015-04-01 | 北京奇虎科技有限公司 | Domain name classification method and apparatus, domain name identification method and system |
CN105975857A (zh) * | 2015-11-17 | 2016-09-28 | 武汉安天信息技术有限责任公司 | Method and system for inferring malicious code rules based on deep learning method |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6161130A (en) * | 1998-06-23 | 2000-12-12 | Microsoft Corporation | Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set |
US20100325109A1 (en) * | 2007-02-09 | 2010-12-23 | Agency For Science, Technology And Rearch | Keyword classification and determination in language modelling |
KR101377014B1 (ko) * | 2007-09-04 | 2014-03-26 | 삼성전자주식회사 | Immune database based malicious code diagnosis method and system |
CN102141978A (zh) * | 2010-02-02 | 2011-08-03 | 阿里巴巴集团控股有限公司 | Text classification method and system |
US9774616B2 (en) * | 2012-06-26 | 2017-09-26 | Oppleo Security, Inc. | Threat evaluation system and method |
US9411327B2 (en) * | 2012-08-27 | 2016-08-09 | Johnson Controls Technology Company | Systems and methods for classifying data in building automation systems |
CN103744905B (zh) * | 2013-12-25 | 2018-03-30 | 新浪网技术(中国)有限公司 | Spam determination method and apparatus |
WO2015134665A1 (en) * | 2014-03-04 | 2015-09-11 | SignalSense, Inc. | Classifying data with deep learning neural records incrementally refined through expert input |
US9971765B2 (en) * | 2014-05-13 | 2018-05-15 | Nuance Communications, Inc. | Revising language model scores based on semantic class hypotheses |
US9769208B2 (en) * | 2015-05-28 | 2017-09-19 | International Business Machines Corporation | Inferring security policies from semantic attributes |
CN104933365B (zh) * | 2015-07-08 | 2018-04-27 | 中国科学院信息工程研究所 | Automated malicious code homology determination method and system based on calling habits |
US10455088B2 (en) * | 2015-10-21 | 2019-10-22 | Genesys Telecommunications Laboratories, Inc. | Dialogue flow optimization and personalization |
US10397272B1 (en) * | 2018-05-10 | 2019-08-27 | Capital One Services, Llc | Systems and methods of detecting email-based attacks through machine learning |
-
2015
- 2015-11-17 CN CN201510787438.3A patent/CN105975857A/zh active Pending
-
2016
- 2016-11-16 WO PCT/CN2016/106128 patent/WO2017084586A1/zh active Application Filing
- 2016-11-16 CN CN201611024547.0A patent/CN106709345B/zh active Active
- 2016-11-16 US US15/572,082 patent/US10503903B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN106709345B (zh) | 2020-05-19 |
CN106709345A (zh) | 2017-05-24 |
CN105975857A (zh) | 2016-09-28 |
US10503903B2 (en) | 2019-12-10 |
US20180096144A1 (en) | 2018-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
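WO2017084586A1 (zh) | Method, system, and device for inferring malicious code rules based on a deep learning method | |
WO2020244066A1 (zh) | Text classification method, apparatus, device, and storage medium | |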
US10785241B2 (en) | URL attack detection method and apparatus, and electronic device | |
US11544459B2 (en) | Method and apparatus for determining feature words and server | |
US11017178B2 (en) | Methods, devices, and systems for constructing intelligent knowledge base | |
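CN111600919B (zh) | Method and apparatus for constructing an intelligent network application protection system model | |
CN112600834B (zh) | Content security identification method and apparatus, storage medium, and electronic device | |
CN103914494A (zh) | Microblog user identity recognition method and system | |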
US20230353585A1 (en) | Malicious traffic identification method and related apparatus | |
Zhang et al. | Toward unsupervised protocol feature word extraction | |
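CN113961768B (zh) | Sensitive word detection method, apparatus, computer device, and storage medium | |
CN109614795A (zh) | Event-aware Android malware detection method | |
CN103324886A (zh) | Method and system for extracting a fingerprint library in network attack detection | |
CN112507167A (zh) | Method, apparatus, electronic device, and storage medium for identifying video collections | |
CN113381963A (zh) | Domain name detection method, apparatus, and storage medium | |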
US10217455B2 (en) | Linguistic model database for linguistic recognition, linguistic recognition device and linguistic recognition method, and linguistic recognition system | |
US20140129490A1 (en) | Image url-based junk detection | |
KR20210054799A (ko) | Method and apparatus for generating URL summaries for URL clustering | |
US10025936B2 (en) | Systems and methods for SQL value evaluation to detect evaluation flaws | |
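CN116800518A (zh) | Method and apparatus for adjusting network protection policy | |
CN115858776A (zh) | Variant text classification and recognition method, system, storage medium, and electronic device | |
CN115473734A (zh) | Remote code execution attack detection method based on one-class classification and federated learning | |
CN114169540A (zh) | Web page user behavior detection method and system based on improved machine learning | |
CN113722713A (zh) | Malicious code detection method, apparatus, electronic device, and storage medium | |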
Wang et al. | A logical combination based application layer intrusion detection model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16865763; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 15572082; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 16865763; Country of ref document: EP; Kind code of ref document: A1 |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 16865763; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16/10/2018) |