CN115576840A - Static program pile insertion detection method and device based on machine learning - Google Patents

Static program pile insertion detection method and device based on machine learning

Info

Publication number
CN115576840A
Authority
CN
China
Prior art keywords
instrumentation
basic block
result
program
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211357366.5A
Other languages
Chinese (zh)
Other versions
CN115576840B (en)
Inventor
刘昱玮
余媛萍
贾相堃
苏璞睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS filed Critical Institute of Software of CAS
Priority to CN202211357366.5A priority Critical patent/CN115576840B/en
Publication of CN115576840A publication Critical patent/CN115576840A/en
Application granted granted Critical
Publication of CN115576840B publication Critical patent/CN115576840B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3644 Software debugging by instrumenting at runtime
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3636 Software debugging by tracing the execution of the program
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a machine learning-based program instrumentation detection method and device, and belongs to the technical field of network security. The method comprises the following steps: acquiring a target program; instrumenting the target program to obtain a binary file; converting the binary file into an intermediate language representation; calculating the feature vector of each basic block in the binary file based on the intermediate language representation, and feeding the feature vector into a machine learning model to identify the actual instrumentation result of each basic block; generating a code attribute graph based on the target program and the intermediate language representation, learning the code attribute graph with a graph neural network, and applying a linear transformation to the learned node feature vectors so that the expected instrumentation result of each basic block is judged from the score of each node; and obtaining the instrumentation detection result of the target program from the actual and expected instrumentation results of each basic block. The invention can evaluate the accuracy of existing static program instrumentation methods.

Description

Static program instrumentation detection method and device based on machine learning
Technical Field
The invention belongs to the technical field of network security, and particularly relates to a program instrumentation detection method and device based on machine learning.
Background
Program instrumentation refers to the process of inserting code segments into a target program that feed back specific information or perform specific functions without breaking the integrity of the target program's original execution logic. Program instrumentation provides information support for program analysis and vulnerability mining methods, including taint analysis, symbolic execution, and fuzz testing. Static program instrumentation is currently the prevailing approach because of its efficiency advantages.
Existing improvements to static program instrumentation focus on increasing the amount of information the instrumentation collects and on improving its execution efficiency. For example, Angora, developed by Chen Peng of ShanghaiTech University, and TortoiseFuzz, developed by Wang Yanhao of the University of Chinese Academy of Sciences, improve vulnerability mining capability by collecting more information about the running program, including the program's execution context and its number of memory accesses; SelectiveTaint, developed by Chen Sanchuan et al. of Ohio State University, improves the execution efficiency of taint analysis by inserting taint analysis instrumentation code only for certain instructions.
While these improvements increase the capability and efficiency of program instrumentation, they also place higher demands on its accuracy, and no research on the accuracy of static program instrumentation has been proposed to date.
Disclosure of Invention
To address the lack of attention to accuracy in existing static program instrumentation research, the invention aims to provide a machine learning-based static program instrumentation error detection method. The method extracts the code features of the basic blocks of a target program through static analysis, inputs the extracted basic block features into word2vec to identify the instrumentation in each basic block, converts the program into a code attribute graph, and inputs the code attribute graph into a graph neural network to identify instrumentation errors.
The technical content of the invention comprises:
A machine learning-based program instrumentation detection method, the method comprising:
acquiring a target program, and performing instrumentation on the target program to obtain a binary file;
converting the binary file into an intermediate language representation;
calculating a feature vector of each basic block in the binary file based on the intermediate language representation, and sending the feature vector to a machine learning model to identify an actual instrumentation result of each basic block;
generating a code attribute graph based on the target program and the intermediate language representation, learning the code attribute graph by using a graph neural network, and performing linear transformation on the learned node feature vectors to judge the expected instrumentation result of each basic block according to the score of each node; the nodes in the code attribute graph are basic blocks, and edges in the code attribute graph are constructed based on the dependency relationship between control flows and data flows among the basic blocks;
and obtaining the instrumentation detection result of the target program according to the actual instrumentation result and the expected instrumentation result of each basic block.
Further, the intermediate language representation includes: LLVM IR intermediate language representation.
Further, the calculating a feature vector of each basic block in the binary file based on the intermediate language includes:
selecting features in each basic block based on the intermediate language; the features include: instruction operation codes, instruction operands, and instruction sequences; the instruction sequence is a sequence in which the instruction operation codes are ordered according to the appearance sequence of the instructions in the basic block;
acquiring a serial number of the instruction operation code according to an operation code coding table, and acquiring a feature vector of the instruction operation code based on the serial number; the operation code coding table is constructed based on the occurrence frequency and the letter sequence of each instruction operation code in the training set;
acquiring a feature vector of the instruction operand according to the operand length of the instruction operand;
calculating a feature vector of the instruction sequence according to the feature vector of each instruction operation code in the instruction sequence;
and obtaining the feature vector of the basic block based on the feature vector of the instruction operation code, the feature vector of the instruction operand and the feature vector of the instruction sequence.
Further, the operation code includes: alloca, store, load, and icmp.
Further, the operands include: immediate and variable.
Further, the generating a code attribute graph of the binary file based on the target program and the intermediate language includes:
generating an abstract syntax tree, a control flow graph and a program dependency graph of the binary file based on the target program and the intermediate language; wherein, the basic block information in the control flow graph includes the out degree and the in degree of the basic block, the number of instructions and whether instrumentation codes are included, and the information in the program dependency graph includes: data dependency and control dependency information of the basic block;
and merging the abstract syntax tree, the control flow graph and the program dependency graph into a code attribute graph.
Further, the obtaining an instrumentation detection result of the target program according to the actual instrumentation result and the expected instrumentation result of each basic block includes:
comparing the actual instrumentation result with the expected instrumentation result one by one;
if the comparison result of each basic block is consistent, the instrumentation detection result of the target program is that the instrumentation is correct;
and if the comparison result of at least one basic block is not consistent, the instrumentation detection result of the target program is an instrumentation error.
A machine learning-based program instrumentation detection apparatus, the apparatus comprising:
the file acquisition module is used for acquiring a target program and performing instrumentation on the target program to obtain a binary file;
the file conversion module is used for converting the binary file into an intermediate language representation;
the first detection module is used for calculating the characteristic vector of each basic block in the binary file based on the intermediate language representation and sending the characteristic vector to a machine learning model so as to identify the actual instrumentation result of each basic block;
the second detection module is used for generating a code attribute graph of the binary file based on the target program and the intermediate language representation, learning the code attribute graph by using a graph neural network, and performing linear transformation on the learned node feature vectors so as to judge the expected instrumentation result of each basic block according to the score of each node; the nodes in the code attribute graph are basic blocks, and edges in the code attribute graph are constructed based on the dependency relationship between control flow and data flow among the basic blocks;
and the result generation module is used for obtaining the instrumentation detection result of the target program according to the actual instrumentation result and the expected instrumentation result of each basic block.
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform any of the above methods when executed.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform any of the methods described above.
Compared with the prior art, the invention has the following advantages and positive effects:
1. The invention automatically identifies instrumentation code by extracting the code features of the target program and inputting them into a machine learning model, which gives it a general ability to identify the instrumentation code of various static program instrumentation tools.
2. The invention extracts and codes LLVM IR code characteristics of a basic block of a target program, and converts the target program into an intermediate language irrelevant to the architecture so as to realize cross-architecture instrumentation code identification.
3. By generating the abstract syntax tree, the control flow graph and the program dependency graph of the target program and merging them into a code attribute graph, the invention lets the features fed into the model represent the target program more comprehensively, which improves the accuracy of instrumentation error identification.
Drawings
FIG. 1 is a flowchart of an embodiment of a static program instrumentation detection method based on machine learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the static program instrumentation detection method based on machine learning of the present invention, the flow chart of the whole method is shown in fig. 1.
As shown in fig. 1, the machine learning-based static program instrumentation detection method includes the following steps:
1. Selecting suitable experimental samples and constructing a binary file data set containing cross-architecture static instrumentation
Target program samples are collected from Google FuzzBench and compiled for the x86, x86_64 and AArch64 architectures using the instrumentation tool in AFL, yielding statically instrumented binary files compiled for different architectures.
2. Extracting feature information from the experimental samples
The binary files in the data set are converted to the LLVM IR intermediate language by the McSema tool. The opcodes of the instructions in each basic block, the operands of the instructions, and the instruction sequence are selected as the extracted features. The opcodes include alloca, store, load, icmp and the like; the operands include immediates and variables; the instruction sequence is the sequence of opcodes ordered by the order in which the instructions appear in the basic block.
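The feature selection in this step can be sketched as follows. This is only an illustration under stated assumptions: it works on a hand-written fragment of textual LLVM IR, uses a simplified regular-expression heuristic rather than McSema or the LLVM API, and the names IR_TEXT, split_basic_blocks, opcode_of and operand_kinds are hypothetical.

```python
import re

# Hypothetical textual LLVM IR for one function body (illustrative only).
IR_TEXT = """\
entry:
  %x = alloca i32, align 4
  store i32 7, i32* %x, align 4
  %v = load i32, i32* %x, align 4
  %c = icmp eq i32 %v, 0
  br i1 %c, label %then, label %exit
then:
  store i32 1, i32* %x, align 4
  br label %exit
exit:
  ret void
"""

LABEL_RE = re.compile(r"^([A-Za-z0-9$._-]+):")
ASSIGN_RE = re.compile(r"^[%@][\w.]+\s*=\s*")


def split_basic_blocks(ir_text):
    """Group instruction lines under the basic block label that precedes them."""
    blocks = {}
    current = None
    for line in ir_text.splitlines():
        match = LABEL_RE.match(line)
        if match:
            current = match.group(1)
            blocks[current] = []
        elif line.strip() and current is not None:
            blocks[current].append(line.strip())
    return blocks


def strip_result(instruction):
    """Drop an optional '%name = ' prefix so the opcode comes first."""
    return ASSIGN_RE.sub("", instruction, count=1)


def opcode_of(instruction):
    """Return the instruction opcode, e.g. 'alloca', 'store', 'load', 'icmp'."""
    return strip_result(instruction).split()[0]


def operand_kinds(instruction):
    """Classify operands as 'variable' or 'immediate' (heuristic, labels skipped)."""
    rhs = re.sub(r",\s*align\s+\d+\s*$", "", strip_result(instruction))
    kinds = []
    for piece in rhs.split(","):
        tokens = piece.split()
        if not tokens or (len(tokens) >= 2 and tokens[-2] == "label"):
            continue
        last = tokens[-1]
        if last[0] in "%@":
            kinds.append("variable")
        elif re.fullmatch(r"-?\d+", last):
            kinds.append("immediate")
    return kinds


blocks = split_basic_blocks(IR_TEXT)
# Instruction sequence of each block: opcodes in order of appearance.
sequences = {name: [opcode_of(i) for i in insts] for name, insts in blocks.items()}
print(sequences["entry"])
```

A production implementation would more plausibly walk the IR through LLVM's own parser; the text-based approach here only shows which three features are taken from each basic block.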
3. Encoding the feature information extracted from the data set
Among the extracted features, the instruction opcodes are sorted by their frequency of occurrence in the data set, opcodes with the same frequency are sorted alphabetically, and each opcode is encoded as its serial number in this ordering. Each instruction operand is encoded according to its length: an immediate is assigned 0, and a variable is encoded as its length in bytes. The instruction sequence is encoded as the vector of opcode encodings in the basic block.
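The encoding rules above can be sketched as follows. The sketch assumes that each basic block is already available as a list of opcode strings; starting the serial numbers at 1 and reserving 0 for opcodes not seen in the data set are assumptions made here, and the function names are hypothetical.

```python
from collections import Counter


def build_opcode_table(blocks):
    """Rank opcodes by descending frequency; ties are broken alphabetically."""
    counts = Counter(op for block in blocks for op in block)
    ranked = sorted(counts, key=lambda op: (-counts[op], op))
    # Assumption: serial numbers start at 1, so 0 can mark an unknown opcode.
    return {op: serial for serial, op in enumerate(ranked, start=1)}


def encode_operand(kind, byte_length=0):
    """Encode an operand: an immediate becomes 0, a variable its byte length."""
    if kind == "immediate":
        return 0
    if kind == "variable":
        return byte_length
    raise ValueError(f"unknown operand kind: {kind!r}")


def encode_sequence(block, table):
    """Encode an instruction sequence as the vector of its opcode serial numbers."""
    return [table.get(op, 0) for op in block]


# Hypothetical data set of three basic blocks.
training_blocks = [
    ["alloca", "store", "load", "icmp", "br"],
    ["store", "br"],
    ["ret"],
]
table = build_opcode_table(training_blocks)
print(table)
print(encode_sequence(training_blocks[0], table))
```

In this example `br` and `store` both occur twice, so the alphabetical tie-break places `br` before `store`; the remaining opcodes occur once each and are ordered alphabetically.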
4. Feeding the features into a machine learning model for training to identify the actual instrumentation result in each basic block
The word2vec machine learning method is trained on the feature encodings obtained above, and accuracy, AUC, recall and F1 score are used to judge the training effect of each model on the data set. The processed feature data is divided evenly into ten parts; each part in turn is selected as the test set while the other nine serve as the training set, and the model judges whether instrumentation code exists in each basic block.
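The evaluation protocol described here is a ten-fold cross-validation scored with accuracy, AUC, recall and F1. A minimal sketch of the splitting and the metrics follows; it does not train word2vec or any other model, the labels are assumed to be 1 for "instrumentation code present" and 0 otherwise, and all function names are hypothetical.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs; each fold is the test set once."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]  # no shuffling in this sketch
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def binary_metrics(y_true, y_pred):
    """Accuracy, recall and F1 for binary labels (1 = instrumented)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "recall": recall, "f1": f1}


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


splits = list(k_fold_splits(20, k=10))
metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(len(splits), metrics, auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

The same splitting and scoring would apply unchanged to the graph neural network evaluation in step 6, since only the model inside each fold differs.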
5. Constructing a code attribute graph for each experimental sample
The corresponding abstract syntax tree, control flow graph and program dependency graph are generated based on the target binary program and the LLVM IR intermediate language produced in step 2. The basic block information in the control flow graph includes the out-degree and in-degree of the basic block, the number of instructions it contains, whether it contains instrumentation code, and the like. The information in the program dependency graph includes the data dependency and control dependency information of the basic blocks. Finally, the generated abstract syntax tree, control flow graph and program dependency graph are merged into a code attribute graph, in which the basic blocks are the graph nodes, the initial node is the entry basic block of the target program, and the control flow and data flow dependencies among the basic blocks are the edges.
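One possible in-memory shape for the merged graph is sketched below. It assumes that the control flow edges and data dependency edges between basic blocks have already been computed, omits the abstract syntax tree for brevity, and uses hypothetical names (BlockNode, build_code_attribute_graph) and a hypothetical three-block program.

```python
from dataclasses import dataclass


@dataclass
class BlockNode:
    """Per-basic-block attributes listed for the control flow graph."""
    name: str
    instruction_count: int
    has_instrumentation: bool
    in_degree: int = 0
    out_degree: int = 0


def build_code_attribute_graph(blocks, control_edges, data_edges, entry):
    """Merge control flow and data dependency edges over basic block nodes."""
    nodes = {
        name: BlockNode(name, info["instruction_count"], info["has_instrumentation"])
        for name, info in blocks.items()
    }
    if entry not in nodes:
        raise KeyError(f"entry block {entry!r} is not a known basic block")
    edges = []
    for kind, pairs in (("control", control_edges), ("data", data_edges)):
        for src, dst in pairs:
            if src not in nodes or dst not in nodes:
                raise KeyError(f"edge {src!r} -> {dst!r} references an unknown block")
            edges.append((src, dst, kind))
    # In-degree and out-degree are taken from the control flow edges only.
    for src, dst in control_edges:
        nodes[src].out_degree += 1
        nodes[dst].in_degree += 1
    return {"entry": entry, "nodes": nodes, "edges": edges}


# Hypothetical example program with three basic blocks.
graph = build_code_attribute_graph(
    blocks={
        "entry": {"instruction_count": 5, "has_instrumentation": True},
        "then": {"instruction_count": 2, "has_instrumentation": False},
        "exit": {"instruction_count": 1, "has_instrumentation": True},
    },
    control_edges=[("entry", "then"), ("entry", "exit"), ("then", "exit")],
    data_edges=[("entry", "then")],
    entry="entry",
)
print(graph["nodes"]["entry"], len(graph["edges"]))
```

Tagging each edge with its kind keeps the control flow and data flow relations distinguishable after merging, which lets a downstream model treat them as different edge types if desired.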
6. Inputting the constructed code attribute graph into a machine learning model for training to detect the expected instrumentation result
A graph neural network is trained on the features of the code attribute graphs. After training, a linear transformation is applied to the vector representation of each node to obtain a score for each basic block, from which it is judged whether instrumentation is expected in that basic block. Accuracy, AUC, recall and F1 score are used to assess the training effect of each model on the data set. The processed feature data is divided evenly into ten parts; each part in turn is selected as the test set while the other nine serve as the training set, and the model judges whether instrumentation is expected.
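The scoring step can be illustrated with a forward pass only. The sketch below assumes a single mean-aggregation message-passing layer with a ReLU, followed by the linear transformation s = w · h + b and a threshold on the score; the source does not specify the network architecture, and the weights here are hypothetical, hand-picked values rather than trained parameters.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gnn_layer(features, neighbors, w_self, w_nbr):
    """One message-passing layer: h' = relu(W_self h + W_nbr mean(neighbor h))."""
    updated = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(h))]
        else:
            mean = [0.0] * len(h)
        combined = [a + b for a, b in zip(matvec(w_self, h), matvec(w_nbr, mean))]
        updated[node] = [max(0.0, x) for x in combined]
    return updated


def score_nodes(embeddings, weight, bias):
    """Linear transformation of each node vector into a scalar score."""
    return {n: sum(w * x for w, x in zip(weight, h)) + bias for n, h in embeddings.items()}


def expected_instrumentation(scores, threshold=0.0):
    """A basic block is expected to be instrumented when its score exceeds the threshold."""
    return {node: score > threshold for node, score in scores.items()}


# Hypothetical two-dimensional node features and neighbor lists.
features = {"entry": [1.0, 0.0], "then": [0.0, 1.0], "exit": [1.0, 1.0]}
neighbors = {"entry": [], "then": ["entry"], "exit": ["entry", "then"]}
embeddings = gnn_layer(
    features, neighbors,
    w_self=[[1.0, 0.0], [0.0, 1.0]],
    w_nbr=[[0.5, 0.0], [0.0, 0.5]],
)
scores = score_nodes(embeddings, weight=[1.0, -1.0], bias=-0.25)
print(scores, expected_instrumentation(scores))
```

In practice the layer weights and the scoring weights would be learned jointly from labeled code attribute graphs, and a graph learning library would replace these hand-written loops.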
7. Detecting instrumentation errors based on the actual and expected instrumentation results
The actual instrumentation result is compared with the expected instrumentation result basic block by basic block: if the two are consistent, the basic block has no instrumentation error; if they are inconsistent, the basic block has an instrumentation error. Finally, an instrumentation error detection report for the target program is output.
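The comparison can be sketched as follows, assuming both results are dictionaries that map a basic block name to a boolean (instrumented or not). The report layout and the function name detect_instrumentation_errors are hypothetical.

```python
def detect_instrumentation_errors(actual, expected):
    """Compare actual and expected instrumentation results block by block."""
    if actual.keys() != expected.keys():
        raise ValueError("actual and expected results must cover the same basic blocks")
    error_blocks = sorted(name for name in actual if actual[name] != expected[name])
    return {
        # The program is correctly instrumented only if every block is consistent.
        "correct": not error_blocks,
        "error_blocks": error_blocks,
    }


# Hypothetical results for a three-block program.
actual = {"entry": True, "then": False, "exit": False}
expected = {"entry": True, "then": False, "exit": True}
report = detect_instrumentation_errors(actual, expected)
print(report)
```

Here the block `exit` is expected to carry instrumentation but does not, so the program as a whole is reported as having an instrumentation error.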
In summary, the machine learning-based static program instrumentation error detection method provided by the invention fills the gap in static program instrumentation error detection. This cross-architecture, automated instrumentation error detection method is applicable to multiple static program instrumentation tools and evaluates the accuracy of existing static program instrumentation methods.
The invention also discloses a program instrumentation detection device based on machine learning, which can be computer equipment and also can be arranged in the computer equipment. The device includes: the device comprises a file acquisition module, a file conversion module, a first detection module, a second detection module and a result generation module.
The file acquisition module is used for acquiring a target program and performing instrumentation on the target program to obtain a binary file;
the file conversion module is used for converting the binary file into an intermediate language representation;
the first detection module is used for calculating the characteristic vector of each basic block in the binary file based on the intermediate language representation and sending the characteristic vector to a machine learning model so as to identify the actual instrumentation result of each basic block;
the second detection module is used for generating a code attribute graph of the binary file based on the target program and the intermediate language representation, learning the code attribute graph by using a graph neural network, and performing linear transformation on the learned node feature vectors so as to judge an expected instrumentation result of each basic block according to the score of each node; the nodes in the code attribute graph are basic blocks, and edges in the code attribute graph are constructed based on the dependency relationship between control flows and data flows among the basic blocks;
and the result generation module is used for obtaining the instrumentation detection result of the target program according to the actual instrumentation result and the expected instrumentation result of each basic block.
For the explanation of the specific execution process, beneficial effects, etc. of the device module, please refer to the description of the above method embodiment, which is not described herein again.
In an exemplary embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the computer program being loaded and executed by the processor to implement the above-mentioned machine learning-based program instrumentation detection method.
In an exemplary embodiment, a computer-readable storage medium is also provided, on which a computer program is stored, which, when being executed by a processor, implements the machine learning-based program instrumentation detection method as described above.
In an exemplary embodiment, a computer program product is also provided, which, when run on a computer device, causes the computer device to perform the machine learning based program instrumentation detection method as described above.
Although specific embodiments of the invention and the accompanying drawings have been disclosed for illustrative purposes, to aid understanding of the invention, those skilled in the art will appreciate that various substitutions, alterations, and modifications are possible without departing from the spirit and scope of this disclosure and the appended claims. Therefore, the present invention should not be limited to the disclosure of the preferred embodiments and the drawings; the scope of the invention is defined by the appended claims.

Claims (10)

1. A program instrumentation detection method based on machine learning, the method comprising:
acquiring a target program, and performing instrumentation on the target program to obtain a binary file;
converting the binary file into an intermediate language representation;
calculating a feature vector of each basic block in the binary file based on the intermediate language representation, and sending the feature vector to a machine learning model to identify an actual instrumentation result of each basic block;
generating a code attribute graph based on the target program and the intermediate language representation, learning the code attribute graph by using a graph neural network, and performing linear transformation on the learned node feature vectors to judge an expected instrumentation result of each basic block according to the score of each node; the nodes in the code attribute graph are basic blocks, and edges in the code attribute graph are constructed based on the dependency relationship between control flows and data flows among the basic blocks;
and obtaining the instrumentation detection result of the target program according to the actual instrumentation result and the expected instrumentation result of each basic block.
2. The method of claim 1, wherein the intermediate language representation comprises: LLVM IR intermediate language representation.
3. The method of claim 1, wherein said computing a feature vector for each basic block in the binary file based on the intermediate language comprises:
selecting features in each basic block based on the intermediate language; the features include: instruction operation codes, instruction operands, and instruction sequences; the instruction sequence is a sequence in which the instruction operation codes are ordered according to the appearance sequence of the instructions in the basic block;
acquiring a serial number of the instruction operation code according to an operation code coding table, and acquiring a feature vector of the instruction operation code based on the serial number; the operation code coding table is constructed based on the occurrence frequency and the letter sequence of each instruction operation code in the training set;
acquiring a feature vector of the instruction operand according to the operand length of the instruction operand;
calculating a feature vector of the instruction sequence according to the feature vector of each basic block in the instruction sequence;
and obtaining the feature vector of the basic block based on the feature vector of the instruction operation code, the feature vector of the instruction operand and the feature vector of the instruction sequence.
4. The method of claim 3, wherein the opcode comprises: alloca, store, load, and icmp.
5. The method of claim 3, wherein the operands comprise: immediate and variable.
6. The method of claim 1, wherein generating the code attribute graph for the binary file based on the target program and the intermediate language comprises:
generating an abstract syntax tree, a control flow graph and a program dependency graph of the binary file based on the target program and the intermediate language; wherein, the basic block information in the control flow graph includes the out degree and the in degree of the basic block, the number of instructions and whether instrumentation codes are included, and the information in the program dependency graph includes: data dependency and control dependency information of the basic block;
and merging the abstract syntax tree, the control flow graph and the program dependency graph into a code attribute graph.
7. The method of claim 1, wherein obtaining instrumentation detection results for the target program based on the actual instrumentation results and the expected instrumentation results for each basic block comprises:
comparing the actual instrumentation result with the expected instrumentation result one by one;
if the comparison result of each basic block is consistent, the instrumentation detection result of the target program is that the instrumentation is correct;
and if the comparison result of at least one basic block is not consistent, the instrumentation detection result of the target program is an instrumentation error.
8. A program instrumentation detection apparatus based on machine learning, the apparatus comprising:
the file acquisition module is used for acquiring a target program and performing instrumentation on the target program to obtain a binary file;
the file conversion module is used for converting the binary file into an intermediate language representation;
the first detection module is used for calculating the characteristic vector of each basic block in the binary file based on the intermediate language representation and sending the characteristic vector to a machine learning model so as to identify the actual instrumentation result of each basic block;
the second detection module is used for generating a code attribute graph of the binary file based on the target program and the intermediate language representation, learning the code attribute graph by using a graph neural network, and performing linear transformation on the learned node feature vectors so as to judge the expected instrumentation result of each basic block according to the score of each node; the nodes in the code attribute graph are basic blocks, and edges in the code attribute graph are constructed based on the dependency relationship between control flows and data flows among the basic blocks;
and the result generation module is used for obtaining the instrumentation detection result of the target program according to the actual instrumentation result and the expected instrumentation result of each basic block.
9. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when executed, perform the method of any of claims 1-7.
10. An electronic device comprising a memory having a computer program stored therein and a processor arranged to execute the computer program to perform the method according to any of claims 1-7.
CN202211357366.5A 2022-11-01 2022-11-01 Static program pile insertion detection method and device based on machine learning Active CN115576840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211357366.5A CN115576840B (en) 2022-11-01 2022-11-01 Static program pile insertion detection method and device based on machine learning

Publications (2)

Publication Number Publication Date
CN115576840A (en) 2023-01-06
CN115576840B (en) 2023-04-18

Family

ID=84589190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211357366.5A Active CN115576840B (en) 2022-11-01 2022-11-01 Static program pile insertion detection method and device based on machine learning

Country Status (1)

Country Link
CN (1) CN115576840B (en)

Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8060869B1 (en) * 2007-06-08 2011-11-15 Oracle America, Inc. Method and system for detecting memory problems in user programs
CN104536898A (en) * 2015-01-19 2015-04-22 浙江大学 C-program parallel region detecting method
US20150161028A1 (en) * 2013-12-09 2015-06-11 International Business Machines Corporation System and method for determining test coverage
US20180060580A1 (en) * 2016-09-01 2018-03-01 Cylance Inc. Training a machine learning model for container file analysis
US20180204002A1 (en) * 2017-01-18 2018-07-19 New York University Determining an aspect of behavior of an embedded device such as, for example, detecting unauthorized modifications of the code and/or behavior of an embedded device
CN108416219A (en) * 2018-03-18 2018-08-17 西安电子科技大学 A kind of Android binary files leak detection method and system
CN108647520A (en) * 2018-05-15 2018-10-12 浙江大学 A kind of intelligent fuzzy test method and system based on fragile inquiry learning
US20180365139A1 (en) * 2017-06-15 2018-12-20 Microsoft Technology Licensing, Llc Machine learning for constrained mutation-based fuzz testing
CN109308415A (en) * 2018-09-21 2019-02-05 四川大学 One kind is towards binary guiding performance fuzz testing method and system
CN110008710A (en) * 2019-04-15 2019-07-12 上海交通大学 Leak detection method based on deeply study and Program path pitching pile
US20200184070A1 (en) * 2018-12-06 2020-06-11 Nec Laboratories America, Inc. Confidential machine learning with program compartmentalization
CN111460472A (en) * 2020-03-20 2020-07-28 西北大学 Encryption algorithm identification method based on deep learning graph network
US10762200B1 (en) * 2019-05-20 2020-09-01 Sentinel Labs Israel Ltd. Systems and methods for executable code detection, automatic feature extraction and position independent code detection
CN111639344A (en) * 2020-07-31 2020-09-08 中国人民解放军国防科技大学 Vulnerability detection method and device based on neural network
CN111859388A (en) * 2020-06-30 2020-10-30 广州大学 Multi-level mixed vulnerability automatic mining method
CN112328505A (en) * 2021-01-04 2021-02-05 中国人民解放军国防科技大学 Method and system for improving coverage rate of fuzz test
EP3812886A1 (en) * 2019-10-24 2021-04-28 Eberhard Karls Universität Tübingen System and method for optimising programming codes
US20210157906A1 (en) * 2019-11-27 2021-05-27 Data Security Technologies LLC Systems and methods for proactive and reactive data security
CN113360915A (en) * 2021-06-09 2021-09-07 扬州大学 Intelligent contract multi-vulnerability detection method and system based on source code graph representation learning
CN113672908A (en) * 2021-07-31 2021-11-19 荣耀终端有限公司 Fixed point pile inserting method, related device and system
CN114064506A (en) * 2021-11-29 2022-02-18 电子科技大学 Binary program fuzzy test method and system based on deep neural network
CN114168454A (en) * 2021-11-23 2022-03-11 叶嵩 Asynchronous testing method based on dynamic pile inserting-pile pinning technology
US20220107793A1 (en) * 2021-12-14 2022-04-07 Intel Corporation Concept for Placing an Execution of a Computer Program
US20220121429A1 (en) * 2020-10-20 2022-04-21 Battelle Energy Alliance, Llc Systems and methods for architecture-independent binary code analysis
CN114579969A (en) * 2022-05-05 2022-06-03 北京邮电大学 Vulnerability detection method and device, electronic equipment and storage medium
CN115129591A (en) * 2022-06-28 2022-09-30 山东大学 Binary code-oriented reproduction vulnerability detection method and system
CN115202736A (en) * 2022-06-14 2022-10-18 北京理工大学 Cross-platform binary function representation method and device for control flow chart

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
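苏璞睿 (Su Purui): "A Survey of Research on Automatic Exploitation of Software Vulnerabilities" (软件漏洞自动利用研究综述) *
赵尚儒, 李学俊 (Zhao Shangru, Li Xuejun): "A Survey of Automatic Exploitation of Security Vulnerabilities" (安全漏洞自动利用综述) *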

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361182A (en) * 2023-04-03 2023-06-30 南京航空航天大学 Symbol execution method for error state guidance
CN116361182B (en) * 2023-04-03 2023-12-05 南京航空航天大学 Symbol execution method for error state guidance
CN116578979A (en) * 2023-05-15 2023-08-11 软安科技有限公司 Cross-platform binary code matching method and system based on code features
CN116578979B (en) * 2023-05-15 2024-05-31 软安科技有限公司 Cross-platform binary code matching method and system based on code features

Also Published As

Publication number Publication date
CN115576840B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN115576840B (en) Static program pile insertion detection method and device based on machine learning
US10521224B2 (en) Automatic identification of relevant software projects for cross project learning
CN109740347B (en) Method for identifying and cracking fragile hash function of intelligent device firmware
Meng et al. Improving fault localization and program repair with deep semantic features and transferred knowledge
CN111897946B (en) Vulnerability patch recommendation method, vulnerability patch recommendation system, computer equipment and storage medium
CN115033895B (en) Binary program supply chain safety detection method and device
CN111045670B (en) Method and device for identifying multiplexing relationship between binary code and source code
CN113591093A (en) Industrial software vulnerability detection method based on self-attention mechanism
CN116541286A (en) High coverage rate test data generation method based on pile insertion and symbol execution
CN116627490A (en) Intelligent contract byte code similarity detection method
CN113760358A (en) Countermeasure sample generation method for source code classification model
CN114064472B (en) Automatic software defect repairing acceleration method based on code representation
CN115017015B (en) Method and system for detecting abnormal behavior of program in edge computing environment
CN115878498A (en) Key byte extraction method for predicting program behavior based on machine learning
CN113204349B (en) RL-based hyper-optimization compiler establishment method, code hyper-optimization method and system
CN115712760A (en) Binary code abstract generation method and system based on BERT model and deep isometric convolutional neural network
CN112162932A (en) Symbol execution optimization method and device based on linear programming prediction
Saougkos et al. Revisiting java bytecode compression for embedded and mobile computing environments
CN114969131B (en) Information query method, device and equipment
CN116911321B (en) Method and assembly for front-end automatic translation of dictionary values
Duy et al. VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model
Chen et al. Mining API protocols based on a balanced probabilistic model
CN112394984B (en) Firmware code analysis method and device
EP4053759A1 (en) Machine learning pipeline skeleton instantiation
EP4050524A2 (en) Machine learning pipeline skeleton instantiation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant