US20210202031A1 - Methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis - Google Patents
- Publication number
- US20210202031A1 (application US16/731,195)
- Authority
- US
- United States
- Prior art keywords
- code
- individually
- verified
- queue
- identifiable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/564—Static detection by virus signature recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/53—Decompilation; Disassembly
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
Definitions
- the present invention relates to methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis.
- code fragments need to be identified in a source file before such fragments can be classified as code genes and analyzed.
- files are binary files which need to be disassembled into assembly code containing instructions. This involves parsing the code into blocks from which code genes can be extracted representing at least one logic unit (i.e., from the start-block location/address to the stop-block location/address of a single block).
- Disassemblers identify the entry points of a binary file as the starting points for an assembly stream signature.
- the byte sequence of the code is disassembled into functions with their associated arguments. Disassembly is necessary if one wants to analyze a binary file in any meaningful way to determine its inherent functionality from a bitstream of indistinguishable zeroes and ones. By breaking the file into its component functions, each function can be analyzed and understood.
- the goal of such disassembly can be multifold. Applications include ultimately searching for shared code genes by comparing the genes against a database of known assembly-code fragments including both malicious and trusted code fragments. While disassemblers differ slightly, all disassemblers require a file to be fully disassembled before being able to further extract, normalize, and analyze such fragments for detection of code genes, whether from trusted code or malware, using a gene-analysis system.
- RADARE Using RADARE, one can analyze code of a function at a known address. However, one is limited to only the code fragments that are known in advance. A series of code fragments in an unknown file would require full disassembly before inspection of any one of the code fragments extracted, making such undertakings tediously manual and lacking scalability.
- exemplary is used herein to refer to examples of embodiments and/or implementations, and is not meant to necessarily convey a more-desirable use-case.
- alternative and “alternatively” are used herein to refer to an example out of an assortment of contemplated embodiments and/or implementations, and are not meant to necessarily convey a more-desirable use-case. Therefore, it is understood from the above that “exemplary” and “alternative” may be applied herein to multiple embodiments and/or implementations. Various combinations of such alternative and/or exemplary embodiments are also contemplated herein.
- Embodiments of the present invention enable disassembly of binary code into assembly code by starting at one or more entry points, and continuing through other analysis points (such as blocks from jump commands and export calls).
- a function i.e., when the end of the last block of a function, or the beginning of the first block of the next function, is found
- the disassembled function can be accessed by a code-matching analysis program, without having to wait for all functions in the file to be fully disassembled. This saves substantial time in the overall detection and analysis of shared code genes.
- the disassembler can terminate the disassembly process during disassembly, saving valuable resources of the disassembler to process other files.
- Such a disassembler makes the overall process of disassembly and gene analysis streamlined through automation of the extracted gene analysis as each function is separated from the binary file and becomes available. Each function can be addressed during the analysis stage, allowing for scalability to process bulk files. Both aspects provide significant enhancement in the ability to rapidly process binary files to analyze their code fragments for malicious and/or trusted genes in a shared code database.
- Embodiments of the present invention provide a disassembler with an integrated Function-Queue Manager (FQM) for submitting disassembled functions to be searched within a database of known shared genes, both trusted and malicious.
- Embodiments of the present invention further provide a disassembly interrupter for determining whether to terminate disassembling of a target binary file during disassembly based on the gene information.
- the gene information regarding the target binary file is received from the gene-analysis system.
- Such gene information can include the total number of detected genes, the number of detected genes by category or type, the number of detected genes by gene criticality, severity, and/or importance, and/or the presence of detected genes by gene criticality, severity, and/or importance.
- the gene information can also include ancillary information about the file such as a current elapsed time for a given disassembly process.
- a method for an integrated disassembler for code gene analysis including the steps of: (a) upon receiving a target binary file, disassembling the target binary file into assembly code; (b) extracting individually-identifiable code fragments from the assembly code; (c) as each individually-identifiable code fragment is extracted, verifying each individually-identifiable code fragment; (d) upon availability, placing each verified individually-identifiable code fragment in an extractor queue; and (e) upon availability, submitting each individually-identifiable code fragment in the extractor queue to a gene-analysis system having a code genome database.
- the step of placing includes placing only each individually-identifiable code fragment that has been completely verified to be a valid function.
- the method further including the steps of: (f) upon determining each individually-identifiable code fragment has not been completely verified, placing each partially-verified individually-identifiable code fragment in a verification queue; (g) performing additional verification on each partially-verified individually-identifiable code fragment; and (h) upon successfully completing the additional verification on each partially-verified individually-identifiable code fragment, transferring each completely-verified individually-identifiable code fragment to the extractor queue.
- the method further including the step of: (i) upon determining the extractor queue is empty, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- the method further including the step of: (i) upon determining resources of the gene-analysis system are underutilized, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- the method further including the step of: (f) upon receiving gene information regarding the target binary file from the gene-analysis system during disassembly, determining whether to terminate the step of disassembling based on the gene information.
- a system for an integrated disassembler for code gene analysis including: (a) a CPU for performing computational operations; (b) a memory module for storing data; (c) a disassembly module configured for, upon receiving a target binary file, disassembling the target binary file into assembly code; (d) an extracting module configured for extracting individually-identifiable code fragments from the assembly code; (e) a verification module configured for, as each individually-identifiable code fragment is extracted, verifying each individually-identifiable code fragment; and (f) a function-queue manager configured for: (i) upon availability, placing each verified individually-identifiable code fragment in an extractor queue; and (ii) upon availability, submitting each individually-identifiable code fragment in the extractor queue to a gene-analysis system having a code genome database.
- the function-queue manager is further configured for: (iii) placing only each individually-identifiable code fragment that has been completely verified to be a valid function.
- the function-queue manager is further configured for: (iv) upon the verification module determining each individually-identifiable code fragment has not been completely verified, placing each partially-verified individually-identifiable code fragment in a verification queue; (v) performing additional verification on each partially-verified individually-identifiable code fragment by the verification module; and (vi) upon successfully completing the additional verification on each partially-verified individually-identifiable code fragment, transferring each completely-verified individually-identifiable code fragment to the extractor queue.
- the function-queue manager is further configured for: (vii) upon determining the extractor queue is empty, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- the function-queue manager is further configured for: (vii) upon determining resources of the gene-analysis system are underutilized, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- system further including: (g) a disassembly interrupter configured for, upon receiving gene information regarding the target binary file from the gene-analysis system during disassembly, determining whether to terminate the step of disassembling based on the gene information.
- a disassembly interrupter configured for, upon receiving gene information regarding the target binary file from the gene-analysis system during disassembly, determining whether to terminate the step of disassembling based on the gene information.
- a non-transitory computer-readable storage medium having computer-readable code embodied on the non-transitory computer-readable storage medium, for an integrated disassembler for code gene analysis
- the computer-readable code including: (a) program code for, upon receiving a target binary file, disassembling the target binary file into assembly code; (b) program code for extracting individually-identifiable code fragments from the assembly code; (c) program code for, as each individually-identifiable code fragment is extracted, verifying each individually-identifiable code fragment; (d) program code for, upon availability, placing each verified individually-identifiable code fragment in an extractor queue; and (e) program code for, upon availability, submitting each individually-identifiable code fragment in the extractor queue to a gene-analysis system having a code genome database.
- the placing includes placing only each individually-identifiable code fragment that has been completely verified to be a valid function.
- the computer-readable code further including: (f) program code for, upon determining each individually-identifiable code fragment has not been completely verified, placing each partially-verified individually-identifiable code fragment in a verification queue; (g) program code for performing additional verification on each partially-verified individually-identifiable code fragment; and (h) program code for, upon successfully completing the additional verification on each partially-verified individually-identifiable code fragment, transferring each completely-verified individually-identifiable code fragment to the extractor queue.
- the computer-readable code further including: (i) program code for, upon determining the extractor queue is empty, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- the computer-readable code further including: (i) program code for, upon determining resources of the gene-analysis system are underutilized, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- the computer-readable code further including: (f) program code for, upon receiving gene information regarding the target binary file from the gene-analysis system during disassembly, determining whether to terminate the disassembling based on the gene information.
- FIG. 1 is a simplified flowchart of the major process steps for an integrated disassembler for code gene extraction and analysis, according to embodiments of the present invention.
- FIG. 2 is a simplified flowchart of the major process steps for the Function-Queue Manager (FQM) and disassembly interrupter, according to embodiments of the present invention.
- the present invention relates to methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis.
- the principles and operation for providing such methods and systems, according to the present invention may be better understood with reference to the accompanying description and the drawings.
- FIG. 1 is a simplified flowchart of the major process steps for an integrated disassembler with a function-queue manager for code gene extraction and analysis, according to embodiments of the present invention.
- the process starts with activation of the disassembly process upon accessing a target binary file and finding the entry points (Step 2 ).
- the binary file is then disassembled into assembly code by finding instructions such as function calls or starts of loops (Step 4 ).
- Individually-identifiable code fragments are extracted from the assembly code (Step 6 ).
- the individually-identifiable code fragments are then queued upon availability for gene analysis without requiring the entire binary file to be fully disassembled (Step 8 ).
- the individually-identifiable code fragments are then submitted to a gene-analysis system for determining whether the code fragments are trusted or malicious (Step 10 ).
- FIG. 2 is a simplified flowchart of the major process steps for the Function-Queue Manager (FQM) and disassembly interrupter, according to embodiments of the present invention.
- a function is only placed in the extractor queue if the function has been completely verified (Step 28 ). If a function hasn't been completely verified, the function is placed in a verification queue (Step 30 ). Functions in the verification queue undergo further verification to determine if they are truly valid, unique, and meaningful functions (Step 32 ). Functions in the verification queue are transferred to the extractor queue upon successfully completing verification (Step 34 ).
- functions in the verification queue can also be transferred to the extractor queue upon the extractor queue becoming empty in order to prevent the gene-analysis system from becoming idle even without being completely verified (Step 36 ).
- the FQM can check if the gene-analysis system is idle or underutilized before transferring functions from verification queue to extractor queue (Step 38 ).
- a disassembly interrupter can determine whether to terminate disassembly based on the gene information (Step 40 ). Finally, the process returns to Step 26 by submitting the verified functions to the gene-analysis system.
- the disassembly interrupter prevents the disassembler from continuing to disassemble the target binary file unnecessarily once the gene-analysis system has obtained enough gene information regarding the file to categorize the nature of the file (e.g., known shared genes, trusted genes, and/or malicious genes), saving valuable resources of the disassembler to process other files.
- gene information can include the total number of detected genes, the number of detected genes by category or type, the number of detected genes by gene criticality, severity, and/or importance, and/or the presence of detected genes by gene criticality, severity, and/or importance.
- the gene information can also include ancillary information about the file such as a current elapsed time for a given disassembly process.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Virology (AREA)
- Biophysics (AREA)
- Medical Informatics (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Genetics & Genomics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Molecular Biology (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- The present invention relates to methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis.
- Despite the rapid pace of technology in general, few industries today are as dynamic as that of cyber security. Attackers' techniques are constantly evolving, and along with them, the potential threat.
- For security teams, the challenge remains not to keep up, but rather, to outpace them. It is a persistent struggle: a never-ending, record-setting marathon at a constant sprint. Even as security professionals rest, attackers are hard at work. The tools and approaches that security teams use must also adapt if those teams are to stay a step ahead in defending their organizations. Malware classification, which encompasses both the identification and attribution of code, has the power to unlock many clues that aid security teams in achieving this.
- Whether legitimate or malicious, nearly every piece of software is composed of previously written code; the key to deeply understanding its nature and origins lies in discovering code that has appeared in previously known software. Reports on malware statistics indicate that there are around 350,000 new samples every day.
- In order to determine if a file is benign/trusted or malicious, code fragments need to be identified in a source file before such fragments can be classified as code genes and analyzed. Typically, such files are binary files which need to be disassembled into assembly code containing instructions. This involves parsing the code into blocks from which code genes can be extracted representing at least one logic unit (i.e., from the start-block location/address to the stop-block location/address of a single block).
- Software Reverse Engineering (SRE) relies on disassembling binary files using a disassembler (such as IDA Pro, RADARE, and GHIDRA). Such disassemblers identify the entry points of a binary file as the starting points for an assembly stream signature. The byte sequence of the code is disassembled into functions with their associated arguments. Disassembly is necessary if one wants to analyze a binary file in any meaningful way to determine its inherent functionality from a bitstream of indistinguishable zeroes and ones. By breaking the file into its component functions, each function can be analyzed and understood.
- The goal of such disassembly can be multifold. Applications include ultimately searching for shared code genes by comparing the genes against a database of known assembly-code fragments including both malicious and trusted code fragments. While disassemblers differ slightly, all disassemblers require a file to be fully disassembled before being able to further extract, normalize, and analyze such fragments for detection of code genes, whether from trusted code or malware, using a gene-analysis system.
- Using RADARE, one can analyze code of a function at a known address. However, one is limited to only the code fragments that are known in advance. A series of code fragments in an unknown file would require full disassembly before inspection of any one of the code fragments extracted, making such undertakings tediously manual and lacking scalability.
- Given that there can be a very large number of such functions in a binary file, when code matching of genes is the goal, such disassemblers are slow, clumsy, and inefficient in processing a file, requiring manual entry and consuming valuable processing time. To appreciate the significance of efficiency and scalability, such code matching currently involves analyzing tens of billions of genes from tens of millions of files. Disassembling each binary file one after another for extracting genes with current disassemblers would take around 16 years (calculated based on 10 seconds per file for 50M files, i.e., 5 × 10^8 seconds in total).
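The 16-year figure can be checked with a few lines of arithmetic. The sketch below assumes the files are processed strictly one after another on a single disassembler and uses a 365.25-day year; both are assumptions made here for illustration, not statements from the source.

```python
# Back-of-the-envelope check of the "around 16 years" estimate.
# Assumption: files are disassembled sequentially, one at a time.
SECONDS_PER_FILE = 10
FILE_COUNT = 50_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average year length

total_seconds = SECONDS_PER_FILE * FILE_COUNT  # 500,000,000 seconds
years = total_seconds / SECONDS_PER_YEAR

print(f"{total_seconds:,} seconds is about {years:.1f} years")
```

The result is a little under 16 years, consistent with the rounded figure in the text.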
- It would be desirable to have methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis. Such methods and systems would, inter alia, overcome the various limitations mentioned above.
- It is the purpose of the present invention to provide methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis.
- It is noted that the term “exemplary” is used herein to refer to examples of embodiments and/or implementations, and is not meant to necessarily convey a more-desirable use-case. Similarly, the terms “alternative” and “alternatively” are used herein to refer to an example out of an assortment of contemplated embodiments and/or implementations, and are not meant to necessarily convey a more-desirable use-case. Therefore, it is understood from the above that “exemplary” and “alternative” may be applied herein to multiple embodiments and/or implementations. Various combinations of such alternative and/or exemplary embodiments are also contemplated herein.
- Embodiments of the present invention enable disassembly of binary code into assembly code by starting at one or more entry points, and continuing through other analysis points (such as blocks from jump commands and export calls). When a function is detected (i.e., when the end of the last block of a function, or the beginning of the first block of the next function, is found), the disassembled function can be accessed by a code-matching analysis program, without having to wait for all functions in the file to be fully disassembled. This saves substantial time in the overall detection and analysis of shared code genes. Moreover, once a code-matching analysis program has detected a requisite amount of gene information that the file contains, the disassembler can terminate the disassembly process during disassembly, saving valuable resources of the disassembler to process other files.
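The idea of handing over each function as soon as it is detected, and stopping early, can be illustrated with a lazy generator. This is a minimal sketch under strong simplifying assumptions: instructions are plain strings, a function is assumed to end at a `ret` instruction, and the names `stream_functions` and `instructions` are hypothetical. A real disassembler detects function boundaries from block analysis, as described above.

```python
from typing import Iterable, Iterator, List


def stream_functions(instructions: Iterable[str]) -> Iterator[List[str]]:
    """Yield each function as soon as its end is seen.

    Toy model (assumption): a function ends at the first "ret".
    Nothing beyond that instruction is read until the consumer asks
    for the next function, so the consumer can stop at any time.
    """
    current: List[str] = []
    for instruction in instructions:
        current.append(instruction)
        if instruction == "ret":
            yield current
            current = []
    # Instructions after the last "ret" are an incomplete fragment
    # and are deliberately not yielded in this sketch.
```

A consumer that iterates over `stream_functions(...)` and breaks out of the loop once it has seen enough leaves the rest of the input unread, which mirrors terminating disassembly partway through a file.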
- Such a disassembler makes the overall process of disassembly and gene analysis streamlined through automation of the extracted gene analysis as each function is separated from the binary file and becomes available. Each function can be addressed during the analysis stage, allowing for scalability to process bulk files. Both aspects provide significant enhancement in the ability to rapidly process binary files to analyze their code fragments for malicious and/or trusted genes in a shared code database.
- Embodiments of the present invention provide a disassembler with an integrated Function-Queue Manager (FQM) for submitting disassembled functions to be searched within a database of known shared genes, both trusted and malicious. Embodiments of the present invention further provide a disassembly interrupter for determining whether to terminate disassembling of a target binary file during disassembly based on the gene information.
- The gene information regarding the target binary file is received from the gene-analysis system. Such gene information can include the total number of detected genes, the number of detected genes by category or type, the number of detected genes by gene criticality, severity, and/or importance, and/or the presence of detected genes by gene criticality, severity, and/or importance. Furthermore, the gene information can also include ancillary information about the file such as a current elapsed time for a given disassembly process.
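One way to picture the gene information and the interrupter's decision is a small record plus a predicate. The field names, the `"critical"` label, and every threshold below are hypothetical choices for illustration; the source lists the kinds of information involved but does not specify a decision rule.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GeneInfo:
    """Gene information reported back during disassembly (illustrative)."""
    total_genes: int = 0
    genes_by_category: Dict[str, int] = field(default_factory=dict)
    genes_by_severity: Dict[str, int] = field(default_factory=dict)
    elapsed_seconds: float = 0.0  # ancillary: time spent disassembling so far


def should_terminate(
    info: GeneInfo,
    min_total_genes: int = 100,       # assumed threshold
    critical_label: str = "critical",  # assumed severity label
    max_seconds: float = 10.0,         # assumed time budget
) -> bool:
    """Return True when disassembly can stop (assumed policy)."""
    # Presence of any gene at the highest severity is enough on its own.
    if info.genes_by_severity.get(critical_label, 0) > 0:
        return True
    # Enough genes overall to categorize the file.
    if info.total_genes >= min_total_genes:
        return True
    # Otherwise stop only when the time budget is exhausted.
    return info.elapsed_seconds >= max_seconds
```

For example, `should_terminate(GeneInfo(total_genes=3, genes_by_severity={"critical": 1}))` is true under this policy even though only three genes have been seen.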
- Therefore, according to the present invention, there is provided for the first time a method for an integrated disassembler for code gene analysis, the method including the steps of: (a) upon receiving a target binary file, disassembling the target binary file into assembly code; (b) extracting individually-identifiable code fragments from the assembly code; (c) as each individually-identifiable code fragment is extracted, verifying each individually-identifiable code fragment; (d) upon availability, placing each verified individually-identifiable code fragment in an extractor queue; and (e) upon availability, submitting each individually-identifiable code fragment in the extractor queue to a gene-analysis system having a code genome database.
- Alternatively, the step of placing includes placing only each individually-identifiable code fragment that has been completely verified to be a valid function.
- More alternatively, the method further including the steps of: (f) upon determining each individually-identifiable code fragment has not been completely verified, placing each partially-verified individually-identifiable code fragment in a verification queue; (g) performing additional verification on each partially-verified individually-identifiable code fragment; and (h) upon successfully completing the additional verification on each partially-verified individually-identifiable code fragment, transferring each completely-verified individually-identifiable code fragment to the extractor queue.
- Most alternatively, the method further including the step of: (i) upon determining the extractor queue is empty, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- Most alternatively, the method further including the step of: (i) upon determining resources of the gene-analysis system are underutilized, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
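The two-queue behavior in the alternatives above can be sketched as a small class. This is an illustrative model only: the class and method names, the representation of fragments as strings, and the choice to keep fragments that fail additional verification in the verification queue are all assumptions made here, not details from the source.

```python
from collections import deque
from typing import Callable, Deque, Optional


class FunctionQueueManager:
    """Toy function-queue manager with an extractor and a verification queue."""

    def __init__(self, fully_verify: Callable[[str], bool]) -> None:
        self.extractor_queue: Deque[str] = deque()
        self.verification_queue: Deque[str] = deque()
        self._fully_verify = fully_verify  # hypothetical additional check

    def add(self, fragment: str, completely_verified: bool) -> None:
        # Completely verified fragments go straight to the extractor queue;
        # partially verified ones wait in the verification queue.
        if completely_verified:
            self.extractor_queue.append(fragment)
        else:
            self.verification_queue.append(fragment)

    def verify_pending(self) -> None:
        # Run additional verification; promote fragments that pass.
        still_pending: Deque[str] = deque()
        while self.verification_queue:
            fragment = self.verification_queue.popleft()
            if self._fully_verify(fragment):
                self.extractor_queue.append(fragment)
            else:
                still_pending.append(fragment)  # assumption: keep, do not drop
        self.verification_queue = still_pending

    def next_for_analysis(self, analyzer_underutilized: bool = False) -> Optional[str]:
        # If the extractor queue is empty, or the gene-analysis system has
        # spare capacity, transfer the partially verified fragments as well.
        if not self.extractor_queue or analyzer_underutilized:
            self.extractor_queue.extend(self.verification_queue)
            self.verification_queue.clear()
        return self.extractor_queue.popleft() if self.extractor_queue else None
```

Completely verified fragments are always submitted first; partially verified ones are only used to keep the gene-analysis system busy when nothing better is available.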
- Alternatively, the method further including the step of: (f) upon receiving gene information regarding the target binary file from the gene-analysis system during disassembly, determining whether to terminate the step of disassembling based on the gene information.
- According to the present invention, there is provided for the first time a system for an integrated disassembler for code gene analysis, the system including: (a) a CPU for performing computational operations; (b) a memory module for storing data; (c) a disassembly module configured for, upon receiving a target binary file, disassembling the target binary file into assembly code; (d) an extracting module configured for extracting individually-identifiable code fragments from the assembly code; (e) a verification module configured for, as each individually-identifiable code fragment is extracted, verifying each individually-identifiable code fragment; and (f) a function-queue manager configured for: (i) upon availability, placing each verified individually-identifiable code fragment in an extractor queue; and (ii) upon availability, submitting each individually-identifiable code fragment in the extractor queue to a gene-analysis system having a code genome database.
- Alternatively, the function-queue manager is further configured for: (iii) placing only each individually-identifiable code fragment that has been completely verified to be a valid function.
- More alternatively, the function-queue manager is further configured for: (iv) upon the verification module determining each individually-identifiable code fragment has not been completely verified, placing each partially-verified individually-identifiable code fragment in a verification queue; (v) performing additional verification on each partially-verified individually-identifiable code fragment by the verification module; and (vi) upon successfully completing the additional verification on each partially-verified individually-identifiable code fragment, transferring each completely-verified individually-identifiable code fragment to the extractor queue.
- Most alternatively, the function-queue manager is further configured for: (vii) upon determining the extractor queue is empty, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- Most alternatively, the function-queue manager is further configured for: (vii) upon determining resources of the gene-analysis system are underutilized, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- Alternatively, the system further including: (g) a disassembly interrupter configured for, upon receiving gene information regarding the target binary file from the gene-analysis system during disassembly, determining whether to terminate the step of disassembling based on the gene information.
- According to the present invention, there is provided for the first time a non-transitory computer-readable storage medium, having computer-readable code embodied on the non-transitory computer-readable storage medium, for an integrated disassembler for code gene analysis, the computer-readable code including: (a) program code for, upon receiving a target binary file, disassembling the target binary file into assembly code; (b) program code for extracting individually-identifiable code fragments from the assembly code; (c) program code for, as each individually-identifiable code fragment is extracted, verifying each individually-identifiable code fragment; (d) program code for, upon availability, placing each verified individually-identifiable code fragment in an extractor queue; and (e) program code for, upon availability, submitting each individually-identifiable code fragment in the extractor queue to a gene-analysis system having a code genome database.
- Alternatively, the placing includes placing only each individually-identifiable code fragment that has been completely verified to be a valid function.
- More alternatively, the computer-readable code further including: (f) program code for, upon determining each individually-identifiable code fragment has not been completely verified, placing each partially-verified individually-identifiable code fragment in a verification queue; (g) program code for performing additional verification on each partially-verified individually-identifiable code fragment; and (h) program code for, upon successfully completing the additional verification on each partially-verified individually-identifiable code fragment, transferring each completely-verified individually-identifiable code fragment to the extractor queue.
- Most alternatively, the computer-readable code further including: (i) program code for, upon determining the extractor queue is empty, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- Most alternatively, the computer-readable code further including: (i) program code for, upon determining resources of the gene-analysis system are underutilized, transferring each partially-verified individually-identifiable code fragment to the extractor queue.
- Alternatively, the computer-readable code further including: (f) program code for, upon receiving gene information regarding the target binary file from the gene-analysis system during disassembly, determining whether to terminate the disassembling based on the gene information.
- These and further embodiments will be apparent from the detailed description and examples that follow.
- The present invention is herein described, by way of example only, with reference to the accompanying drawings, wherein:
- FIG. 1 is a simplified flowchart of the major process steps for an integrated disassembler for code gene extraction and analysis, according to embodiments of the present invention;
- FIG. 2 is a simplified flowchart of the major process steps for the Function-Queue Manager (FQM) and disassembly interrupter, according to embodiments of the present invention.
- The present invention relates to methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis. The principles and operation for providing such methods and systems, according to the present invention, may be better understood with reference to the accompanying description and the drawings.
- Referring to the drawings, FIG. 1 is a simplified flowchart of the major process steps for an integrated disassembler with a function-queue manager for code gene extraction and analysis, according to embodiments of the present invention. The process starts with activation of the disassembly process upon accessing a target binary file and finding the entry points (Step 2). The binary file is then disassembled into assembly code by finding instructions such as function calls or starts of loops (Step 4). Individually-identifiable code fragments are extracted from the assembly code (Step 6). The individually-identifiable code fragments are then queued upon availability for gene analysis without requiring the entire binary file to be fully disassembled (Step 8). The individually-identifiable code fragments are then submitted to a gene-analysis system for determining whether the code fragments are trusted or malicious (Step 10).
- The queuing of the code fragments for gene analysis upon availability in Step 8 is performed by an integrated function-queue manager of the disassembler. FIG. 2 is a simplified flowchart of the major process steps for the Function-Queue Manager (FQM) and disassembly interrupter, according to embodiments of the present invention. Once a function has been potentially identified during disassembly of the target binary file (Step 20), each function is verified by the FQM (Step 22). Upon verification, the verified functions are placed in an extractor queue before transferring for gene analysis without waiting for the entire binary file to be disassembled (Step 24). Verified functions in the extractor queue are then submitted to the code gene database for code matching and gene analysis to identify trusted and malicious genes (Step 26).
- In some embodiments, a function is only placed in the extractor queue if the function has been completely verified (Step 28). If a function has not been completely verified, the function is placed in a verification queue (Step 30). Functions in the verification queue undergo further verification to determine if they are truly valid, unique, and meaningful functions (Step 32). Functions in the verification queue are transferred to the extractor queue upon successfully completing verification (Step 34).
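- The two-queue routing of Steps 20 through 34 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Verdict`, `Fragment`, `FunctionQueueManager`, `verify`, and `verify_further` are hypothetical, the verification callbacks are supplied by the caller, and the sketch assumes that rejected fragments and fragments that fail additional verification are simply discarded, which the description does not specify.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto


class Verdict(Enum):
    COMPLETE = auto()  # fully verified as a valid function
    PARTIAL = auto()   # plausible, but needs additional verification
    REJECTED = auto()  # not a function at all


@dataclass(frozen=True)
class Fragment:
    address: int
    code: bytes


class FunctionQueueManager:
    """Hypothetical FQM: routes fragments between two FIFO queues."""

    def __init__(self, verify, verify_further):
        self.verify = verify                  # Fragment -> Verdict (assumed)
        self.verify_further = verify_further  # Fragment -> bool (assumed)
        self.extractor_queue = deque()
        self.verification_queue = deque()

    def on_fragment(self, fragment):
        """Steps 20-30: verify a candidate as soon as it is extracted."""
        verdict = self.verify(fragment)
        if verdict is Verdict.COMPLETE:
            self.extractor_queue.append(fragment)
        elif verdict is Verdict.PARTIAL:
            self.verification_queue.append(fragment)
        # REJECTED fragments are dropped (an assumption of this sketch).

    def run_additional_verification(self):
        """Steps 32-34: promote fragments that pass further checks."""
        for _ in range(len(self.verification_queue)):
            fragment = self.verification_queue.popleft()
            if self.verify_further(fragment):
                self.extractor_queue.append(fragment)
            # Failures are discarded here; the source does not say what
            # happens to them.

    def submit_next(self, submit):
        """Step 26: hand one queued fragment to the gene-analysis system."""
        if not self.extractor_queue:
            return False
        submit(self.extractor_queue.popleft())
        return True
```

A disassembler loop would call `on_fragment` for every candidate as it is found and call `submit_next` whenever the gene-analysis system can accept work, so analysis overlaps with disassembly instead of waiting for it to finish.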
- Alternatively, functions in the verification queue can also be transferred to the extractor queue, even without being completely verified, upon the extractor queue becoming empty, in order to prevent the gene-analysis system from becoming idle (Step 36). Alternatively, the FQM can check if the gene-analysis system is idle or underutilized before transferring functions from the verification queue to the extractor queue (Step 38). Alternatively, upon receiving gene information of the target binary file from the gene-analysis system during disassembly, a disassembly interrupter can determine whether to terminate disassembly based on the gene information (Step 40). Finally, the process returns to Step 26 by submitting the verified functions to the gene-analysis system.
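- The two promotion policies of Steps 36 and 38 can be sketched as a single helper. This is an illustrative sketch only: the function names, the `analyzer_load` value (assumed to be a utilization fraction between 0 and 1 reported by the gene-analysis system), the `low_load` threshold, and the `batch` size are all assumptions and do not come from the description.

```python
from collections import deque


def should_promote(extractor_queue, analyzer_load=None, low_load=0.25):
    """Decide whether partially-verified fragments may move forward."""
    if analyzer_load is not None:
        # Step 38 variant: the analyzer reports that it is underutilized.
        return analyzer_load < low_load
    # Step 36 variant: the extractor queue has run dry.
    return len(extractor_queue) == 0


def promote(extractor_queue, verification_queue,
            analyzer_load=None, low_load=0.25, batch=1):
    """Move up to `batch` partially-verified fragments; return the count."""
    if not should_promote(extractor_queue, analyzer_load, low_load):
        return 0
    moved = 0
    while moved < batch and verification_queue:
        extractor_queue.append(verification_queue.popleft())
        moved += 1
    return moved
```

Promoting in small batches keeps the analyzer busy while still leaving most partially-verified fragments available for additional verification, at the cost of occasionally analyzing a fragment that is not a real function.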
- The disassembly interrupter prevents the disassembler from continuing to disassemble the target binary file unnecessarily once the gene-analysis system has obtained enough gene information regarding the file to categorize the nature of the file (e.g., known shared genes, trusted genes, and/or malicious genes), saving valuable resources of the disassembler to process other files. Examples of such gene information can include the total number of detected genes, the number of detected genes by category or type, the number of detected genes by gene criticality, severity, and/or importance, and/or the presence of detected genes by gene criticality, severity, and/or importance. Furthermore, the gene information can also include ancillary information about the file such as a current elapsed time for a given disassembly process.
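- The early-termination decision can be sketched in the same way. The code below is a hedged illustration, not the patented logic: `GeneInfo`, `should_terminate`, `disassemble`, the category labels `"malicious"` and `"trusted"`, and every threshold are hypothetical, and `analyze` stands in for a call to the gene-analysis system that returns a category label or `None` for each fragment.

```python
import time
from dataclasses import dataclass, field


@dataclass
class GeneInfo:
    total_genes: int = 0
    genes_by_category: dict = field(default_factory=dict)
    elapsed_seconds: float = 0.0


def should_terminate(info, malicious_limit=5, trusted_limit=50,
                     time_budget=30.0):
    """Return True once enough is known to categorize the file."""
    if info.genes_by_category.get("malicious", 0) >= malicious_limit:
        return True
    if info.genes_by_category.get("trusted", 0) >= trusted_limit:
        return True
    # Ancillary information: stop when the time budget is exhausted.
    return info.elapsed_seconds >= time_budget


def disassemble(fragments, analyze, **limits):
    """Consume fragments lazily and stop as soon as the interrupter fires."""
    info = GeneInfo()
    processed = 0
    start = time.monotonic()
    for fragment in fragments:
        category = analyze(fragment)  # hypothetical gene-analysis call
        processed += 1
        if category is not None:
            info.total_genes += 1
            info.genes_by_category[category] = (
                info.genes_by_category.get(category, 0) + 1
            )
        info.elapsed_seconds = time.monotonic() - start
        if should_terminate(info, **limits):
            break
    return processed, info
```

Because `fragments` is consumed lazily, stopping the loop also stops any generator-based disassembly behind it, which is how the sketch models freeing the disassembler for other files.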
- While the present invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications, and other applications of the present invention may be made.
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/731,195 US11056212B1 (en) | 2019-12-31 | 2019-12-31 | Methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
US20210202031A1 (en) | 2021-07-01 |
US11056212B1 (en) | 2021-07-06 |
Family ID: 76545681
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/731,195 (Active), US11056212B1 (en) | 2019-12-31 | 2019-12-31 | Methods and systems for an integrated disassembler with a function-queue manager and a disassembly interrupter for rapid, efficient, and scalable code gene extraction and analysis |
Country Status (1)
Country | Link |
---|---|
US (1) | US11056212B1 (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8533836B2 (en) * | 2012-01-13 | 2013-09-10 | Accessdata Group, Llc | Identifying software execution behavior |
US9003529B2 (en) * | 2012-08-29 | 2015-04-07 | The Johns Hopkins University | Apparatus and method for identifying related code variants in binaries |
US20150186649A1 (en) * | 2013-12-31 | 2015-07-02 | Cincinnati Bell, Inc. | Function Fingerprinting |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220414113A1 (en) * | 2021-06-29 | 2022-12-29 | International Business Machines Corporation | Managing extract, transform and load systems |
US11841871B2 (en) * | 2021-06-29 | 2023-12-12 | International Business Machines Corporation | Managing extract, transform and load systems |
US20240012829A1 (en) * | 2021-06-29 | 2024-01-11 | International Business Machines Corporation | Managing extract, transform and load systems |
US12047440B2 (en) | 2021-10-05 | 2024-07-23 | International Business Machines Corporation | Managing workload in a service mesh |
Also Published As
Publication number | Publication date |
---|---|
US11056212B1 (en) | 2021-07-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
 | AS | Assignment | Owner name: INTEZER LABS, LTD., ISRAEL; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TEVET, ITAI;HALEVI, ROY;ABRAHAMY, JONATHAN;AND OTHERS;REEL/FRAME:051900/0216; Effective date: 20191225 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | AS | Assignment | Owner name: COMERICA BANK, MICHIGAN; Free format text: SECURITY INTEREST;ASSIGNOR:INTEZER LABS, LTD.;REEL/FRAME:056833/0884; Effective date: 20210712 |