CN112084146A - Firmware homology detection method based on multi-dimensional features - Google Patents
- Publication number
- CN112084146A (application number CN202010932458.6A)
- Authority
- CN
- China
- Prior art keywords
- firmware
- hash
- file
- similarity
- homology
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/137—Hash-based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/10—Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0876—Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Technology Law (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Power Engineering (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a firmware homology detection method based on multi-dimensional features, which comprises: identifying the firmware format, unpacking the firmware, and extracting files; extracting multi-dimensional features such as file string features and function control flow graph features; calculating the file similarity in each single feature dimension by using methods such as fuzzy hashing, constant matching, and graph matching; and calculating the overall firmware similarity across the feature dimensions as a weighted combination of the single-dimension results. The invention addresses the problem that, when firmware is obfuscated or encrypted, a single feature has a large error and homology is hard to determine; it improves the accuracy of homology detection by extracting, analyzing, and combining multi-dimensional features.
Description
Technical Field
The invention relates to the field of information security, in particular to a multi-dimensional firmware homology detection method.
Background
With the advent of the Internet of Things (IoT) era, IoT terminal devices such as network cameras, wearable devices, activity trackers, smart cars, and smart homes have developed rapidly and are widely deployed. According to a Gartner report, the number of IoT devices was expected to exceed 20 billion in 2020. Meanwhile, security attacks targeting IoT devices keep rising. The main attack pattern is to exploit device vulnerabilities to gain control of the device, and then either spread malicious code at scale to control cyberspace, or steal user data and hijack network traffic for other underground (black-market) transactions.
Because IoT device design and development focus mainly on implementing functionality, security is neglected during design, vulnerabilities are introduced through carelessness during development, and later security review is lacking. Meanwhile, because components are reused, the device firmware of different vendors, device types, and CPU architectures contains a large amount of binary code compiled from the same source code, and this binary code potentially carries the same vulnerabilities. Firmware homology detection is the vulnerability mining technique that performs large-scale homology detection for this situation. The prior art starts from only a single angle, such as string matching or function control flow graphs, and is therefore one-sided; in particular, its detection precision is low for firmware protected by measures such as obfuscation and encryption. For this situation, a firmware homology detection method based on multi-dimensional features is proposed.
Disclosure of Invention
The invention aims to overcome the defects of existing homology detection methods, which rely on a single feature dimension and have low accuracy when detecting firmware protected by measures such as obfuscation and encryption. The method uses multi-dimensional features such as strings and function control flow graphs to perform firmware homology detection, which improves detection accuracy and provides the capability of cross-platform similarity detection.
The specific technical scheme for realizing the purpose of the invention is as follows:
A firmware homology detection method based on multi-dimensional features, characterized by comprising the following specific steps:
step S1: identifying the firmware format, unpacking the firmware, and extracting identifiable files;
step S2: generating a file hash feature for each identifiable file by using a hash algorithm; extracting the strings in the identifiable file and generating string hash features by using a hash algorithm; filtering the strings to remove those related to the firmware operating system platform, compiler, kernel, and the like, and generating string constant features; extracting the binary files among the identifiable files and generating function control flow graph features;
step S3: performing hash similarity calculation on the file hash features and the string hash features, and assigning different weights to the two to generate a hash similarity index; matching the string constant features to generate a constant matching similarity index; performing graph similarity calculation on the function control flow graph features to generate a graph similarity index;
step S4: assigning different weights to the hash similarity index, the constant matching similarity index, and the graph similarity index, and calculating from them the firmware similarity between the firmware images to be detected.
The identifiable files extracted in step S1 include third-party components in the firmware such as BusyBox, OpenSSL, and JavaScript, and dynamic link library files such as libsctp.
In step S2, the file hash feature is generated for the identifiable file by using a hash algorithm, where the hash algorithm includes the BKDRHash, APHash, JSHash, ssdeep, sdhash, or CTPH hash algorithm.
In step S2, the strings in the identifiable file are extracted by using the strings command or a third-party open source tool.
In step S2, filtering generates the string constants, which include third-party library version information, stack space string information, the symbol table, and human-readable strings with real-world meaning.
In step S2, the binary files among the identifiable files are extracted and the function control flow graph features are generated by using the third-party open source tool angr, IDA Pro, or another reverse engineering tool.
In step S3, the string constants are matched to generate the constant matching similarity index by using the Jaro-Winkler similarity algorithm or an edit distance algorithm.
In step S3, graph similarity calculation is performed on the function control flow graph features by using the K-nearest neighbor, VF2, or Ullmann algorithm.
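Of the hash algorithms named for step S2, BKDRHash is the shortest to write down. The following is a minimal Python sketch of the commonly published form of that hash (multiplier 131, result masked to 31 bits); the function name `bkdr_hash` and the choice of seed are assumptions for illustration, not details taken from the patent.

```python
def bkdr_hash(data: bytes, seed: int = 131) -> int:
    """BKDR string hash over raw bytes (commonly published variant).

    The seed 131 is one of the usual multipliers (31, 131, 1313, ...);
    it is an assumption here, the patent does not fix a value.
    """
    value = 0
    for byte in data:
        # Emulate 32-bit unsigned overflow, as a C implementation would.
        value = (value * seed + byte) & 0xFFFFFFFF
    # Keep the result non-negative in 31 bits.
    return value & 0x7FFFFFFF
```

A hash of this kind only answers whether two inputs are byte-for-byte identical (up to collisions). Fuzzy hashes such as ssdeep are the ones that yield a graded match score between similar but non-identical files.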
The invention has the beneficial effects that:
The method addresses the low accuracy of single-dimensional features in existing methods when the firmware is protected by measures such as obfuscation and encryption. Through multi-dimensional feature comparison, firmware homology can be determined quickly and accurately, and the method also supports cross-platform firmware homology detection.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a detailed flow chart of an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific examples and the accompanying drawings. Except for the contents specifically mentioned below, the procedures, conditions, and experimental methods for carrying out the invention are common general knowledge in the art, and the invention places no particular limitation on them.
As shown in fig. 1, the present invention comprises the steps of:
step S1: identifying the firmware format, unpacking the firmware, and extracting identifiable files;
step S2: generating a file hash feature for each identifiable file by using a hash algorithm; extracting the strings in the identifiable file, generating string hash features by using a hash algorithm, and filtering the strings to generate string constant features; generating function control flow graph features for the binary files among the identifiable files;
step S3: performing hash similarity calculation on the file hash features and the string hash features to generate a hash similarity index; matching the string constant features to generate a constant matching similarity index; performing graph similarity calculation on the function control flow graph features to generate a graph similarity index;
step S4: assigning different weights to the hash similarity index, the constant matching similarity index, and the graph similarity index from step S3, and calculating the firmware similarity from them. The larger the value, the higher the similarity of the firmware images being compared; the smaller the value, the lower their similarity.
Examples
Referring to fig. 2, the present embodiment is described in detail below:
step S1:
For firmware 1 and firmware 2 to be detected, open source tools such as binwalk or BAP are used to identify the firmware type, and the whole file is scanned for signatures to extract identifiable files. The extracted firmware files include, but are not limited to, third-party components such as BusyBox, OpenSSL, and JavaScript, and dynamic link library files such as libsctp.
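The signature scan in step S1 can be pictured as a search for known magic numbers at every offset of the firmware image. The sketch below illustrates that idea only; it is not how binwalk or BAP are implemented, and the table `SIGNATURES` and the function `scan_signatures` are hypothetical names covering just three well-known magic numbers.

```python
# Hypothetical signature table: a few well-known magic numbers only.
SIGNATURES = {
    b"\x7fELF": "ELF executable or shared object",
    b"\x1f\x8b\x08": "gzip-compressed data",
    b"hsqs": "SquashFS filesystem (little-endian)",
}


def scan_signatures(image: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for every known magic number found."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = 0
        # bytes.find returns -1 once no further occurrence exists.
        while (offset := image.find(magic, start)) != -1:
            hits.append((offset, description))
            start = offset + 1
    return sorted(hits)
```

Each reported offset marks a candidate region to carve out and unpack. A real tool additionally validates the header that follows the magic number, because short byte patterns also occur by chance inside unrelated data.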
Step S2:
A hash value is generated directly for each extracted identifiable file by using a hash algorithm; this hash value is the file hash feature of the file, and every extracted firmware file produces one file hash feature. The hash algorithms used include, but are not limited to, BKDRHash, APHash, JSHash, CTPH, ssdeep, and sdhash.
The strings of each identifiable file are extracted by using the strings command or a third-party open source tool. On the one hand, the extracted strings are left unfiltered and hashed directly, and the resulting hash values serve as the string hash features of the file. On the other hand, the strings are filtered to remove those that would reduce accuracy, such as strings related to the SDK, instruction set, operating system, kernel, and compiler. The string constants that remain after filtering include third-party library version information, stack space string information, the symbol table, and human-readable strings with real-world meaning; these form the string constant features of the file.
For the binary files among the extracted identifiable files, a function control flow graph generation tool such as angr or IDA Pro is used to extract the function control flow graphs and generate the function control flow graph features.
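The string branch of step S2 can be sketched in Python as follows. The extraction rule (runs of at least four printable ASCII bytes) mirrors the usual default of the strings command; the marker list `PLATFORM_MARKERS` and both function names are hypothetical, since the patent does not enumerate the platform-related strings that are filtered out.

```python
import re

# Runs of at least four printable ASCII bytes, similar to `strings` defaults.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Hypothetical markers of compiler-, libc-, and kernel-related strings.
PLATFORM_MARKERS = ("GCC:", "GLIBC_", "Linux version", "__libc_")


def extract_strings(data: bytes) -> list[str]:
    """Extract printable ASCII strings from a binary blob."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def filter_constants(strings: list[str]) -> list[str]:
    """Drop strings that contain any platform-related marker."""
    return [
        s for s in strings
        if not any(marker in s for marker in PLATFORM_MARKERS)
    ]
```

The unfiltered output of `extract_strings` would feed the string hash feature, while the output of `filter_constants` would serve as the string constant feature.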
Step S3:
The file hash features and the string hash features of the firmware images being compared are each compared, and the hash similarity index is calculated as a weighted combination of the two results. The algorithms for calculating the constant matching similarity index include, but are not limited to, the Jaro-Winkler similarity algorithm and edit distance. The algorithms for calculating the graph similarity index include, but are not limited to, K-nearest neighbor, VF2, and Ullmann. The similarity index calculated for each feature ranges from 0 to 100, where 0 represents no similarity and 100 represents complete consistency.
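As an illustration of constant matching with edit distance, the sketch below computes the Levenshtein distance between two strings and rescales it to the 0 to 100 range used in step S3. The aggregation in `constant_matching_index` (average of each constant's best match, taken from the first set's point of view) is an assumption made for this example; the patent does not specify how per-string scores are combined.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Scale edit distance to 0..100, where 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


def constant_matching_index(constants_a: list[str],
                            constants_b: list[str]) -> float:
    """Average best-match similarity of constants_a against constants_b.

    Assumed aggregation rule; note that it is not symmetric in a and b.
    """
    if not constants_a or not constants_b:
        return 0.0
    best = [max(string_similarity(a, b) for b in constants_b)
            for a in constants_a]
    return sum(best) / len(best)
```

Comparing every constant against every other constant is quadratic in the number of strings, so a practical implementation would likely deduplicate exact matches first and score only the remainder.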
Step S4:
The firmware similarity between the firmware images being compared is calculated by assigning different weights to the hash similarity index, the constant matching similarity index, and the graph similarity index from step S3. The larger the value, the higher the similarity of the two firmware images; the smaller the value, the lower their similarity. The calculated value ranges from 0 to 100, where 0 represents no similarity and 100 represents complete consistency.
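The weighting in step S4 amounts to a weighted average of the three indices. The sketch below assumes normalization by the sum of the weights so that the result stays within 0 to 100; the function name `firmware_similarity` and the default equal weights are placeholders, because the patent does not disclose concrete weight values.

```python
def firmware_similarity(hash_index: float,
                        constant_index: float,
                        graph_index: float,
                        weights: tuple[float, float, float] = (1, 1, 1)) -> float:
    """Weighted average of the three similarity indices, each in 0..100.

    The default equal weights are placeholders, not values from the patent.
    """
    indices = (hash_index, constant_index, graph_index)
    if any(not 0 <= value <= 100 for value in indices):
        raise ValueError("each similarity index must lie in 0..100")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    # Dividing by the weight sum keeps the result in the 0..100 range.
    return sum(w * v for w, v in zip(weights, indices)) / sum(weights)
```

For example, with weights (1, 1, 2) the graph similarity index counts twice as much as either of the other two, which may suit obfuscated firmware where string-based features are less reliable.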
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, and the scope of the appended claims is intended to be protected.
Claims (8)
1. A firmware homology detection method based on multidimensional characteristics is characterized by comprising the following specific steps:
step S1: identifying the firmware format, unpacking the firmware, and extracting identifiable files;
step S2: generating a file hash feature for each identifiable file by using a hash algorithm; extracting the strings in the identifiable file and generating string hash features by using a hash algorithm; filtering the strings to remove those related to the firmware operating system platform, compiler, kernel, and the like, and generating string constant features; extracting the binary files among the identifiable files and generating function control flow graph features;
step S3: performing hash similarity calculation on the file hash features and the string hash features, and assigning different weights to the two to generate a hash similarity index; matching the string constant features to generate a constant matching similarity index; performing graph similarity calculation on the function control flow graph features to generate a graph similarity index;
step S4: assigning different weights to the hash similarity index, the constant matching similarity index, and the graph similarity index, and calculating from them the firmware similarity between the firmware images to be detected.
2. The firmware homology detection method according to claim 1, wherein the identifiable files extracted in step S1 include third-party components such as BusyBox, OpenSSL, and JavaScript, and dynamic link library files such as libsctp.
3. The firmware homology detection method according to claim 1, wherein in step S2 the file hash feature is generated for the identifiable file by using a hash algorithm, the hash algorithm including the BKDRHash, APHash, JSHash, ssdeep, sdhash, or CTPH hash algorithm.
4. The firmware homology detection method according to claim 1, wherein in step S2 the strings in the identifiable file are extracted by using the strings command or a third-party open source tool.
5. The firmware homology detection method according to claim 1, wherein the filtering in step S2 generates string constants, the string constants including third-party library version information, stack space string information, the symbol table, and human-readable strings with real-world meaning.
6. The firmware homology detection method according to claim 1, wherein in step S2 the binary files among the identifiable files are extracted and the function control flow graph features are generated by using the third-party open source tool angr, IDA Pro, or another reverse engineering tool.
7. The firmware homology detection method according to claim 1, wherein in step S3 the string constants are matched to generate the constant matching similarity index by using the Jaro-Winkler similarity algorithm or an edit distance algorithm.
8. The firmware homology detection method according to claim 1, wherein in step S3 graph similarity calculation is performed on the function control flow graph features by using the K-nearest neighbor, VF2, or Ullmann algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010932458.6A CN112084146A (en) | 2020-09-08 | 2020-09-08 | Firmware homology detection method based on multi-dimensional features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112084146A (en) | 2020-12-15 |
Family
ID=73732151
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010932458.6A Pending CN112084146A (en) | 2020-09-08 | 2020-09-08 | Firmware homology detection method based on multi-dimensional features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112084146A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105868108A (en) * | 2016-03-28 | 2016-08-17 | 中国科学院信息工程研究所 | Instruction-set-irrelevant binary code similarity detection method based on neural network |
CN109063055A (en) * | 2018-07-19 | 2018-12-21 | 中国科学院信息工程研究所 | Homologous binary file search method and device |
CN109460386A (en) * | 2018-10-29 | 2019-03-12 | 杭州安恒信息技术股份有限公司 | The matched malicious file homology analysis method and device of Hash is obscured based on various dimensions |
CN110362966A (en) * | 2019-07-11 | 2019-10-22 | 华东师范大学 | A kind of cross-platform firmware homology safety detection method based on fuzzy Hash |
CN110414238A (en) * | 2019-06-18 | 2019-11-05 | 中国科学院信息工程研究所 | The search method and device of homologous binary code |
CN111104674A (en) * | 2019-11-06 | 2020-05-05 | 中国电力科学研究院有限公司 | Power firmware homologous binary file association method and system |
CN111310178A (en) * | 2020-01-20 | 2020-06-19 | 武汉理工大学 | Firmware vulnerability detection method and system under cross-platform scene |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113704180A (en) * | 2021-07-10 | 2021-11-26 | 国网浙江省电力有限公司信息通信分公司 | Lossless firmware extraction method based on embedded equipment firmware file information feature library |
CN113704180B (en) * | 2021-07-10 | 2024-03-15 | 国网浙江省电力有限公司信息通信分公司 | Lossless firmware extraction method based on embedded device firmware file information feature library |
CN114489787A (en) * | 2022-04-06 | 2022-05-13 | 奇安信科技集团股份有限公司 | Software component analysis method, device, electronic equipment and storage medium |
CN116578979A (en) * | 2023-05-15 | 2023-08-11 | 软安科技有限公司 | Cross-platform binary code matching method and system based on code features |
CN116578979B (en) * | 2023-05-15 | 2024-05-31 | 软安科技有限公司 | Cross-platform binary code matching method and system based on code features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||