CN110399729A - Binary software analysis method based on component feature weights - Google Patents
- Publication number
- CN110399729A (application number CN201910669789.2A)
- Authority
- CN
- China
- Prior art keywords
- feature
- component
- value
- weight
- characteristic value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Stored Programmes (AREA)
Abstract
The invention discloses a binary software analysis method based on component feature weights. Binary software components are described by multiple kinds of features, and each feature is assigned a weight according to its degree of influence on the component. This addresses the false negatives and false positives that arise in binary software analysis when component features are covered incompletely, and yields a component fingerprint extraction and determination method based on feature weights that is extensible, widely applicable, and efficient.
Description
Technical field
The invention belongs to the field of static software detection, and in particular relates to a binary software analysis method based on component feature weights.
Background
A component is a software entity within a software system that has a relatively independent function, an interface specified by contract, and an explicit dependency on its context, and that can be deployed and assembled independently. It is a simplified encapsulation of data and methods. For a known executable binary, the components it uses and the vulnerabilities associated with those components need to be determined quickly, so that the security risk of the binary is clear. For a component known to contain a vulnerability, all binaries that use the component need to be determined quickly, so that the scope of the component's impact is understood. For a known vulnerability, the components and binaries it affects need to be identified, so that the degree of risk posed by the vulnerability can be confirmed.
In the prior art, Universal Extractor is a program that can extract files from any type of archive, whether a simple ZIP file, an installer (such as Wise or NSIS), or a Windows Installer (.msi) package. It lets users extract files from almost any type of archive regardless of its source or compression method, and it provides a simple and convenient way to extract files from installation packages (such as Inno Setup or Windows Installer packages) without using the command line each time. AppCheck is an analysis platform for comprehensively checking the software composition and risk status of a device, helping developers and device users improve security. However, these methods all extract characteristic fingerprints from constant text strings. Such methods are efficient, but they suffer from false negatives, false positives, and incomplete feature coverage.
Summary of the invention
In view of this, the present invention provides a binary software analysis method based on component feature weights. Assigning different weights to different features addresses false negatives and false positives in component identification, and adding feature items addresses incomplete feature coverage. The result is a feature-weight-based component feature extraction and determination method that is extensible, widely applicable, and efficient.
The binary software analysis method based on component feature weights provided by the invention extracts features of multiple types from binary components, assigns a weight to each feature according to its degree of influence on the component, and builds a component feature library.
Features of the same multiple types are then extracted from the binary software to be analyzed, and each extracted feature is matched against the feature of the same type of every component in the component feature library. If the matching result is greater than a threshold, the component is determined to match the binary software.
Further, the multiple types of features of a binary component include the dynamic symbol table, header information, and constant strings.
Further, the component feature library is built as follows:
Step 3.1: According to the file type of binary component i, extract the features of component i using the corresponding disassembly method. Each feature is represented as an array of feature values. The numbers of feature values in the feature-value arrays of component i are {n1, n2, n3}, where n1 is the number of elements in the dynamic symbol table feature, n2 is the number of elements in the header information feature, and n3 is the number of elements in the constant string feature.
Step 3.2: For each feature value of component i, traverse the feature-value arrays of all features of every component in the component feature library, find the components that share a feature value, and form a temporary matching result. Then compute the intersection of each feature-value array of component i with the corresponding feature-value array of each component in the temporary matching result, and record, for each type of feature, the maximum number of elements over all intersections, {m1, m2, m3}, where m1 is the maximum intersection size for the dynamic symbol table feature, m2 is the maximum intersection size for the header information feature, and m3 is the maximum intersection size for the constant string feature.
Step 3.3: Compute the weight of each feature of component i by the following formulas:
a1 = 1 - m1/n1
a2 = 1 - m2/n2
a3 = 1 - m3/n3
where a1 is the weight of the dynamic symbol table feature of component i, a2 is the weight of the header information feature of component i, and a3 is the weight of the constant string feature of component i.
Step 3.4: Store the features of component i and their corresponding weights in the component feature library.
Step 3.5: Select the next component and perform step 3.2, until the last component has been processed.
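Steps 3.2 and 3.3 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the feature type names, the dictionary layout, and the function name `feature_weights` are assumptions made for the example. The sketch takes the maximum over every component already in the library, which gives the same result as first forming the temporary matching result, because components that share no value contribute an intersection of size 0. It also assumes the library does not yet contain component i itself.

```python
# Hypothetical feature type names standing in for the dynamic symbol table,
# header information, and constant string features.
FEATURE_TYPES = ("dynsym", "header", "strings")


def feature_weights(component, library):
    """Return the weights a_k = 1 - m_k / n_k for one component.

    component: dict mapping feature type -> set of feature values
    library:   dict mapping component name -> dict of the same shape
    """
    weights = {}
    for ftype in FEATURE_TYPES:
        values = component.get(ftype, set())
        n = len(values)  # n_k: number of values in this feature array
        if n == 0:
            # Assumption: an empty feature cannot identify anything.
            weights[ftype] = 0.0
            continue
        # m_k: largest overlap with the same feature of any other component.
        m = max(
            (len(values & other.get(ftype, set())) for other in library.values()),
            default=0,
        )
        weights[ftype] = 1.0 - m / n
    return weights
```

A feature whose values are largely shared with another component receives a weight near 0, while a feature unique to the component receives a weight near 1.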
Further, matching the extracted features of the binary software against the features of the same type of each component in the component feature library proceeds as follows:
Step 4.1: Extract the features of the binary software by disassembly, using the same method as in step 3.1.
Step 4.2: For each feature value of the binary software, look up the components that have the same feature value in the component feature library built in step 3.4, forming a temporary list of matching components.
Step 4.3: Compute the intersection of the feature-value array of each feature of the binary software with the feature-value array of the corresponding feature of each component in the temporary matching result. If the number of elements in the intersection exceeds a set threshold, the feature is a matching feature. Normalize the number of matched feature values, multiply it by the weight of that component feature, and sum the products to obtain the matching coefficient of the component. If the matching coefficient is greater than a threshold, the component is considered a matching component and is output.
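Step 4.3 can be sketched in Python as shown below. The text does not spell out how the matched count is normalized, so this sketch assumes division by the size of the component's own feature array. That choice, along with the names `matching_coefficient`, `match_components`, `min_shared`, and `coeff_threshold`, is an assumption made for illustration.

```python
def matching_coefficient(target, component, weights, min_shared=0):
    """Weighted sum of normalized overlaps between a binary and one component.

    target, component: dicts mapping feature type -> set of feature values
    weights:           dict mapping feature type -> weight of that feature
    min_shared:        per-feature threshold on the intersection size
    """
    coeff = 0.0
    for ftype, comp_values in component.items():
        shared = target.get(ftype, set()) & comp_values
        # A feature counts as matched only if the overlap exceeds the threshold.
        if not comp_values or len(shared) <= min_shared:
            continue
        # Assumed normalization: matched count / size of the component's array.
        coeff += weights[ftype] * len(shared) / len(comp_values)
    return coeff


def match_components(target, library, coeff_threshold, min_shared=0):
    """Return the names of components whose coefficient exceeds the threshold.

    library: dict mapping component name -> (features, weights)
    """
    return [
        name
        for name, (features, weights) in library.items()
        if matching_coefficient(target, features, weights, min_shared) > coeff_threshold
    ]
```

Both thresholds are left as parameters because the text gives no concrete values for them.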
Further, the elements of the feature-value arrays are hash values of the feature values.
Further, the matching process can be implemented with an inverted index library. The inverted index library is a set whose elements are the inverted indexes of the individual features. In each inverted index, the key is the hash value of a feature value, and the value is an array of strings holding the names of the components that contain that feature value.
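The inverted index library described above might look like the following sketch. The text does not name a hash function, so SHA-256 is used here purely as an assumption. The function names and the feature type labels are likewise hypothetical.

```python
import hashlib
from collections import defaultdict


def hash_value(value):
    """Hash one feature value (SHA-256 is an assumed choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_inverted_indexes(library):
    """Build one inverted index per feature type.

    library: dict mapping component name -> {feature type: iterable of raw values}
    returns: dict mapping feature type -> {hash of value: [component names]}
    """
    indexes = defaultdict(lambda: defaultdict(list))
    for name, features in library.items():
        for ftype, values in features.items():
            for value in set(values):  # de-duplicate repeated values
                indexes[ftype][hash_value(value)].append(name)
    return indexes


def candidates(indexes, ftype, raw_values):
    """Names of components sharing at least one value of the given feature."""
    found = set()
    for value in raw_values:
        found.update(indexes.get(ftype, {}).get(hash_value(value), []))
    return found
```

With this layout, finding the components that share a value costs one dictionary lookup per feature value instead of a scan over every component in the library.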
Beneficial effects:
The invention describes binary software components by multiple kinds of features and assigns each feature a weight according to its degree of influence on the component. This addresses the false negatives and false positives in binary software analysis caused by incomplete coverage of component features, and yields a feature-weight-based component fingerprint extraction and determination method that is extensible, widely applicable, and efficient.
Description of the drawings
Fig. 1 is a flowchart of building the component feature library in the binary software analysis method based on component feature weights provided by the invention.
Fig. 2 is a flowchart of homology determination in the binary software analysis method based on component feature weights provided by the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and embodiments.
The invention provides a binary software analysis method based on component feature weights. The basic idea is as follows: first, extract features of multiple types from binary components, compute feature values to form component feature arrays, and assign weights to the component features using the feature weight calculation method, thereby building a component feature library; then analyze the binary software against the generated component feature library to determine the matching binary components.
The method has two parts: building the component feature library, and analyzing binary software against the generated component feature library.
The first part is building the component feature library. As shown in Fig. 1, it includes the following steps:
Step 1.1: Determine the file type of binary component i and extract the features of the binary component by disassembly. Because a single feature cannot uniquely identify a component, the invention adds feature items, chosen according to the type of binary component file, to ensure that components are uniquely identified and to improve feature coverage. The component features extracted in the invention include the dynamic symbol table, header information, and constant strings; more feature items can be added as the analysis requires. The numbers of feature values in the feature arrays of component i are {n1, n2, n3}, where n1 is the number of elements in the dynamic symbol table feature, n2 is the number of elements in the header information feature, and n3 is the number of elements in the constant string feature. Components of different file types are handled differently, as follows:
A. For binary components in PE32 format, use the pefile module of the Python language to read the .rdata section and extract constant strings, and to read the dynamic symbol table and extract the function names in the component.
B. For binary components in Linux format, use the readelf command on Linux to read the .rodata, .data, and .dynstr sections and extract the component features.
C. For Jar packages, first decompress the Jar package, then decompile the class files with the javap command of the Java language and extract the corresponding features.
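Whichever tool reads the section (pefile, readelf, or another), extracting constant strings amounts to scanning the raw section bytes for runs of printable characters. The sketch below shows only that scanning step. It assumes the section bytes have already been obtained, and the function name, the ASCII-only rule, and the minimum length of 4 are illustrative choices that the text does not specify.

```python
import re


def constant_strings(section_bytes, min_len=4):
    """Return the set of printable ASCII strings found in raw section bytes.

    section_bytes: contents of a read-only data section such as .rdata or .rodata
    min_len:       shortest run of printable characters kept (assumed value)
    """
    # Runs of printable ASCII characters (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {match.group().decode("ascii") for match in pattern.finditer(section_bytes)}
```

For example, `constant_strings(b"\x00libssl.so\x00ab\x00")` keeps `libssl.so` and drops the two-character run `ab`.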
Step 1.2: Initialize the component feature library and take as input the feature set of component i generated in step 1.1. Traverse the feature-value arrays of all features of every component in the component feature library (the inverted index feature library can also be used for this search), matching them type by type against the feature-value arrays of component i to find the components that share a feature value and form a temporary matching result. Then compute the intersection of each feature-value array of component i with the corresponding feature-value array of each component in the temporary matching result, and record, for each type of feature, the maximum number of elements over the intersections obtained, i.e. {m1, m2, m3}, where m1 is the maximum number of elements in an intersection for the dynamic symbol table feature, m2 is the maximum number of elements in an intersection for the header information feature, and m3 is the maximum number of elements in an intersection for the constant string feature.
In the invention, the component feature library is a set of components, where each component is a set of features and each feature is an array of feature values. For example, component 1 includes three features: the dynamic symbol table, header information, and constant strings. The constant string feature is a feature-value array containing the feature values of multiple constant strings; likewise, the dynamic symbol table and header information features are arrays containing multiple dynamic symbol and header information feature values, respectively.
To save storage, the array elements of a feature are hash values of the feature values. To improve query speed, an inverted index library is also built for query matching. The inverted index library is a set whose elements are the inverted indexes of the individual features; in each inverted index, the key is the hash value of a feature value, and the value is an array of strings holding the names of the components that contain that feature value.
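The library layout described above (components, each holding several features, each feature holding hashed values) can be sketched as a small Python data structure. The class name, field names, feature type labels, and the choice of SHA-256 are assumptions made for illustration and do not come from the patent.

```python
import hashlib
from dataclasses import dataclass, field


def hash_value(value):
    """Hash one raw feature value (SHA-256 is an assumed choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass
class Component:
    """One entry of the component feature library."""
    name: str
    # feature type -> set of hashed feature values
    features: dict
    # feature type -> weight, filled in once the weights are computed
    weights: dict = field(default_factory=dict)


def make_component(name, raw_features):
    """Build a Component, storing hashes instead of the raw feature values."""
    hashed = {
        ftype: {hash_value(v) for v in values}
        for ftype, values in raw_features.items()
    }
    return Component(name=name, features=hashed)
```

Storing fixed-length hashes rather than raw strings keeps long constant strings from inflating the library, and set membership still supports the intersection step.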
Step 1.3: Compute the weight of each feature of component i by the following formulas:
a1 = 1 - m1/n1
a2 = 1 - m2/n2
a3 = 1 - m3/n3
where a1 is the weight of the dynamic symbol table feature of component i, a2 is the weight of the header information feature of component i, and a3 is the weight of the constant string feature of component i.
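As a worked illustration of the weight formula a_k = 1 - m_k/n_k, consider a hypothetical component with the counts below. The numbers are invented for the example and do not come from the patent.

```latex
\begin{align*}
(n_1, n_2, n_3) &= (120,\ 8,\ 300), & (m_1, m_2, m_3) &= (90,\ 8,\ 30) \\
a_1 &= 1 - \tfrac{90}{120} = 0.25 \\
a_2 &= 1 - \tfrac{8}{8} = 0 \\
a_3 &= 1 - \tfrac{30}{300} = 0.9
\end{align*}
```

Here the header information is fully shared with another component and so carries no weight, while the constant strings are mostly unique to this component and dominate the match.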
Step 1.4: Store the features of component i and their corresponding weights in the component feature library. When no feature matching component i is found, the weight of each feature of component i can also be assigned manually based on experience.
Step 1.5: Select the next component and perform step 1.2, until the last component has been processed, then exit the program.
The second part is analyzing the binary software against the generated component feature library. As shown in Fig. 2, it includes the following steps:
Step 2.1: Extract the features of the binary software by disassembly, using the same method as in step 1.1.
Step 2.2: Using the feature values in the feature-value arrays extracted in step 2.1, look up the components that have the same feature values in the component feature library built in step 1.4, forming a temporary component list. The inverted index feature library from step 1.2 can be used for this search.
Step 2.3: Traverse the temporary component list and match the feature-value array of each feature of each component against the feature values of the same feature of the binary software under test, finding the components that share a feature value and forming a temporary matching result. Then compute the intersection of the feature-value array of each feature of the binary software with the feature-value array of the corresponding feature of each component in the temporary matching result. If the number of elements in the intersection exceeds a set threshold, the feature is a matching feature. Normalize the number of matched feature values according to formula (1), multiply it by the weight of that component feature, and sum the products to obtain the matching coefficient of the component. If the matching coefficient is greater than a threshold, the component is considered a matching component and is output. By combining the resulting list of components matched by the binary software with the vulnerability information known for those components, the vulnerability information present in the binary software can be obtained by analysis.
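The final association of matched components with known vulnerabilities can be sketched as a simple lookup. The vulnerability database format, the function name, and the placeholder identifiers in the usage note are hypothetical; the patent does not describe how vulnerability information is stored.

```python
def vulnerabilities_for(matched_components, vuln_db):
    """Collect known vulnerabilities for the components matched in a binary.

    matched_components: iterable of component names output by the matching step
    vuln_db:            dict mapping component name -> list of vulnerability IDs
    returns:            dict mapping component name -> sorted vulnerability IDs,
                        omitting components with no known vulnerabilities
    """
    report = {}
    for name in matched_components:
        ids = vuln_db.get(name, [])
        if ids:
            report[name] = sorted(ids)
    return report
```

For instance, with a database `{"libA": ["VULN-0002", "VULN-0001"]}` and matched components `["libA", "libC"]`, the report lists both placeholder identifiers for `libA` and leaves out `libC`.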
In conclusion the above is merely preferred embodiments of the present invention, being not intended to limit the scope of the present invention.
All within the spirits and principles of the present invention, any modification, equivalent replacement, improvement and so on should be included in of the invention
Within protection scope.
Claims (6)
1. A binary software analysis method based on component feature weights, characterized in that features of multiple types are extracted from binary components, a weight is assigned to each feature according to its degree of influence on the component, and a component feature library is built;
features of the same multiple types are extracted from the binary software to be analyzed, and each extracted feature of the binary software is matched against the feature of the same type of every component in the component feature library; if the matching result is greater than a threshold, the component is determined to match the binary software.
2. The method according to claim 1, characterized in that the multiple types of features of the binary component include the dynamic symbol table, header information, and constant strings.
3. The method according to claim 2, characterized in that the component feature library is built as follows:
Step 3.1: According to the file type of binary component i, extract the features of component i using the corresponding disassembly method. Each feature is represented as an array of feature values. The numbers of feature values in the feature-value arrays of component i are {n1, n2, n3}, where n1 is the number of elements in the dynamic symbol table feature, n2 is the number of elements in the header information feature, and n3 is the number of elements in the constant string feature;
Step 3.2: For each feature value of component i, traverse the feature-value arrays of all features of every component in the component feature library, find the components that share a feature value, and form a temporary matching result; then compute the intersection of each feature-value array of component i with the corresponding feature-value array of each component in the temporary matching result, and record, for each type of feature, the maximum number of elements over all intersections, {m1, m2, m3}, where m1 is the maximum intersection size for the dynamic symbol table feature, m2 is the maximum intersection size for the header information feature, and m3 is the maximum intersection size for the constant string feature;
Step 3.3: Compute the weight of each feature of component i by the following formulas:
a1 = 1 - m1/n1
a2 = 1 - m2/n2
a3 = 1 - m3/n3
where a1 is the weight of the dynamic symbol table feature of component i, a2 is the weight of the header information feature of component i, and a3 is the weight of the constant string feature of component i;
Step 3.4: Store the features of component i and their corresponding weights in the component feature library;
Step 3.5: Select the next component and perform step 3.2, until the last component has been processed.
4. The method according to claim 3, characterized in that matching the extracted features of the binary software against the features of the same type of each component in the component feature library includes the following steps:
Step 4.1: Extract the features of the binary software by disassembly, using the same method as in step 3.1;
Step 4.2: For each feature value of the binary software, look up the components that have the same feature value in the component feature library built in step 3.4, forming a temporary list of matching components;
Step 4.3: Compute the intersection of the feature-value array of each feature of the binary software with the feature-value array of the corresponding feature of each component in the temporary matching result; if the number of elements in the intersection exceeds a set threshold, the feature is a matching feature; normalize the number of matched feature values, multiply it by the weight of that component feature, and sum the products to obtain the matching coefficient of the component; if the matching coefficient is greater than a threshold, the component is considered a matching component and is output.
5. The method according to claim 3 or 4, characterized in that the elements of the feature-value arrays are hash values of the feature values.
6. The method according to claim 5, characterized in that the matching process can be implemented with an inverted index library; the inverted index library is a set whose elements are the inverted indexes of the individual features; in each inverted index, the key is the hash value of a feature value, and the value is an array of strings holding the names of the components that contain that feature value.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910286878 | 2019-04-11 | ||
CN2019102868789 | 2019-04-11 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110399729A (en) | 2019-11-01 |
CN110399729B (en) | 2021-04-27 |
Family
ID=68325877
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910669789.2A (Active, CN110399729B) | 2019-04-11 | 2019-07-24 | Binary software analysis method based on component characteristic weight |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110399729B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160203330A1 (en) * | 2012-06-19 | 2016-07-14 | Deja Vu Security, Llc | Code repository intrusion detection |
CN102779257A (en) * | 2012-06-28 | 2012-11-14 | 奇智软件(北京)有限公司 | Security detection method and system of Android application program |
CN103226583A (en) * | 2013-04-08 | 2013-07-31 | 北京奇虎科技有限公司 | Method and device for recognizing advertisement plugin |
CN104517053A (en) * | 2013-09-29 | 2015-04-15 | 北京金山网络科技有限公司 | Software recognition method and device |
CN106650450A (en) * | 2016-12-29 | 2017-05-10 | 哈尔滨安天科技股份有限公司 | Malicious script heuristic detection method and system based on code fingerprint identification |
CN107704501A (en) * | 2017-08-28 | 2018-02-16 | 中国科学院信息工程研究所 | A kind of method and system for identifying homologous binary file |
CN107844705A (en) * | 2017-11-14 | 2018-03-27 | 苏州棱镜七彩信息科技有限公司 | Third party's component leak detection method based on binary code feature |
CN108763928A (en) * | 2018-05-03 | 2018-11-06 | 北京邮电大学 | A kind of open source software leak analysis method, apparatus and storage medium |
CN109062792A (en) * | 2018-07-21 | 2018-12-21 | 东南大学 | A kind of Open Source Code detection method based on String matching and characteristic matching |
CN109543408A (en) * | 2018-10-29 | 2019-03-29 | 卓望数码技术(深圳)有限公司 | A kind of Malware recognition methods and system |
Non-Patent Citations (1)
Title |
---|
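Zuo Ling: "Design and Implementation of an Android-Based Malware Detection System", China Master's Theses Full-text Database (Information Science and Technology) * |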
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046388A (en) * | 2019-12-16 | 2020-04-21 | 北京智游网安科技有限公司 | Method for identifying third-party SDK in application, intelligent terminal and storage medium |
CN116954701A (en) * | 2023-08-09 | 2023-10-27 | 软安科技有限公司 | Binary detection method and system based on blood-edge relation |
CN116954701B (en) * | 2023-08-09 | 2024-05-14 | 软安科技有限公司 | Binary component detection method and system based on blood relationship |
Also Published As
Publication number | Publication date |
---|---|
CN110399729B (en) | 2021-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20240028571A1 (en) | Automatic entity resolution with rules detection and generation system | |
KR100902966B1 (en) | Method and system for mapping strings for comparison | |
US7606784B2 (en) | Uncertainty management in a decision-making system | |
CN102171702B (en) | The detection of confidential information | |
CN110427755A (en) | A kind of method and device identifying script file | |
CN111291070A (en) | Abnormal SQL detection method, equipment and medium | |
CN108268886B (en) | Method and system for identifying plug-in operation | |
CN113297580B (en) | Code semantic analysis-based electric power information system safety protection method and device | |
CN110399729A (en) | Binary software analysis method based on component feature weights | |
CN112115326B (en) | Multi-label classification and vulnerability detection method for Etheng intelligent contracts | |
KR102091633B1 (en) | Searching Method for Related Law | |
Shivaji et al. | Plagiarism detection by using karp-rabin and string matching algorithm together | |
CN115658080A (en) | Method and system for identifying open source code components of software | |
CN116340185A (en) | Method, device and equipment for analyzing software open source code components | |
CN113312258A (en) | Interface testing method, device, equipment and storage medium | |
CN109472145A (en) | A kind of code reuse recognition methods and system based on graph theory | |
CN116975881A (en) | LLVM (LLVM) -based vulnerability fine-granularity positioning method | |
Periyasamy et al. | Prediction of future vulnerability discovery in software applications using vulnerability syntax tree (PFVD-VST). | |
CN111859896B (en) | Formula document detection method and device, computer readable medium and electronic equipment | |
Meng et al. | Detecting buffer boundary violations based on SVM | |
CN107463845A (en) | A kind of detection method, system and the computer-processing equipment of SQL injection attack | |
KR20210142443A (en) | Method and system for providing continuous adaptive learning over time for real time attack detection in cyberspace | |
EP2793145A2 (en) | Computer device for minimizing computer resources for database accesses | |
CN109408713A (en) | A kind of software requirement searching system based on field feedback | |
CN111382267B (en) | Question classification method, question classification device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||