CN110399729A - Binary software analysis method based on component feature weights - Google Patents


Info

Publication number
CN110399729A
Authority
CN
China
Prior art keywords
feature
component
value
weight
characteristic value
Prior art date
Legal status
Granted
Application number
CN201910669789.2A
Other languages
Chinese (zh)
Other versions
CN110399729B (en)
Inventor
于渤
付海涛
高卫栋
何清林
刘中金
何跃鹰
袁开国
Current Assignee
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date
2019-04-11
Filing date
2019-07-24
Publication date
2019-11-01
Application filed by National Computer Network and Information Security Management Center
Publication of CN110399729A
Application granted
Publication of CN110399729B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a binary software analysis method based on component feature weights. Binary software components are described by multiple kinds of features, and each feature is assigned a weight according to how strongly it influences the identification of the component. This addresses the false negatives and false positives that arise in binary software analysis when component features are not covered comprehensively, and yields an extensible, widely applicable and efficient method for extracting and matching component fingerprints based on feature weights.

Description

Binary software analysis method based on component feature weights
Technical field
The invention belongs to the field of static software detection, and in particular relates to a binary software analysis method based on component feature weights.
Background art
A component is a software entity within a software system that has relatively independent functionality, an interface specified by contract, and explicit context dependencies; it can be deployed independently and composed with other entities, and is a simplified encapsulation of data and methods. Given a known executable binary, one needs to determine quickly which components it uses and which vulnerabilities are associated with those components, so as to establish the security risk of the binary. Given a component known to contain vulnerabilities, one needs to determine quickly all binaries that use it, so as to establish the component's scope of impact. Given a known vulnerability, one needs to identify the components and binaries it affects, so as to assess the degree of risk it poses.
In the prior art, Universal Extractor is a program that can extract files from almost any kind of archive, whether a simple ZIP file, an installer (such as Wise or NSIS) or a Windows Installer (.msi) package. It lets users extract files from almost any type of archive regardless of its origin or compression method, and offers a simple way to extract files from installation packages (such as Inno Setup or Windows Installer packages) without using the command line each time. AppCheck is an analysis platform that comprehensively examines the software composition and risk status of devices, helping developers and device users improve security. However, all of the above methods extract feature fingerprints from constant strings; although efficient, they suffer from false negatives, false positives and incomplete feature coverage.
Summary of the invention
In view of this, the present invention provides a binary software analysis method based on component feature weights. Assigning different weights to different features solves the problem of false negatives and false positives in component identification, and adding feature items solves the problem of incomplete feature coverage, thereby realizing an extensible, widely applicable and efficient method for extracting and matching component features based on feature weights.
In the binary software analysis method provided by the invention, features of multiple types are extracted from each binary component, each feature is assigned a weight according to its degree of influence on the component, and a component feature library is constructed;
features of the same multiple types are extracted from the binary software to be analyzed, and each extracted feature is matched against the feature of the same type of every component in the component feature library; if the matching result is greater than a threshold, the component is determined to match the binary software.
Further, the multiple types of features of a binary component include the dynamic symbol table, header information and constant strings.
Further, the component feature library is constructed as follows:
Step 3.1: according to the file type of binary component $i$, extract the features of component $i$ with the corresponding disassembly method. Each feature is represented as an array of feature values; the numbers of feature values in the arrays of component $i$ are $\{n_1, n_2, n_3\}$, where $n_1$ is the number of elements in the dynamic symbol table feature, $n_2$ is the number of elements in the header information feature, and $n_3$ is the number of elements in the constant string feature;
Step 3.2: for each feature value of component $i$, traverse the feature value arrays of all features of every component in the component feature library to find components that share that feature value, forming a temporary match result. Then compute the intersection between the feature value array of each feature of component $i$ and the array of the same feature of each component in the temporary match result, and record, for each feature type, the maximum number of elements over all intersections, $\{m_1, m_2, m_3\}$, where $m_1$ is the maximum intersection size for the dynamic symbol table feature, $m_2$ for the header information feature, and $m_3$ for the constant string feature;
Step 3.3: compute the weight of each feature of component $i$ according to the following formulas:
$$a_1 = 1 - \frac{m_1}{n_1}$$
$$a_2 = 1 - \frac{m_2}{n_2}$$
$$a_3 = 1 - \frac{m_3}{n_3}$$
where $a_1$ is the weight of the dynamic symbol table feature of component $i$, $a_2$ is the weight of its header information feature, and $a_3$ is the weight of its constant string feature;
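Steps 3.2 and 3.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: feature values are modeled as Python sets of strings, the feature-type keys `dynsym`, `header` and `strings` and the function name `feature_weights` are hypothetical, `library` is assumed to hold only previously processed components (not component $i$ itself), and an empty feature is assumed to get weight 0 because $a_k$ is undefined when $n_k = 0$.

```python
from typing import Dict, Set

# Hypothetical keys for the three feature types named in the patent
FEATURE_TYPES = ("dynsym", "header", "strings")

Features = Dict[str, Set[str]]  # feature type -> set of feature values


def feature_weights(component: Features, library: Dict[str, Features]) -> Dict[str, float]:
    """Return a_k = 1 - m_k / n_k for each feature type k of one component."""
    weights: Dict[str, float] = {}
    for ftype in FEATURE_TYPES:
        values = component.get(ftype, set())
        n = len(values)
        if n == 0:
            # Assumption: a_k is undefined for an empty feature, so give it no weight
            weights[ftype] = 0.0
            continue
        # Temporary match result: library components sharing at least one value
        candidates = [c for c in library.values() if values & c.get(ftype, set())]
        # m_k: size of the largest intersection (0 when nothing is shared)
        m = max((len(values & c.get(ftype, set())) for c in candidates), default=0)
        weights[ftype] = 1 - m / n
    return weights


lib = {"libA": {"dynsym": {"f1", "f2"}, "header": {"h1"}, "strings": {"s1"}}}
comp = {"dynsym": {"f1", "f2", "f3", "f4"}, "header": {"h1", "h2"}, "strings": {"s2"}}
print(feature_weights(comp, lib))
# {'dynsym': 0.5, 'header': 0.5, 'strings': 1.0}
```

The design follows directly from the formula: a feature largely shared with some other component (large $m_k$) gets a weight near 0 because it cannot tell the two apart, while a feature no other component shares keeps weight 1.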
Step 3.4: store the features of component $i$ and their corresponding weights in the component feature library;
Step 3.5: select the next component and return to step 3.2, until the last component has been processed.
Further, matching the extracted features of the binary software against the features of the same type of each component in the component feature library specifically includes the following steps:
Step 4.1: extract the features of the binary software by disassembly, using the same method as in step 3.1;
Step 4.2: for each feature value of the binary software, look up the components having the same feature value in the component feature library built in step 3.4, forming a temporary list of matched components;
Step 4.3: compute the intersection between the feature value array of each feature of the binary software and the array of the same feature of each component in the temporary match result. If the number of elements in an intersection exceeds a set threshold, that feature is regarded as a matched feature. The number of matched feature values is normalized, multiplied by the weight of the corresponding component feature, and summed over the features to give the component's matching coefficient; if the matching coefficient exceeds a threshold, the component is regarded as a matching component and is output.
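The scoring in step 4.3 can be sketched as below. The text does not state what the matched count is normalized by, so this sketch assumes division by the component's own array size $n_k$; the per-feature threshold `feature_threshold`, the coefficient threshold `match_threshold`, and the function names are hypothetical.

```python
from typing import Dict, List, Set

Features = Dict[str, Set[str]]  # feature type -> set of feature values


def match_coefficient(binary: Features, component: Features,
                      weights: Dict[str, float], feature_threshold: int = 0) -> float:
    """Weighted sum of normalized matched-value counts over the component's features."""
    score = 0.0
    for ftype, comp_values in component.items():
        if not comp_values:
            continue
        matched = len(binary.get(ftype, set()) & comp_values)
        if matched <= feature_threshold:
            continue  # intersection too small: not a matched feature
        # Assumption: normalize by the component's own number of values n_k
        score += weights.get(ftype, 0.0) * matched / len(comp_values)
    return score


def matching_components(binary: Features, library: Dict[str, Features],
                        weights: Dict[str, Dict[str, float]],
                        match_threshold: float, feature_threshold: int = 0) -> List[str]:
    """Names of components whose matching coefficient exceeds match_threshold."""
    return [name for name, comp in library.items()
            if match_coefficient(binary, comp, weights.get(name, {}),
                                 feature_threshold) > match_threshold]


binary = {"dynsym": {"f1", "f2"}, "strings": {"s1", "s9"}}
lib = {"libA": {"dynsym": {"f1", "f2", "f3", "f4"}, "header": {"h1"}, "strings": {"s1", "s2"}}}
w = {"libA": {"dynsym": 0.5, "header": 0.5, "strings": 1.0}}
print(match_coefficient(binary, lib["libA"], w["libA"]))  # 0.75
print(matching_components(binary, lib, w, match_threshold=0.5))  # ['libA']
```

Here the dynamic symbols contribute $0.5 \times 2/4 = 0.25$, the constant strings $1.0 \times 1/2 = 0.5$, and the header feature nothing, giving 0.75.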
Further, the elements of the feature value arrays are hash values of the feature values.
Further, the matching process may be implemented with an inverted index library. The inverted index library is a set whose elements are the inverted indexes of the individual features; in each inverted index, the key is the hash value of a feature value and the value is an array of the names of the components containing that feature value.
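A minimal sketch of the hashed inverted index, assuming SHA-1 hex digests as the hash function (the text only says "hash value") and one index per feature type. `hash_value`, `build_inverted_index` and `candidate_components` are hypothetical names; the lookup corresponds to forming the temporary match list of step 4.2.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set

Index = Dict[str, Dict[str, List[str]]]  # feature type -> hash -> component names


def hash_value(value: str) -> str:
    # Assumption: SHA-1 of the UTF-8 encoded feature value
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def build_inverted_index(library: Dict[str, Dict[str, Iterable[str]]]) -> Index:
    index = defaultdict(lambda: defaultdict(list))
    for name, features in library.items():
        for ftype, values in features.items():
            for h in {hash_value(v) for v in values}:  # deduplicate per component
                index[ftype][h].append(name)
    # Convert to plain dicts so later lookups never create empty entries
    return {ftype: dict(postings) for ftype, postings in index.items()}


def candidate_components(index: Index, features: Dict[str, Iterable[str]]) -> Set[str]:
    """Components sharing at least one feature value with the query."""
    found: Set[str] = set()
    for ftype, values in features.items():
        postings = index.get(ftype, {})
        for v in values:
            found.update(postings.get(hash_value(v), []))
    return found


lib = {"libA": {"strings": ["OpenSSL 1.0.2", "x"]}, "libB": {"strings": ["x"]}}
idx = build_inverted_index(lib)
print(sorted(candidate_components(idx, {"strings": ["x"]})))  # ['libA', 'libB']
```

Only components that share at least one hashed value are returned, so the more expensive per-feature intersections of step 4.3 run on a short candidate list instead of the whole library.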
Beneficial effects:
The present invention describes binary software components with multiple kinds of features and assigns each feature a weight according to its degree of influence on the component. This solves the false negatives and false positives that arise in binary software analysis when component features are not covered comprehensively, and realizes an extensible, widely applicable and efficient component fingerprint extraction and matching method based on feature weights.
Description of the drawings
Fig. 1 is a flowchart of constructing the component feature library in the binary software analysis method based on component feature weights provided by the invention.
Fig. 2 is a flowchart of homology determination in the binary software analysis method based on component feature weights provided by the invention.
Specific embodiments
The present invention will now be described in detail with reference to the accompanying drawings and examples.
The present invention provides a binary software analysis method based on component feature weights. Its basic idea is as follows: first extract multiple types of features from binary components, compute feature values to form component feature arrays, and assign weights to the component features with a feature weight calculation method, thereby constructing a component feature library; then analyze binary software against the generated component feature library to determine the binary components it matches.
The method comprises two parts: first, constructing the component feature library; second, analyzing binary software with the generated component feature library.
The component feature library is constructed first, as shown in Fig. 1, through the following steps:
Step 1.1: determine the file type of binary component $i$ and extract its features by disassembly. Because a single feature cannot uniquely identify a component, the invention adds feature items for each kind of binary component file to ensure the component is uniquely identified and to improve feature coverage. The component features extracted in the invention comprise the dynamic symbol table, header information and constant strings; further feature items can be added as actual analysis requires. The numbers of feature values in the feature arrays of component $i$ are $\{n_1, n_2, n_3\}$, where $n_1$ is the number of elements in the dynamic symbol table feature, $n_2$ the number of elements in the header information feature, and $n_3$ the number of elements in the constant string feature. Components of different file types are handled differently, as follows:
A. For binary components in PE32 format, use the Python pefile module to read the .rdata section and extract constant strings, and read the dynamic symbol table to extract the component's function names;
B. For binary components in Linux (ELF) format, use the Linux readelf command to extract component features from the .rodata, .data and .dynstr sections;
C. For Jar packages, first decompress the jar package, then disassemble the class files with the Java javap command and extract the corresponding features.
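The file-type dispatch of step 1.1 and the constant-string extraction can be sketched as below. The type is guessed from magic bytes (`MZ` for PE, `\x7fELF` for ELF, `PK\x03\x04` for ZIP-based JARs), and `constant_strings` mimics the Unix `strings` tool on raw section bytes, which a caller might obtain from pefile's `section.get_data()` for `.rdata` or from a dump of an ELF `.rodata` section. The function names, the 4-character minimum and ASCII-only matching are assumptions; extraction of dynamic symbols and header fields is not shown.

```python
import re
from typing import List


def detect_file_type(data: bytes) -> str:
    """Guess the container format from its leading magic bytes."""
    if data[:2] == b"MZ":
        return "pe"        # DOS/PE executable
    if data[:4] == b"\x7fELF":
        return "elf"       # Linux ELF binary or shared object
    if data[:4] == b"PK\x03\x04":
        return "jar"       # ZIP container; plain ZIP files also match
    return "unknown"


def constant_strings(section_data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII of at least min_len bytes, like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(section_data)]


blob = b"\x00\x01OpenSSL 1.1.1\x00ab\x00libfoo.so\xff"
print(detect_file_type(b"\x7fELF\x02\x01"), constant_strings(blob))
# elf ['OpenSSL 1.1.1', 'libfoo.so']
```

Short runs such as `ab` are dropped by the length filter, which keeps incidental byte patterns out of the constant-string feature array.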
Step 1.2: initialize the component feature library. Taking as input the feature set of component $i$ generated in step 1.1, traverse the feature value arrays of all features of every component in the component feature library (an inverted index feature library may also be used for this search), and match them type by type against the feature value arrays of each feature of component $i$ to find components that share feature values, forming a temporary match result. Then compute the intersection between the feature value array of each feature of component $i$ and the array of the same feature of each component in the temporary match result, and record, for each feature type, the maximum number of elements over all intersections, $\{m_1, m_2, m_3\}$, where $m_1$ is the maximum intersection size for the dynamic symbol table feature, $m_2$ for the header information feature, and $m_3$ for the constant string feature.
In the invention, the component feature library is structured as a set of components, where each component is a set of features and each feature is an array of feature values. For example, component 1 has three features, namely the dynamic symbol table, header information and constant strings; the constant string feature is a feature value array holding the values of multiple constant strings, and likewise the dynamic symbol table and header information features are arrays holding multiple dynamic symbol values and header information values respectively.
To save storage, the array elements of each feature are the hash values of the feature values. To speed up queries, an inverted index library is also built for match lookups. The inverted index library is a set whose elements are the inverted indexes of the individual features; in each inverted index, the key is the hash value of a feature value and the value is an array of the names of the components containing that feature value.
Step 1.3: compute the weight of each feature of component $i$ according to the following formula:
$$a_k = 1 - \frac{m_k}{n_k}, \qquad k = 1, 2, 3$$
where $a_1$ is the weight of the dynamic symbol table feature of component $i$, $a_2$ is the weight of its header information feature, and $a_3$ is the weight of its constant string feature;
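A worked example of the weight formula $a_k = 1 - m_k/n_k$, using hypothetical counts chosen only for illustration: suppose component $i$ has $n_1 = 40$ dynamic symbols, $n_2 = 10$ header values and $n_3 = 200$ constant strings, and the largest overlaps with any component already in the library are $m_1 = 30$, $m_2 = 10$ and $m_3 = 20$. Then

$$a_1 = 1 - \frac{30}{40} = 0.25, \qquad a_2 = 1 - \frac{10}{10} = 0, \qquad a_3 = 1 - \frac{20}{200} = 0.9$$

The header information, fully shared with another component, receives no weight because it cannot distinguish the two, while the mostly unique constant strings dominate later matching.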
Step 1.4: store the features of component $i$ and their corresponding weights in the component feature library. When no feature matching component $i$ is found, the weight of each feature of component $i$ may also be assigned manually based on experience.
Step 1.5: select the next component and return to step 1.2, until the last component has been processed; then exit.
Binary software is then analyzed against the generated component feature library, as shown in Fig. 2, through the following steps:
Step 2.1: extract the features of the binary software by disassembly, using the same method as in step 1.1;
Step 2.2: using the feature values in each feature's value array of the binary software extracted in step 2.1, look up the components having the same feature values in the component feature library built in step 1.4, forming a temporary component list; the inverted index feature library from step 1.2 may be used for this search;
Step 2.3: traverse the temporary component list and match the feature value array of each feature of each listed component against the feature values of the same feature of the binary software under test, to find components sharing feature values and form a temporary match result. Then compute the intersection between the feature value array of each feature of the binary software and the array of the same feature of each component in the temporary match result. If the number of elements in an intersection exceeds a set threshold, that feature is regarded as a matched feature. The number of matched feature values is normalized according to formula (1), multiplied by the weight of the corresponding component feature, and summed to give the component's matching coefficient; if the matching coefficient exceeds a threshold, the component is regarded as a matching component and is output. From the components matched by the binary software, combined with the vulnerability information known for those components, the vulnerabilities present in the binary software can be determined.
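The final sentence of step 2.3, deriving a binary's vulnerabilities from its matched components, amounts to joining the matching coefficients with a per-component vulnerability table. A minimal sketch, where the function name `binary_vulnerabilities`, the `vuln_db` mapping and the sample data are all hypothetical:

```python
from typing import Dict, List


def binary_vulnerabilities(coefficients: Dict[str, float],
                           vuln_db: Dict[str, List[str]],
                           match_threshold: float) -> Dict[str, List[str]]:
    """Map each matching component (coefficient > threshold) to its known vulnerability IDs."""
    matched = sorted((name for name, score in coefficients.items() if score > match_threshold),
                     key=lambda name: -coefficients[name])  # strongest match first
    return {name: vuln_db.get(name, []) for name in matched}


scores = {"openssl-1.0.1f": 0.82, "zlib-1.2.11": 0.31}  # hypothetical step 2.3 output
db = {"openssl-1.0.1f": ["CVE-2014-0160"]}             # hypothetical vulnerability table
print(binary_vulnerabilities(scores, db, match_threshold=0.5))
# {'openssl-1.0.1f': ['CVE-2014-0160']}
```

A matched component with no entry in the table maps to an empty list, so it is still reported as present even when no vulnerability is known for it.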
In conclusion the above is merely preferred embodiments of the present invention, being not intended to limit the scope of the present invention. All within the spirits and principles of the present invention, any modification, equivalent replacement, improvement and so on should be included in of the invention Within protection scope.

Claims (6)

1. A binary software analysis method based on component feature weights, characterized in that features of multiple types are extracted from binary components, each feature is assigned a weight according to its degree of influence on the component, and a component feature library is constructed;
features of the multiple types are extracted from the binary software to be analyzed, and the extracted features of the binary software are matched against the features of the same type of each component in the component feature library; if the matching result is greater than a threshold, it is determined that the component matches the binary software.
2. The method according to claim 1, characterized in that the multiple types of features of the binary components include the dynamic symbol table, header information and constant strings.
3. The method according to claim 2, characterized in that the component feature library is constructed as follows:
Step 3.1: according to the file type of binary component $i$, extract the features of component $i$ with the corresponding disassembly method. Each feature is represented as an array of feature values; the numbers of feature values in the arrays of component $i$ are $\{n_1, n_2, n_3\}$, where $n_1$ is the number of elements in the dynamic symbol table feature, $n_2$ is the number of elements in the header information feature, and $n_3$ is the number of elements in the constant string feature;
Step 3.2: for each feature value of component $i$, traverse the feature value arrays of all features of every component in the component feature library to find components that share that feature value, forming a temporary match result. Then compute the intersection between the feature value array of each feature of component $i$ and the array of the same feature of each component in the temporary match result, and record, for each feature type, the maximum number of elements over all intersections, $\{m_1, m_2, m_3\}$, where $m_1$ is the maximum intersection size for the dynamic symbol table feature, $m_2$ for the header information feature, and $m_3$ for the constant string feature;
Step 3.3: compute the weight of each feature of component $i$ according to the following formulas:
$$a_1 = 1 - \frac{m_1}{n_1}$$
$$a_2 = 1 - \frac{m_2}{n_2}$$
$$a_3 = 1 - \frac{m_3}{n_3}$$
where $a_1$ is the weight of the dynamic symbol table feature of component $i$, $a_2$ is the weight of its header information feature, and $a_3$ is the weight of its constant string feature;
Step 3.4: store the features of component $i$ and their corresponding weights in the component feature library;
Step 3.5: select the next component and return to step 3.2, until the last component has been processed.
4. The method according to claim 3, characterized in that matching the extracted features of the binary software against the features of the same type of each component in the component feature library specifically includes the following steps:
Step 4.1: extract the features of the binary software by disassembly, using the same method as in step 3.1;
Step 4.2: for each feature value of the binary software, look up the components having the same feature value in the component feature library built in step 3.4, forming a temporary list of matched components;
Step 4.3: compute the intersection between the feature value array of each feature of the binary software and the array of the same feature of each component in the temporary match result. If the number of elements in an intersection exceeds a set threshold, that feature is regarded as a matched feature. The number of matched feature values is normalized, multiplied by the weight of the corresponding component feature, and summed to give the component's matching coefficient; if the matching coefficient exceeds a threshold, the component is regarded as a matching component and is output.
5. The method according to claim 3 or 4, characterized in that the elements of the feature value arrays are hash values of the feature values.
6. The method according to claim 5, characterized in that the matching process may be implemented with an inverted index library; the inverted index library is a set whose elements are the inverted indexes of the individual features; in each inverted index, the key is the hash value of a feature value and the value is an array of the names of the components containing that feature value.
CN201910669789.2A 2019-04-11 2019-07-24 Binary software analysis method based on component characteristic weight Active CN110399729B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910286878 2019-04-11
CN2019102868789 2019-04-11

Publications (2)

Publication Number Publication Date
CN110399729A (en) 2019-11-01
CN110399729B (en) 2021-04-27

Family

ID=68325877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910669789.2A Active CN110399729B (en) 2019-04-11 2019-07-24 Binary software analysis method based on component characteristic weight

Country Status (1)

Country Link
CN (1) CN110399729B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779257A (en) * 2012-06-28 2012-11-14 奇智软件(北京)有限公司 Security detection method and system of Android application program
CN103226583A (en) * 2013-04-08 2013-07-31 北京奇虎科技有限公司 Method and device for recognizing advertisement plugin
CN104517053A (en) * 2013-09-29 2015-04-15 北京金山网络科技有限公司 Software recognition method and device
US20160203330A1 (en) * 2012-06-19 2016-07-14 Deja Vu Security, Llc Code repository intrusion detection
CN106650450A (en) * 2016-12-29 2017-05-10 哈尔滨安天科技股份有限公司 Malicious script heuristic detection method and system based on code fingerprint identification
CN107704501A (en) * 2017-08-28 2018-02-16 中国科学院信息工程研究所 Method and system for identifying homologous binary files
CN107844705A (en) * 2017-11-14 2018-03-27 苏州棱镜七彩信息科技有限公司 Third-party component vulnerability detection method based on binary code features
CN108763928A (en) * 2018-05-03 2018-11-06 北京邮电大学 Open-source software vulnerability analysis method, apparatus and storage medium
CN109062792A (en) * 2018-07-21 2018-12-21 东南大学 Open-source code detection method based on string matching and feature matching
CN109543408A (en) * 2018-10-29 2019-03-29 卓望数码技术(深圳)有限公司 Malware identification method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
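Zuo Ling (左玲): "Design and Implementation of an Android-Based Malware Detection System", China Master's Theses Full-text Database (Information Science and Technology) *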

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046388A (en) * 2019-12-16 2020-04-21 北京智游网安科技有限公司 Method for identifying third-party SDK in application, intelligent terminal and storage medium
CN116954701A (en) * 2023-08-09 2023-10-27 软安科技有限公司 Binary detection method and system based on lineage relationships
CN116954701B (en) * 2023-08-09 2024-05-14 软安科技有限公司 Binary component detection method and system based on lineage relationships

Also Published As

Publication number Publication date
CN110399729B (en) 2021-04-27

Similar Documents

Publication Publication Date Title
US20240028571A1 (en) Automatic entity resolution with rules detection and generation system
KR100902966B1 (en) Method and system for mapping strings for comparison
US7606784B2 (en) Uncertainty management in a decision-making system
CN102171702B (en) The detection of confidential information
CN110427755A (en) A kind of method and device identifying script file
CN111291070A (en) Abnormal SQL detection method, equipment and medium
CN108268886B (en) Method and system for identifying plug-in operation
CN113297580B (en) Code semantic analysis-based electric power information system safety protection method and device
CN110399729A (en) A kind of binary software analysis1 method based on module diagnostic weight
CN112115326B (en) Multi-label classification and vulnerability detection method for Etheng intelligent contracts
KR102091633B1 (en) Searching Method for Related Law
Shivaji et al. Plagiarism detection by using karp-rabin and string matching algorithm together
CN115658080A (en) Method and system for identifying open source code components of software
CN116340185A (en) Method, device and equipment for analyzing software open source code components
CN113312258A (en) Interface testing method, device, equipment and storage medium
CN109472145A (en) A kind of code reuse recognition methods and system based on graph theory
CN116975881A (en) LLVM (LLVM) -based vulnerability fine-granularity positioning method
Periyasamy et al. Prediction of future vulnerability discovery in software applications using vulnerability syntax tree (PFVD-VST).
CN111859896B (en) Formula document detection method and device, computer readable medium and electronic equipment
Meng et al. Detecting buffer boundary violations based on SVM
CN107463845A (en) A kind of detection method, system and the computer-processing equipment of SQL injection attack
KR20210142443A (en) Method and system for providing continuous adaptive learning over time for real time attack detection in cyberspace
EP2793145A2 (en) Computer device for minimizing computer resources for database accesses
CN109408713A (en) A kind of software requirement searching system based on field feedback
CN111382267B (en) Question classification method, question classification device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant