CN104731705B - Method for discovering dirty data propagation paths based on complex networks - Google Patents

Method for discovering dirty data propagation paths based on complex networks

Info

Publication number
CN104731705B
Authority
CN
China
Prior art keywords
complex network
node
dirty data
network
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310750367.0A
Other languages
Chinese (zh)
Other versions
CN104731705A (en)
Inventor
胡昌振
赵小林
郝刚
薛静锋
马锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201310750367.0A
Publication of CN104731705A
Application granted
Publication of CN104731705B
Legal status: Active
Anticipated expiration

Landscapes

  • Stored Programmes (AREA)

Abstract

The present invention provides a method for discovering dirty data propagation paths based on complex networks. It can translate a binary program for which no source code is provided, process the result, and mine useful information. Step 1: decompile the binary file to obtain C-like intermediate code; the intermediate language code is obtained after testing on a simple C program. Step 2: capture the function call paths, resolve function addresses to function names, process and simplify the data, generate a matrix representation, and finally generate the function call graph. Step 3: analyze the function call graph to obtain node, edge, and weight information, compute node degrees, and build a complex network graph with key nodes. Step 4: based on the heterogeneity of the power-law distribution of the complex network graph, find the nodes related to the constructed dirty data and the nodes with high call frequency.

Description

Method for discovering dirty data propagation paths based on complex networks
Technical field
The present invention relates to a method for discovering dirty data propagation paths based on complex networks, and belongs to the field of software security technology.
Background
In complex-network research, many concepts and methods can be used to characterize the statistical properties of a network, the most important being the node degree distribution. In a software network, the degree of a node can be interpreted as the number of times a class is called by other classes. Intuitively, the more often a class is called, the more important it is. However, software networks are generally weighted directed networks, and measuring the importance of a class by degree alone is inaccurate. For example, a class with a specific function may itself have a small degree, yet removing it from the software may directly prevent the software from running.
Research on scale-free networks shows that node defects generally fall into two cases. The first is design defects in random nodes: some random nodes in the software network have design defects, but the loss of their function generally does not affect the overall operation of the software. The second is design defects in important nodes: nodes of high importance in the software network have design defects. In a software network, because the degree distribution follows a heterogeneous power law, such nodes are called very frequently; they account for less than 5% of all nodes in the network, yet they implement the principal functions of the software. Once the function of these classes is lost, the system may crash directly. If these few nodes receive focused attention in early software development and during software testing, the development and test cycle can be shortened and software quality improved. How to find these important nodes more effectively and accurately is therefore the task at hand.
Prior work on decompilation builds on IDA disassembly. IDA itself provides rich disassembly result information and data representation definitions, so the intermediate language need not simulate the execution of assembly semantics as symbolic execution does, nor does it need a complete and complex description of the instruction set as SSL does. The intermediate language here is implemented mainly from the semantics of assembly instructions and IDA's disassembly result information: a relatively simple instruction semantics dictionary is built, and assembly language is converted to the intermediate language by looking up matching entries. This mainly involves the definition of the intermediate language, the description of its semantic dictionary, and the concrete translation implementation.
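The dictionary lookup described above can be illustrated with a minimal sketch. The mnemonics, the templates, and the intermediate-language syntax below are hypothetical; they are not taken from the patent or from IDA, and a real dictionary would cover many more instructions and operand forms.

```python
# Hypothetical instruction semantics dictionary: mnemonic -> IL template.
# Both the templates and the IL syntax are illustrative assumptions.
SEMANTICS = {
    "mov": "{0} = {1}",
    "add": "{0} = {0} + {1}",
    "sub": "{0} = {0} - {1}",
    "inc": "{0} = {0} + 1",
}

def translate(line):
    """Translate one 'mnemonic op1, op2' assembly line into an IL statement."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    template = SEMANTICS.get(mnemonic.lower())
    if template is None:
        # Unknown instructions are kept as comments rather than guessed at.
        return "/* untranslated: " + line.strip() + " */"
    return template.format(*operands)

def translate_block(lines):
    """Translate a list of assembly lines, one IL statement per line."""
    return [translate(line) for line in lines]
```

For instance, `translate_block(["mov eax, 1", "add eax, ebx"])` yields one IL statement per instruction, while an instruction missing from the dictionary is passed through as a comment for later handling.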
Existing dirty data discovery techniques are not comprehensive, and no method covers the entire running cycle of the software. As noted above, loss of function in the few key nodes may crash the system directly, so focusing on them during early development and testing shortens the development and test cycle and improves software quality; finding these important nodes more effectively and accurately remains the task at hand. Moreover, existing techniques for discovering dirty data propagation are all based on source code and do not support discovery in binary programs.
Summary of the invention
The present invention provides a method for discovering dirty data propagation paths based on complex networks. It can translate a binary program for which no source code is provided, process the result, and mine useful information, so that defects can be found while the software is running and the reliability of the software improved.
The technical solution of the present invention is as follows:
A method for discovering dirty data propagation paths based on complex networks, comprising the following steps:
Step 1: Decompile the binary file using Hex-Rays, the IDA plug-in, to obtain C-like intermediate code;
Step 2: Use the tools provided by the GNU compiler to collect data from the intermediate code generated in Step 1 and capture the function call paths; once the paths are obtained, resolve function addresses to function names using Addr2line; then process and simplify the trace data and generate a matrix representation; finally, generate the function call graph using Graphviz;
Step 3: Parse the function call graph obtained in the previous step to obtain node, edge, and weight information, compute node degrees, and build a complex network graph with key nodes;
Step 4: Based on the heterogeneity of the power-law distribution of the complex network graph, and using the complex network graph generated in the previous step, find the nodes related to the constructed dirty data and the nodes with very high call frequency; mark key nodes in distinct colors and mark the dirty data path in a special color, thereby revealing potential software defects.
Step 3 builds the complex network graph by the following method, with these specific steps:
(1) take all functions of the binary file obtained in Step 2 as the nodes of the network;
(2) build an unweighted network according to whether a relation exists between nodes;
(3) compute the correlation between nodes;
(4) compute weights from the correlations;
(5) build the weighted network graph.
Beneficial effects of the present invention:
The present invention is based on the IDA disassembler. It translates a binary program for which no source code is provided, processes the result, and mines useful information, so that defects can be found while the software is running and the reliability of the software improved.
The object of analysis in this technique is the binary file rather than source code, so as a prerequisite the binary code must be converted into a source-like form. In addition, the innovation of this work is applying complex-network knowledge to analyze relationships in software such as function calls, which greatly helps the discovery of software defects.
Brief description of the drawings
Fig. 1 is a flowchart of the method for discovering dirty data propagation paths based on complex networks according to the present invention;
Fig. 2 is a diagram of the process of collecting, simplifying, and visualizing the trace paths in the present invention;
Fig. 3 is a schematic diagram of the function call results of an application program in an embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
The embodiment of the present invention is broadly divided into three parts. The first decompiles the binary file to obtain processable source code or an analyzable intermediate language; this part is done with Hex-Rays, the IDA plug-in. The second analyzes the decompilation result and produces the function call graph; this part is done with the GNU compiler toolchain, the Addr2line tool, customized intermediate code, and the Graphviz tool. The third generates a complex network graph with key nodes from the call graph and uses graph theory and complex-network knowledge to find the propagation path of dirty data.
The workflow of the present invention is described in detail with reference to Fig. 1:
1. Decompilation
After comparing several decompilation tools, the present invention uses Hex-Rays, the IDA plug-in, to decompile the binary file and obtain C-like intermediate code.
2. Drawing the function call graph
To capture and display the function call graph, four elements are needed: the GNU compiler toolchain, the Addr2line tool, the intermediate code obtained in the previous step, and the Graphviz tool. Given an address and an executable image, the Addr2line tool identifies the function and the source line number. The customized intermediate code is a very simple tool that reduces the address trace to a graph specification; here it can perform simple processing on the decompiled code. The Graphviz tool generates the graph image. The whole procedure is shown in Fig. 2.
First, the tools provided by the GNU compiler are used to collect data from the intermediate code generated in Step 1 and capture the function call paths. Once the paths are obtained, function addresses are resolved to function names using Addr2line. The trace data is then processed and simplified, and a matrix representation is generated. Finally, Graphviz is used to generate the function call graph.
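The reduction of a trace to a call matrix and a graph specification can be sketched as follows. This is a minimal illustration, not the patent's tool: it assumes the trace has already been resolved to function names, and the event format (`"E"` for entry, `"X"` for exit) and all function names are hypothetical. The output uses standard Graphviz DOT syntax.

```python
from collections import Counter

def reduce_trace(events):
    """Turn ('E'|'X', function_name) events into caller->callee call counts."""
    stack = []
    calls = Counter()
    for kind, name in events:
        if kind == "E":                  # function entry
            if stack:                    # the function on top of the stack is the caller
                calls[(stack[-1], name)] += 1
            stack.append(name)
        elif kind == "X" and stack:      # function exit
            stack.pop()
    return calls

def to_matrix(calls):
    """Return (sorted function names, square matrix of call counts)."""
    names = sorted({f for edge in calls for f in edge})
    index = {n: i for i, n in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for (caller, callee), count in calls.items():
        matrix[index[caller]][index[callee]] = count
    return names, matrix

def to_dot(calls):
    """Emit a Graphviz DOT digraph with call counts as edge labels."""
    lines = ["digraph calls {"]
    for (caller, callee), count in sorted(calls.items()):
        lines.append('  "%s" -> "%s" [label="%d"];' % (caller, callee, count))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical trace of a small program.
trace = [("E", "main"), ("E", "read_input"), ("X", "read_input"),
         ("E", "process"), ("E", "helper"), ("X", "helper"),
         ("E", "helper"), ("X", "helper"), ("X", "process"), ("X", "main")]
calls = reduce_trace(trace)
names, matrix = to_matrix(calls)
dot_text = to_dot(calls)
```

Row i, column j of `matrix` holds the number of calls from `names[i]` to `names[j]`; `dot_text` can be saved to a file and rendered with Graphviz.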
3. Finding the dirty data propagation path
Dirty data is data that has been modified but not yet saved or further processed, or data whose value has been lost. Its propagation path can be obtained by constructing dirty data and recording the nodes it passes through, and then using clustering knowledge from complex networks to obtain similar nodes.
First, the graph file obtained in the previous step is parsed to obtain nodes, edges, weights, and related information, and node degrees and similar measures are computed.
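Parsing the graph file and computing degrees might look like the sketch below. It assumes a simple DOT file in which every edge is written on the pattern `"caller" -> "callee" [label="N"]` with N the call count; this layout and the sample function names are assumptions, and a general DOT file would need a real parser.

```python
import re

# Matches edges such as: "main" -> "process" [label="3"];  The label is optional.
EDGE = re.compile(r'"?(\w+)"?\s*->\s*"?(\w+)"?\s*(?:\[label="(\d+)[^"]*"\])?')

def parse_dot(text):
    """Extract the node set and weighted directed edges from simple DOT text."""
    nodes, edges = set(), {}
    for match in EDGE.finditer(text):
        src, dst, label = match.groups()
        edges[(src, dst)] = int(label) if label else 1   # default weight 1
        nodes.update((src, dst))
    return nodes, edges

def degrees(nodes, edges):
    """Per node: in-degree, out-degree, and weighted in-degree (times called)."""
    info = {n: {"in": 0, "out": 0, "calls_in": 0} for n in nodes}
    for (src, dst), weight in edges.items():
        info[src]["out"] += 1
        info[dst]["in"] += 1
        info[dst]["calls_in"] += weight
    return info

sample = '''digraph calls {
  "main" -> "process" [label="1"];
  "main" -> "read_input" [label="1"];
  "process" -> "helper" [label="2"];
}'''
nodes, edges = parse_dot(sample)
info = degrees(nodes, edges)
```

Keeping the plain in-degree and the weighted in-degree side by side reflects the point made in the background: in a weighted directed network, degree alone does not capture how heavily a node is used.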
The complex network model of the software system is built by the following specific steps:
(1) take all functions of the binary file obtained in Step 2 as the nodes of the network;
(2) build an unweighted network according to whether a relation exists between nodes;
(3) compute the correlation between nodes;
(4) compute weights from the correlations;
(5) build the weighted network graph.
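The five steps above can be sketched as follows. The text does not define how correlation or weights are computed, so this sketch substitutes its own choices and labels them as assumptions: correlation is taken as the Jaccard similarity of the two nodes' closed neighbourhoods, and the weight as the call count scaled by that correlation. All function names in the example are hypothetical.

```python
def build_weighted_network(functions, call_counts):
    """Return (nodes, weights) for a call graph given as {(caller, callee): count}."""
    # (1) every function is a node (edge endpoints are added for safety)
    nodes = set(functions) | {f for edge in call_counts for f in edge}

    # (2) unweighted network: link two nodes whenever a call relation exists
    neighbours = {n: set() for n in nodes}
    for caller, callee in call_counts:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)

    # (3) correlation between two nodes.
    # ASSUMPTION: Jaccard similarity of closed neighbourhoods; the patent
    # does not specify a formula.
    def correlation(a, b):
        na, nb = neighbours[a] | {a}, neighbours[b] | {b}
        return len(na & nb) / len(na | nb)

    # (4) weight per edge. ASSUMPTION: call count scaled by correlation.
    weights = {
        (caller, callee): count * correlation(caller, callee)
        for (caller, callee), count in call_counts.items()
    }

    # (5) the weighted network is the node set plus the weighted edges
    return nodes, weights

functions = ["main", "read_input", "process", "helper"]
call_counts = {("main", "read_input"): 1, ("main", "process"): 1,
               ("process", "helper"): 2}
nodes, weights = build_weighted_network(functions, call_counts)
```

Any other correlation measure can be dropped into step (3) without changing the rest of the construction.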
Based on the heterogeneity of the power-law distribution of the complex network, and using the complex network model generated in the previous step, the nodes related to the constructed dirty data and the nodes with very high call frequency are found. Key nodes are marked in distinct colors and the dirty data path is marked in a special color, thereby revealing potential software defects.
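The marking step can be sketched as below. Several things here are assumptions rather than the patent's method: key nodes are taken to be the most frequently called 5% of nodes (at least one), echoing the "less than 5%" figure in the background; the dirty data path is supplied as a recorded sequence of functions; and the specific colors are arbitrary. The attributes `style`, `fillcolor`, and `color` are standard Graphviz attributes.

```python
def key_nodes(calls_in, fraction=0.05):
    """Most frequently called nodes; at least one node is always returned."""
    ranked = sorted(calls_in, key=lambda n: (-calls_in[n], n))
    keep = max(1, int(len(ranked) * fraction))
    return set(ranked[:keep])

def coloured_dot(edges, keys, dirty_nodes):
    """DOT text with key nodes filled orange and dirty-path edges drawn red."""
    # Consecutive functions in the recorded sequence form the dirty edges.
    dirty_edges = set(zip(dirty_nodes, dirty_nodes[1:]))
    lines = ["digraph network {"]
    for node in sorted({n for edge in edges for n in edge}):
        if node in keys:
            lines.append('  "%s" [style=filled, fillcolor=orange];' % node)
        else:
            lines.append('  "%s";' % node)
    for src, dst in sorted(edges):
        colour = "red" if (src, dst) in dirty_edges else "black"
        lines.append('  "%s" -> "%s" [color=%s, label="%s"];'
                     % (src, dst, colour, edges[(src, dst)]))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical call graph, call frequencies, and recorded dirty data sequence.
edges = {("main", "read_input"): 1, ("main", "process"): 1,
         ("process", "helper"): 2}
calls_in = {"main": 0, "read_input": 1, "process": 1, "helper": 2}
keys = key_nodes(calls_in)
dot_text = coloured_dot(edges, keys, ["main", "process", "helper"])
```

Rendering `dot_text` with Graphviz gives a picture in which frequently called functions and the recorded dirty data route stand out from the rest of the call graph.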
Although embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art may make variations, substitutions, and improvements without departing from the principles of the present invention, and these shall also be regarded as falling within the scope of protection of the present invention.

Claims (2)

1. A method for discovering dirty data propagation paths based on complex networks, characterized by comprising the following steps:
Step 1: Decompile the binary file using Hex-Rays, the IDA plug-in, to obtain C-like intermediate code;
Step 2: Use the tools provided by the GNU compiler to collect data from the intermediate code generated in Step 1 and capture the function call paths; once the paths are obtained, resolve function addresses to function names using Addr2line; then process and simplify the trace data and generate a matrix representation; finally, generate the function call graph using Graphviz;
Step 3: Parse the function call graph obtained in the previous step to obtain node, edge, and weight information, compute node degrees, and build a complex network graph with key nodes;
Step 4: Based on the heterogeneity of the power-law distribution of the complex network graph, and using the complex network graph generated in the previous step, find the nodes related to the constructed dirty data and the nodes with high call frequency; mark key nodes in distinct colors and mark the dirty data path in a special color, thereby revealing potential software defects.
2. The method for discovering dirty data propagation paths based on complex networks as claimed in claim 1, characterized in that Step 3 builds the complex network graph by the following method, with these specific steps:
(1) take all functions of the binary file obtained in Step 2 as the nodes of the network;
(2) build an unweighted network according to whether a relation exists between nodes;
(3) compute the correlation between nodes;
(4) compute weights from the correlations;
(5) build the weighted network graph.
CN201310750367.0A 2013-12-31 2013-12-31 Method for discovering dirty data propagation paths based on complex networks Active CN104731705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310750367.0A CN104731705B (en) 2013-12-31 2013-12-31 Method for discovering dirty data propagation paths based on complex networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310750367.0A CN104731705B (en) 2013-12-31 2013-12-31 Method for discovering dirty data propagation paths based on complex networks

Publications (2)

Publication Number Publication Date
CN104731705A CN104731705A (en) 2015-06-24
CN104731705B CN104731705B (en) 2017-09-01

Family

ID=53455615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310750367.0A Active CN104731705B (en) 2013-12-31 2013-12-31 Method for discovering dirty data propagation paths based on complex networks

Country Status (1)

Country Link
CN (1) CN104731705B (en)

Also Published As

Publication number Publication date
CN104731705A (en) 2015-06-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant