CN107229563A - A cross-architecture method for associating vulnerable functions in binary programs - Google Patents

Info

Publication number
CN107229563A
Authority
CN
China
Prior art keywords
function
measured
leak
similarity
basic block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610178368.6A
Other languages
Chinese (zh)
Other versions
CN107229563B (en)
Inventor
石志强
常青
陈昱
王猛涛
孙利民
朱红松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201610178368.6A
Publication of CN107229563A
Application granted
Publication of CN107229563B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3616 Software analysis for verifying properties of programs using software metrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a cross-architecture method for associating vulnerable functions in binary programs. The method is: 1) perform reverse analysis on the binary file of the binary program under test to obtain a library of functions under test; then, from this function library, obtain the function call graph, the function control flow graphs, and the basic function attributes; 2) extract the features of each function under test from the function call graph, the control flow graphs, and the basic function attributes; then, from the extracted features and the features of the vulnerable function, calculate the numerical similarity between each function under test and the vulnerable function; 3) for each function under test, construct the weighted bipartite graphs of that function and the vulnerable function, and calculate their overall similarity using a bipartite graph matching algorithm; 4) if the overall similarity between a function under test and the vulnerable function exceeds the set decision threshold, judge the function under test to be a suspected vulnerable function; otherwise judge it to be a normal function. The method is simple to implement and easy to deploy.

Description

A cross-architecture method for associating vulnerable functions in binary programs
Technical field
The present invention relates to the field of binary program vulnerability discovery and reverse analysis, and in particular to a cross-architecture method for associating vulnerable functions in binary programs. It belongs to the technical field of computer program testing.
Background
With the rapid development of global information technology and the rapid spread of information systems and geographic information systems, computer software has become an important component of the world economy, science and technology, the military, and social development. Practice has shown that most information security incidents are launched by attackers through software vulnerabilities. Security vulnerabilities are therefore a decisive factor that directly affects the security of information systems, and it is necessary to analyze software vulnerabilities and how they can be exploited. By the object analyzed, vulnerability analysis can be divided into source-code level and binary level. Source-code-level vulnerability analysis works directly on programs written in a high-level language. Analysts can use the rich and complete semantic information in the source code, together with a range of vulnerability analysis techniques, to find coding errors and design defects in a program. In practice, however, a large amount of commercial software exists only in binary form, and its source code is hard to obtain. Binary program vulnerability analysis has therefore gradually become an important branch of information security.
Function association techniques are mainly based on binary code similarity detection. The early application scenario was to associate functions by computing the similarity of two binary files compiled for the same architecture. Because both files target the same architecture, the assembly obtained by disassembly uses the same instruction set, so the assembly can be treated as character strings and compared directly. In 2013, Arun Lakhotia proposed a semantic template method for quickly locating similar code fragments. In 2014, Yaniv David used string edit distance to calculate the similarity of basic blocks. However, researchers found that if different compiler optimization options are used, the assembly obtained by disassembling even the same source code can differ greatly. Methods that depend strongly on the surface form of the assembly are therefore sensitive to compiler optimization options, so researchers turned to semantic information that depends less on that surface form and began to extract the semantic information of program fragments as features. In 2014, Jannik Pewny proposed a vulnerability association algorithm based on semantic signatures, which converts the instructions in a basic block into expressions stored as trees, calculates similarity using tree edit distance, and was implemented as the prototype TEDEM. In the same year, Manuel Egele proposed a binary code similarity detection method based on dynamic instrumentation. It mainly simulates the dynamic execution environment of a function and uses the result as the function's features for code retrieval; it guarantees that every basic block is executed at least once by repeatedly executing along some execution path starting from the basic block at the function entry, and was implemented as the prototype BLEX.
Later, more and more IoT vendors compiled third-party code libraries and deployed them on different CPU platforms, which means there is a growing need to search for vulnerable functions in binary files compiled for any architecture. Existing function association techniques cannot be applied directly to the cross-architecture scenario, either because of limitations of the method (for example, techniques based on the string similarity of assembly code) or because of limitations of the tools (for example, the dynamic instrumentation tool PIN targets only x86 platforms). In 2015, Jannik Pewny published Cross-Architecture Bug Search in Binary Executables at S&P. That paper was the first to propose the cross-architecture application scenario, and it extracts cross-architecture (x86, ARM, MIPS) semantic features of basic blocks by lifting to an intermediate language representation, numerical sampling, and MinHash. However, the accuracy of that method is unsatisfactory: when it is used to compare function similarity between openssl firmware for the ARM architecture and for the MIPS architecture, rank-1 accuracy reaches only 32.4%. It is therefore necessary to study cross-architecture vulnerability association and to propose an association method with higher accuracy.
At present, there is no cross-architecture binary program vulnerability association technique that is both simple to implement and highly accurate.
Summary of the invention
The object of the present invention is to provide a cross-architecture method for associating vulnerable functions in binary programs. The flow of the method mainly includes: performing reverse analysis on a binary file to obtain a library of functions under test, and calculating the numerical similarity between each function under test and the vulnerable function; cutting the local structural information of the two functions being compared out of the function call graphs to form two structural subgraphs; abstracting the two structural subgraphs layer by layer into weighted bipartite graphs, computing the maximum weight matching of each weighted bipartite graph with a bipartite matching algorithm, and taking the weighted sum as the overall similarity of the two functions, by which the functions are ranked; and calculating a decision threshold from the ROC curve, where a function whose similarity exceeds the decision threshold is judged to be a suspected vulnerable function and passed on for further analysis, and any other function is judged to be a normal function and is not processed further.
The innovations of the present invention are the function control flow graph reconstruction algorithm used when calculating the numerical similarity and the structure matching algorithm used when calculating the overall similarity. The invention fuses the numerical information and the structural information of a function. Feature extraction does not depend on a specific instruction set, so functions can be associated across binary files of different architectures; the results are accurate and the method is simple to implement.
To achieve the above object, the present invention adopts the following technical solution:
A cross-architecture method for associating vulnerable functions in binary programs, mainly comprising the following 3 steps:
1) Calculate the numerical similarity between the function under test and the vulnerable function. First perform reverse analysis on the binary file to obtain the library of functions under test. Extract three kinds of information: the call relationships between the functions under test (the function call graph), the control flow graph inside each function, and the basic attributes of each function; convert this information to numerical form and use it as the feature vector of the function. Train an ensemble classifier using, as training samples, a set of functions that are self-compiled, cover multiple platforms, and carry symbol tables. Calculate the similarity of each feature of the function under test and the vulnerable function to form a similarity vector, feed it into the ensemble classifier for prediction, and obtain the numerical similarity.
2) Construct weighted bipartite graphs and calculate the overall similarity using a bipartite graph algorithm. Cut the local structural information of the two functions being compared out of the function call graphs to form two structural subgraphs; the number of layers cut out can be chosen according to actual needs. Abstract the two structural subgraphs layer by layer into weighted bipartite graphs, in which the node set consists of the functions contained in the corresponding layers of the two structural subgraphs, the edge set represents the similarity relation between any two functions, and each edge weight is the numerical similarity obtained in the previous step. Then compute the maximum weight matching of each layer's weighted bipartite graph with a bipartite matching algorithm, and take the weighted sum as the overall similarity between the function under test and the vulnerable function.
3) Make the decision using a threshold calculated from the ROC curve. Plot the ROC curve from the vector of overall similarities between the set of functions under test and the vulnerable function, and take the threshold corresponding to the peak of the Y-X curve as the decision threshold. A function whose similarity exceeds the decision threshold is judged to be a suspected vulnerable function; otherwise it is judged to be a normal function. If each point of the ROC curve is (x, y), then the curve formed by the points (x, y-x) is the Y-X curve derived from the ROC curve, where the domain of x is M.
The present invention can achieve the following beneficial effects:
When calculating the numerical similarity between the function under test and the vulnerable function, the present invention mainly considers nine kinds of features: the call relation feature, stack space feature, string feature, code size feature, path sequence feature, basic path feature, degree sequence feature, basic degree feature, and graph scale feature. Together these reflect the characteristic properties of a function fairly completely. Feature extraction does not depend on a specific instruction set, so the invention can perform vulnerability association on binary files compiled for two different architectures. In addition, features are extracted from IDA's analysis results by means of an IDA plug-in, and IDA itself builds function control flow graphs differently when reverse-analyzing binary files of different architectures. The present invention therefore proposes a control flow graph reconstruction algorithm, which restores the real structure of the control flow graph to a certain extent and improves the accuracy of function feature extraction.
To fuse the numerical information and the structural information of functions, the present invention cuts subgraphs out of the function call graph, constructs weighted bipartite graphs, and computes their maximum weight matching. It assumes that a function node closer to the function being queried contributes more to the match. Function nodes are layered by their hop count from the function being queried, the Kuhn-Munkres algorithm is applied to the function nodes of each layer to compute a maximum weight bipartite matching that gives the similarity of that layer, and the weighted sum of the layer similarities gives the overall function similarity. When calculating the overall similarity of the functions to be matched, this method uses the call information between functions and thus takes into account how the similarity of other function pairs affects the function pair being matched. It is more objective and accurate than methods that use numerical values alone.
Compared with existing techniques, the present invention does not depend on a specific instruction set, can perform vulnerability association on binary files of different architectures, is simple to implement, and is easy to deploy.
Brief description of the drawings
Fig. 1 is a flow chart of the method;
Fig. 2 shows how the CFGs that IDA produces for the same function differ greatly across architectures, where
(a) is the CFG of the mencap_main function of busybox-1.20.0 compiled for the arm architecture,
(b) is the CFG of the mencap_main function of busybox-1.20.0 compiled for the mips architecture;
Fig. 3 is a schematic diagram of function control flow graph reconstruction;
Fig. 4 is a schematic diagram of the layering of a structural subgraph;
Fig. 5 is a schematic diagram of the construction of a weighted bipartite graph;
Fig. 6 is a schematic diagram of determining the optimal threshold from the ROC curve.
Detailed description of the embodiments
A cross-architecture method for associating vulnerabilities in binary programs is implemented as follows:
1) Write an IDA plug-in to perform reverse analysis on the binary file, obtaining the library of functions under test together with the basic function attributes, the function call graph, and the function control flow graphs.
2) Calculate the numerical similarity between the function under test and the vulnerable function. The whole process comprises three steps: numerical feature extraction, similarity calculation, and neural network prediction of the similarity.
In the numerical feature extraction stage, numerical features are extracted from three sources: the basic function attributes, the function call graph, and the function control flow graph. Nine kinds of features of the function under test are mainly extracted: the call relation feature, string feature, stack space feature, code size feature, path sequence feature, basic path feature, degree sequence feature, basic degree feature, and graph scale feature. These nine kinds of features reflect the characteristic properties of a function fairly completely.
Analyze the function call graph: for each function under test, count the number of times it is called by other functions, the number of times it calls other functions, and the number of distinct functions it calls (the call count after deduplication). These form the call relation feature.
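The call relation feature can be illustrated with a minimal sketch. It assumes the call graph is available as a list of (caller, callee) pairs, one per call site; the function name `call_relation_features` and this input format are hypothetical and are not taken from the patent.

```python
from collections import Counter

def call_relation_features(call_edges):
    """call_edges: list of (caller, callee) pairs, one entry per call site.

    Returns {function: (times_called, calls_made, distinct_callees)}.
    """
    called_by = Counter(callee for _, callee in call_edges)  # incoming calls
    calls_out = Counter(caller for caller, _ in call_edges)  # outgoing calls
    distinct = {}
    for caller, callee in call_edges:
        distinct.setdefault(caller, set()).add(callee)       # deduplicated callees
    functions = set(called_by) | set(calls_out)
    return {
        f: (called_by[f], calls_out[f], len(distinct.get(f, ())))
        for f in functions
    }

# Hypothetical call sites: main calls parse twice and log once.
edges = [("main", "parse"), ("main", "parse"), ("main", "log"), ("parse", "log")]
features = call_relation_features(edges)
```

Here `features["main"]` holds the three counts for `main`: how often it is called, how many calls it makes, and how many distinct functions it calls.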
Analyze the basic function attributes: calculate the stack space to form the stack space feature; count the jump instructions, the instructions, and the code size to form the code size feature; count the referenced strings and collect the set of referenced strings to form the string feature.
Before the function control flow graph is analyzed, note that features cannot be extracted directly from the control flow graph (CFG) produced by IDA. In a small number of cases, the CFGs of the same function under different architectures differ greatly. For example, the CFGs of busybox's memcap_main function under the ARM and MIPS architectures are very different, as shown in Fig. 2. This is because the CPU instruction set of each platform is handled by its own IDA processor module, and the processor modules of different platforms do not follow the same strategy when generating CFGs. For busybox's rmdir_main function, for example, the ARM module splits basic blocks at bl instructions, whereas the MIPS module does not split basic blocks at jal (both are function call instructions). To unify the basic block division rules of the CFGs, the CFG needs to be rebuilt. The reconstruction algorithm is as follows:
A) Identify the head and tail addresses of all basic blocks of the function and the endpoint addresses of the original edges.
B) Sort all basic blocks in ascending order of head address, and count the in-degree and out-degree of each basic block.
C) Scan the basic blocks in ascending order of head address. If the out-degree of the n-th basic block is 0 and the in-degree of the (n+1)-th basic block is 0, merge the two blocks into a new n-th basic block, delete the former n-th and (n+1)-th blocks, and update every edge that has the head address of the former (n+1)-th block as an endpoint so that it uses the head address of the n-th block instead. If the out-degree of the n-th basic block is 0 and the in-degree of the (n+1)-th basic block is not 0, add an edge from the n-th basic block to the (n+1)-th basic block, whose start point is the head address of the n-th block and whose end point is the head address of the (n+1)-th block.
D) The reconstruction ends when the scan reaches the last basic block.
The CFG reconstruction algorithm is implemented in Python. Its input parameter bbList is the list of head and tail addresses of all basic blocks, edgeList is the list of all original edges found by IDA, and startPoint is the entry address of the function; its output toDic is the dictionary of all edges of the rebuilt CFG, and bbDic is the dictionary of all basic blocks after the CFG is rebuilt. The effect of reconstruction on busybox's memcap_main function is shown in Fig. 3.
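The following is a minimal sketch of steps A) to D), written for illustration only; it is not the patent's own listing. It assumes that each basic block is a (head, tail) address pair and that each edge is a (source head, destination head) pair. The function name `rebuild_cfg` is hypothetical, the entry address (startPoint) is omitted because the four steps do not use it, and after a merge the merged block is compared with the next block again, which is one reading of step C).

```python
def rebuild_cfg(bb_list, edge_list):
    """Rebuild a CFG so that basic block boundaries follow one rule.

    bb_list:   [(head, tail), ...] for every basic block (assumed format)
    edge_list: [(src_head, dst_head), ...] original edges (assumed format)
    Returns (to_dic, bb_dic): successors per block head, and head -> tail.
    """
    blocks = sorted(bb_list)          # step B: ascending head address
    edges = set(edge_list)
    rebuilt = []                      # rebuilt blocks, kept in address order
    for head, tail in blocks:         # step C: scan in address order
        if rebuilt:
            prev_head, _ = rebuilt[-1]
            prev_out = sum(1 for s, _ in edges if s == prev_head)
            cur_in = sum(1 for _, d in edges if d == head)
            if prev_out == 0 and cur_in == 0:
                # Merge: the previous block absorbs the current one, and the
                # current block's outgoing edges now start at the merged block.
                rebuilt[-1] = (prev_head, tail)
                edges = {(prev_head if s == head else s, d) for s, d in edges}
                continue
            if prev_out == 0:
                # Previous block falls through into the current block.
                edges.add((prev_head, head))
        rebuilt.append((head, tail))
    bb_dic = dict(rebuilt)
    to_dic = {h: sorted(d for s, d in edges if s == h) for h in bb_dic}
    return to_dic, bb_dic

# Hypothetical function: the first block ends in a call and was split there.
bb_list = [(0x100, 0x108), (0x108, 0x110), (0x110, 0x120), (0x120, 0x128)]
edge_list = [(0x108, 0x110), (0x108, 0x120)]
to_dic, bb_dic = rebuild_cfg(bb_list, edge_list)
```

In this example the first two blocks are merged into one block starting at 0x100, and a fall-through edge is added from the block at 0x110 to the block at 0x120.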
Analyze the function control flow graph: calculate the in-degree and out-degree of each node (basic block) and construct the directed CFG adjacency matrix; convert the control flow graph to an undirected graph, calculate the degree of each node, and construct the undirected CFG adjacency matrix. Perform degree analysis on both the directed and the undirected CFG adjacency matrices. Compute the ascending in-degree sequence and ascending out-degree sequence from the directed adjacency matrix, and the ascending degree sequence from the undirected adjacency matrix; the three sequences form the degree sequence feature.
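A minimal sketch of the degree sequence feature follows. It assumes the directed CFG is given as a 0/1 adjacency matrix in which `adj[i][j] == 1` means an edge from block i to block j; the function name `degree_sequences` is hypothetical.

```python
def degree_sequences(adj):
    """adj: square 0/1 matrix of the directed CFG, adj[i][j] == 1 for i -> j.

    Returns the ascending in-degree, out-degree and undirected degree
    sequences, plus the undirected adjacency matrix.
    """
    n = len(adj)
    out_deg = sorted(sum(row) for row in adj)
    in_deg = sorted(sum(adj[i][j] for i in range(n)) for j in range(n))
    # Undirected view: two blocks are adjacent if an edge exists either way.
    undirected = [
        [1 if (adj[i][j] or adj[j][i]) else 0 for j in range(n)]
        for i in range(n)
    ]
    deg = sorted(sum(row) for row in undirected)
    return in_deg, out_deg, deg, undirected

# Hypothetical diamond-shaped CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
diamond = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
in_deg, out_deg, deg, undirected = degree_sequences(diamond)
```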
Based on the ascending degree sequence, calculate the maximum degree, the average degree, and the probability sequence of the degrees. From the probability sequence of the degrees, calculate the entropy of the graph; these form the basic degree feature. Perform path analysis on the undirected CFG adjacency matrix: compute the shortest distance between every pair of nodes (basic blocks) using the Floyd algorithm or Dijkstra's algorithm to construct the path sequence feature; compute the average path length, diameter, and radius of the graph to form the basic path feature. Perform basic attribute analysis on the directed CFG adjacency matrix: compute the number of nodes, the number of edges, the link rate of the graph, the graph density, and the clustering coefficient of the graph, which form the CFG graph scale feature.
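The entropy and path computations can be sketched as below. The sketch assumes that the "probability sequence of the degrees" is the empirical distribution of degree values, that the entropy is Shannon entropy in bits, and that the undirected graph is connected with at least two nodes; these are readings chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

def degree_entropy(degrees):
    """Shannon entropy (bits) of the empirical degree distribution."""
    counts = Counter(degrees)
    total = len(degrees)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def path_features(undirected):
    """All-pairs shortest paths on an undirected 0/1 adjacency matrix
    using the Floyd algorithm, then summary statistics of the distances."""
    n = len(undirected)
    inf = float("inf")
    dist = [
        [0 if i == j else (1 if undirected[i][j] else inf) for j in range(n)]
        for i in range(n)
    ]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    eccentricity = [max(row) for row in dist]
    pairs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    return {
        "sequence": sorted(pairs),              # path sequence feature
        "avg_path": sum(pairs) / len(pairs),    # average path length
        "diameter": max(eccentricity),
        "radius": min(eccentricity),
    }

# Hypothetical 4-cycle: 0-1, 0-2, 1-3, 2-3.
square = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
paths = path_features(square)
```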
Through the above steps, the call relation feature, string feature, stack space feature, code size feature, path sequence feature, basic path feature, degree sequence feature, basic degree feature, and graph scale feature of a function are extracted.
In the feature similarity calculation stage, depending on the form each feature takes, a numeric similarity measure, a sequence similarity measure based on the string edit distance algorithm, or a set similarity measure based on the Jaccard similarity is used to calculate the similarity of each feature of the two functions being compared; the results form the input vector of the ensemble classifier.
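The three kinds of similarity measure can be sketched as follows. The patent does not give the formulas, so the normalizations used here (relative difference for numbers, edit distance divided by the longer length for sequences) are assumptions made for illustration, and all function names are hypothetical.

```python
def numeric_similarity(a, b):
    """Assumed form: 1 minus the relative difference of two numbers."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(abs(a), abs(b))

def edit_distance(s, t):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, 1):
        cur = [i]
        for j, y in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def sequence_similarity(s, t):
    """Assumed form: edit distance normalized by the longer sequence."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest

def jaccard_similarity(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    return 1.0 if not (a | b) else len(a & b) / len(a | b)

# Hypothetical feature values of two functions being compared.
similarity_vector = [
    numeric_similarity(8, 10),                         # e.g. stack space
    sequence_similarity([0, 1, 1, 2], [0, 1, 2, 2]),   # e.g. degree sequence
    jaccard_similarity({"a", "b"}, {"b", "c"}),        # e.g. string set
]
```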
In the stage where the ensemble classifier predicts the similarity, the ensemble classifier is first trained using, as training samples, a set of functions that are self-compiled, cover multiple platforms, and carry symbol tables. The specific method is as follows. Take one piece of source code and compile it with different compilers and different optimization options for different architectures to obtain multiple binary executables. Perform reverse analysis on each binary executable to obtain a function library and extract the multidimensional features of each function. Based on the features, calculate the similarity of every pair of functions taken from different function libraries as an input vector of the ensemble classifier. If the two function names are identical, the label is 1 and the pair is a positive sample; if the two function names differ, the label is 0 and the pair is a negative sample. Set up several initial classifiers. Draw 80% of the samples from the original training set with replacement to construct several independent and identically distributed sub-training sets, one for each classifier. Feed each sub-training set into its classifier for training, and adjust the classifier's parameters according to the prediction results until they meet the requirements; at that point the classifier is trained. The trained ensemble classifier is then used to predict the numerical similarity. Extract features from the vulnerable function and from each function under test and calculate the similarity vectors, which serve as test samples. Predict with the classifiers in the trained ensemble to obtain several predicted values, and take their weighted average as the final predicted value, which is the numerical similarity.
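The sampling and averaging parts of this stage can be sketched without committing to a particular classifier. In the sketch below the base classifiers are placeholder callables, and the names `bootstrap_subsets` and `ensemble_predict`, the weights, and the seed are all hypothetical; the patent does not specify them.

```python
import random

def bootstrap_subsets(samples, n_classifiers, fraction=0.8, seed=0):
    """Draw one sub-training set per classifier, sampling with replacement
    until each subset holds `fraction` of the original sample count."""
    rng = random.Random(seed)
    size = int(len(samples) * fraction)
    return [
        [rng.choice(samples) for _ in range(size)]
        for _ in range(n_classifiers)
    ]

def ensemble_predict(classifiers, weights, similarity_vector):
    """Weighted average of the predictions of the individual classifiers."""
    total = sum(weights)
    weighted = sum(w * clf(similarity_vector)
                   for clf, w in zip(classifiers, weights))
    return weighted / total

# Placeholder "classifiers": any callable mapping a vector to a score.
classifiers = [lambda v: sum(v) / len(v), lambda v: min(v)]
weights = [2, 1]
subsets = bootstrap_subsets(list(range(10)), n_classifiers=len(classifiers))
score = ensemble_predict(classifiers, weights, [0.9, 0.6])
```

In a real system each placeholder would be a trained model fitted on its own subset; the weighted average is the numerical similarity described above.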
For example, suppose training samples are needed for the matching pattern MIPS-O2 → ARM-O2.
Step 1: Compile the openssl source code for the MIPS architecture with the -O2 optimization option to obtain a binary file named openssl-MIPS-O2; compile the openssl source code for the ARM architecture with the -O2 optimization option to obtain a binary file named openssl-ARM-O2.
Step 2: Perform reverse analysis on each of the two binary files to obtain two function libraries. Suppose the function library of openssl-MIPS-O2 has m functions, named X1-MIPS-O2, X2-MIPS-O2, ..., Xm-MIPS-O2, and the function library of openssl-ARM-O2 has n functions, named Y1-ARM-O2, Y2-ARM-O2, ..., Yn-ARM-O2. Compute the features of all functions in the two libraries, giving m+n feature records in total.
Step 3: Calculate the function similarity vectors between the two libraries, giving m × n similarity vectors. If Xi = Yj, the function Xi-MIPS-O2 in the openssl-MIPS-O2 library and the function Yj-ARM-O2 in the openssl-ARM-O2 library are considered to be the same function, the label is 1, and the pair is a positive sample; otherwise the pair is considered a negative sample.
Step 4: To balance positive and negative samples and to speed up the process, each round computes pairwise similarities and labels for 100 functions of openssl-MIPS-O2 and 100 functions of openssl-ARM-O2, which yields 100 positive samples and 9,900 negative samples. Keep all the positive samples and randomly select 100 of the 9,900 negative samples as the negative samples.
This yields min(m, n) positive samples and the same number of negative samples, which serve as the original training set for the matching pattern MIPS-O2 → ARM-O2.
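Steps 3 and 4 can be sketched in a simplified form that labels all pairs at once rather than in rounds of 100 × 100 functions. It assumes each function library is a dictionary from function name to a feature record; `build_training_set`, the `similarity_vector` callback, and the seed are hypothetical names introduced for illustration.

```python
import random

def build_training_set(lib_a, lib_b, similarity_vector, seed=0):
    """Label cross-library function pairs by name and balance the classes.

    lib_a, lib_b: {function_name: feature_record} (assumed format)
    similarity_vector: callable(feature_a, feature_b) -> vector
    Returns a list of (vector, label) with equally many 1s and 0s.
    """
    positives, negatives = [], []
    for name_a, feat_a in lib_a.items():
        for name_b, feat_b in lib_b.items():
            vec = similarity_vector(feat_a, feat_b)
            # Same symbol name in both libraries means the same function.
            (positives if name_a == name_b else negatives).append(vec)
    rng = random.Random(seed)
    negatives = rng.sample(negatives, min(len(positives), len(negatives)))
    return [(v, 1) for v in positives] + [(v, 0) for v in negatives]

# Hypothetical libraries with a single numeric feature per function.
mips_lib = {"f": 1, "g": 2, "h": 3}
arm_lib = {"f": 1, "g": 2, "k": 5}
training = build_training_set(mips_lib, arm_lib, lambda a, b: (abs(a - b),))
```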
3) Construct weighted bipartite graphs and calculate the overall similarity using a bipartite matching algorithm (such as the Kuhn-Munkres algorithm). The steps of the whole algorithm are as follows:
A) Cut the local structural information of the functions being compared out of the function call graphs to form two structural subgraphs; the number of layers cut out can be chosen according to experimental results.
B) Layer each structural subgraph by hop count from the function being compared, and assign weights according to importance with respect to that function, as shown in Fig. 4. (If the structural subgraph comes from the function call graph of the binary file containing the vulnerable function, the function being compared is the vulnerable function; if it comes from the function call graph of the binary file containing the function under test, the function being compared is the function under test.)
C) Abstract each pair of corresponding layers of the two subgraphs into a weighted complete bipartite graph, in which the node set consists of the functions contained in the corresponding layers, the edge set represents the similarity relation between any two functions in the node set, and each edge weight is the numerical similarity of the two corresponding functions, as shown in Fig. 5. This gives multiple weighted bipartite graphs.
D) For each weighted bipartite graph, use the bipartite matching algorithm to compute the maximum weight matching of that layer, which serves as the similarity of the layer.
E) Take the weighted sum of the layer similarities as the overall similarity of the functions being compared.
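Steps C) to E) can be sketched as below. To keep the example dependency-free, the maximum weight matching is found by brute force over permutations, which substitutes for the Kuhn-Munkres algorithm and is practical only for small layers. Dividing each layer's matching weight by the larger layer size is an added assumption so that layer scores stay in [0, 1]; the function names, layer weights, and similarity table are hypothetical.

```python
from itertools import permutations

def max_weight_matching(left, right, sim):
    """Weight of a maximum weight matching between two node lists.
    Brute force stand-in for Kuhn-Munkres; small inputs only."""
    if len(left) > len(right):
        # Iterate over the smaller side; keep sim's argument order intact.
        small, large, pair = right, left, (lambda a, b: sim(b, a))
    else:
        small, large, pair = left, right, sim
    best = 0.0
    for perm in permutations(large, len(small)):
        best = max(best, sum(pair(a, b) for a, b in zip(small, perm)))
    return best

def overall_similarity(layers_a, layers_b, layer_weights, sim):
    """Weighted sum of per-layer matching scores (steps D and E)."""
    total = 0.0
    for la, lb, w in zip(layers_a, layers_b, layer_weights):
        if la and lb:
            score = max_weight_matching(la, lb, sim) / max(len(la), len(lb))
            total += w * score
    return total

# Hypothetical numerical similarities between functions of the two subgraphs.
table = {
    ("vuln", "cand"): 0.9,
    ("a1", "b1"): 0.2, ("a1", "b2"): 0.8,
    ("a2", "b1"): 0.7, ("a2", "b2"): 0.3,
}
layers_vuln = [["vuln"], ["a1", "a2"]]   # layer 0 is the function itself
layers_cand = [["cand"], ["b1", "b2"]]
result = overall_similarity(layers_vuln, layers_cand, [0.6, 0.4],
                            lambda x, y: table[(x, y)])
```

In the second layer the best matching pairs a1 with b2 and a2 with b1, so the layer score reflects the structure around the two functions rather than any single pair.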
4) Make the decision using a threshold calculated from the ROC curve. Plot the ROC curve from the vector of overall similarities between the set of functions under test and the vulnerable function. The horizontal axis of the ROC curve is the false positive rate, that is, the proportion of false positives (FP/(FP+TN)); the vertical axis is the true positive rate, that is, the proportion of true positives (TP/(TP+FN)). The ROC curve shows how the false positive rate and the true positive rate change as the threshold changes, and it can be used to compare the performance of classifiers. Ideally, the best classifier lies in the upper left corner, which means the classifier achieves a high true positive rate at a very low false positive rate: real vulnerable functions are detected, and few normal functions are mistaken for vulnerable functions. The point of the ROC curve closest to the upper left corner gives the optimal threshold with the fewest errors: at that point the sum of false positives and false negatives on the training set is smallest, which is the point where Y-X is largest, as shown in Fig. 6. We therefore take the threshold corresponding to the peak of the Y-X curve as the decision threshold. A function whose similarity exceeds the decision threshold is judged to be a suspected vulnerable function; otherwise it is judged to be a normal function.
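Choosing the threshold at the peak of the Y-X curve can be sketched as follows. The sketch assumes labelled similarity scores are available and tries each observed score as a candidate threshold; the function name `best_threshold` and the sample data are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, y - x) where y - x = TPR - FPR is largest.
    A function is flagged when its score is strictly above the threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    best = (None, -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        gap = tp / pos - fp / neg      # y - x at this ROC point
        if gap > best[1]:
            best = (t, gap)
    return best

# Hypothetical overall similarities; label 1 marks a true vulnerable function.
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
threshold, gap = best_threshold(scores, labels)
```

With this data every true match scores above the chosen threshold and only one normal function does, so the gap between the true positive rate and the false positive rate peaks there.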
In summary, the invention discloses a cross-architecture vulnerability association technique for binary programs. The application scenarios and embodiments described above are not intended to limit the present invention; any person skilled in the art may make various changes and refinements without departing from the spirit and scope of the invention. The scope of protection of the present invention is therefore defined by the claims.

Claims (5)

1. A cross-architecture method for associating vulnerable functions in binary programs, comprising the steps of:
1) performing reverse analysis on the binary file of the binary program under test to obtain a library of functions under test; then, from the library of functions under test, obtaining the function call graph, the function control flow graphs, and the basic function attributes;
2) extracting the features of each function under test from the function call graph, the function control flow graphs, and the basic function attributes; then, from the extracted features and the features of the vulnerable function, calculating the numerical similarity between each function under test and the vulnerable function;
3) for each function under test, constructing the weighted bipartite graphs of the function under test and the vulnerable function, and calculating the overall similarity between the function under test and the vulnerable function using a bipartite graph algorithm;
4) if the overall similarity between the function under test and the vulnerable function is greater than a set decision threshold, judging the function under test to be a suspected vulnerable function; otherwise judging it to be a normal function.
2. The method of claim 1, characterized in that the numerical similarity is computed as follows:
21) compiling the same source code into multiple binary executables for different architectures; then reverse-analyzing each binary executable to obtain a function library and extracting the features of each function in it;
22) based on the extracted features, computing for every pair of functions from different function libraries a similarity vector that serves as the input of an ensemble classifier; if the two functions have the same name, the label is 1 and the corresponding input vector is a positive sample, otherwise it is a negative sample, which yields an original training set; the ensemble classifier comprises multiple classifiers;
23) repeatedly drawing multiple samples with replacement from the original training set to construct several independent and identically distributed sub-training sets, which serve as the training samples of the individual classifiers in the ensemble classifier;
24) feeding each sub-training set into its corresponding classifier for training; then, based on the features of the vulnerability function and of each function under test, applying the trained classifiers to the vulnerability function and the function under test, and taking the weighted average of the several predicted values obtained as the numerical similarity.
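The training scheme of claim 2 (sampling with replacement, one classifier per sub-training set, weighted averaging of the predictions) can be illustrated with the minimal Python sketch below. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the per-feature similarity formula, the toy `MeanThresholdClassifier` base learner, the equal default weights, and all function names.

```python
import random

def similarity_vector(f, g):
    """Per-feature similarity in [0, 1] for two lists of non-negative numeric
    features (hypothetical formula, not specified by the claims)."""
    return [1.0 - abs(a - b) / max(abs(a), abs(b), 1e-9) for a, b in zip(f, g)]

def make_sample(name_a, feats_a, name_b, feats_b):
    """Step 22: label a cross-library function pair 1 if the names match, else 0."""
    return similarity_vector(feats_a, feats_b), int(name_a == name_b)

class MeanThresholdClassifier:
    """Toy base learner (assumption): thresholds the mean of the similarity vector."""
    def fit(self, samples):
        pos = [sum(x) / len(x) for x, y in samples if y == 1]
        neg = [sum(x) / len(x) for x, y in samples if y == 0]
        if pos and neg:
            # Midpoint between the average positive and average negative score.
            self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        else:
            # A bootstrap draw may contain only one class; fall back to 0.5.
            self.threshold = 0.5
        return self

    def predict(self, x):
        return 1.0 if sum(x) / len(x) >= self.threshold else 0.0

def train_ensemble(training_set, n_classifiers=5, seed=0):
    """Steps 23 and 24: train each classifier on its own bootstrap sample."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_classifiers):
        subset = [rng.choice(training_set) for _ in training_set]  # with replacement
        ensemble.append(MeanThresholdClassifier().fit(subset))
    return ensemble

def numerical_similarity(ensemble, vuln_features, test_features, weights=None):
    """Step 24: weighted average of the individual predictions (equal weights by default)."""
    weights = weights or [1.0] * len(ensemble)
    x = similarity_vector(vuln_features, test_features)
    votes = [clf.predict(x) for clf in ensemble]
    return sum(w * v for w, v in zip(weights, votes)) / sum(weights)
```

In practice the base learners would be real classifiers trained on many feature dimensions; the sketch only shows how the bootstrap sampling and the weighted average fit together.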
3. The method of claim 1 or 2, characterized in that the control flow graph obtained in step 1) is reconstructed as follows:
A) identifying, in the control flow graph of the function, the head and tail addresses of all basic blocks of the function and the endpoint addresses of the original edges;
B) sorting all basic blocks in ascending order of head address and counting the in-degree and out-degree of each basic block;
C) scanning the basic blocks in ascending order of head address: if the out-degree of the n-th basic block is 0 and the in-degree of the (n+1)-th basic block is 0, merging the two basic blocks into a new n-th basic block, deleting the original n-th and (n+1)-th basic blocks, and changing every edge that has the head address of the former (n+1)-th basic block as an endpoint address so that it uses the head address of the n-th basic block as that endpoint address; if the out-degree of the n-th basic block is 0 and the in-degree of the (n+1)-th basic block is not 0, adding an edge pointing from the n-th basic block to the (n+1)-th basic block, one endpoint of that edge being the head address of the n-th basic block and the other endpoint being the head address of the (n+1)-th basic block.
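The reconstruction in claim 3 can be sketched in Python as follows. The data layout is an assumption for illustration: basic blocks are `(head, tail)` address pairs and edges are `(source_head, target_head)` pairs. The sketch recomputes degrees during the scan and re-checks a merged block against its new successor; the claim does not spell out either detail, so both are interpretive choices.

```python
def rebuild_cfg(blocks, edges):
    """Merge or link address-adjacent basic blocks as described in claim 3.

    blocks: iterable of (head_address, tail_address) pairs
    edges:  iterable of (source_head, target_head) pairs
    Returns the rebuilt (blocks, edges).
    """
    blocks = sorted(blocks)          # step B: ascending head address
    edges = set(edges)
    n = 0
    while n + 1 < len(blocks):
        head, tail = blocks[n]
        next_head, next_tail = blocks[n + 1]
        out_deg = sum(1 for src, _ in edges if src == head)
        next_in_deg = sum(1 for _, dst in edges if dst == next_head)
        if out_deg == 0 and next_in_deg == 0:
            # Merge: the new n-th block spans both of the old blocks.
            blocks[n:n + 2] = [(head, next_tail)]
            # Redirect every edge endpoint that named the absorbed block.
            edges = {(head if src == next_head else src,
                      head if dst == next_head else dst)
                     for src, dst in edges}
            continue                 # re-check the merged block (assumption)
        if out_deg == 0:
            # Successor is reachable from elsewhere: add a fall-through edge.
            edges.add((head, next_head))
        n += 1
    return blocks, edges
```

For example, with blocks at 0x10, 0x20, 0x30, and 0x40 and original edges 0x20→0x30 and 0x20→0x40, the first two blocks merge into one block starting at 0x10, both edges are re-rooted at 0x10, and a fall-through edge 0x30→0x40 is added.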
4. The method of claim 1, characterized in that the overall similarity is computed as follows:
A) extracting the local structure information of the function under test from the function call graph to form a structure subgraph, and extracting the local structure information of the vulnerability function from the function call graph in which the vulnerability function resides to form a structure subgraph;
B) layering each extracted structure subgraph by hop count from the function being compared, and assigning each layer a weight according to its importance to the function being compared, so that the corresponding layers of the two structure subgraphs are each abstracted into a weighted bipartite graph, in which the node set consists of the functions contained in the corresponding layers, the edge set consists of the similarity relations between any two functions in the node set, and each edge weight is the numerical similarity of the two corresponding functions;
C) computing, layer by layer, the maximum weight matching of each weighted bipartite graph with a bipartite graph matching algorithm, and taking it as the similarity of that layer;
D) taking the weighted sum of the per-layer similarities as the overall similarity of the functions being compared.
5. The method of claim 1, characterized in that the decision threshold is determined as follows: drawing an ROC curve from the obtained overall similarities between the functions under test and the vulnerability function, and taking the threshold corresponding to the peak of the Y-X curve as the decision threshold.
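The threshold selection in claim 5 can be sketched in Python as below. The sketch reads Y and X as the two ROC axes, that is, true positive rate and false positive rate, so the peak of Y-X is the point where their difference is largest. That reading is an assumption, as are the function name and the use of labelled validation scores.

```python
def pick_threshold(scores, labels):
    """Return (threshold, tpr_minus_fpr) maximizing TPR - FPR.

    scores: overall similarities of candidate functions
    labels: 1 if the candidate really is the vulnerability function, else 0
    A candidate is flagged when its score is strictly greater than the
    threshold, matching the "exceeds" wording of claim 1, step 4.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):          # each observed score is a candidate
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        j = tp / pos - fp / neg            # Y - X on the ROC curve
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

With scores `[0.95, 0.9, 0.7, 0.6, 0.4, 0.2]` and labels `[1, 1, 1, 0, 1, 0]`, the difference peaks at threshold 0.6, where three of the four true matches are flagged and no false match is.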
CN201610178368.6A 2016-03-25 2016-03-25 Cross-architecture binary program vulnerability function association method Active CN107229563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610178368.6A CN107229563B (en) 2016-03-25 2016-03-25 Cross-architecture binary program vulnerability function association method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610178368.6A CN107229563B (en) 2016-03-25 2016-03-25 Cross-architecture binary program vulnerability function association method

Publications (2)

Publication Number Publication Date
CN107229563A (en) 2017-10-03
CN107229563B (en) 2020-07-10

Family

ID=59932522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610178368.6A Active CN107229563B (en) 2016-03-25 2016-03-25 Cross-architecture binary program vulnerability function association method

Country Status (1)

Country Link
CN (1) CN107229563B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944278A (en) * 2017-12-11 2018-04-20 北京奇虎科技有限公司 Kernel vulnerability detection method and device
CN107967152A (en) * 2017-12-12 2018-04-27 西安交通大学 Software local plagiarism evidence generation method based on minimum branch path function birthmarks
CN108140091A (en) * 2015-10-09 2018-06-08 日本电信电话株式会社 Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program
CN108268777A (en) * 2018-01-18 2018-07-10 中国人民大学 Similarity detection method for carrying out unknown vulnerability discovery by using patch information
CN108491228A (en) * 2018-03-28 2018-09-04 清华大学 Binary vulnerability code clone detection method and system
CN109472145A (en) * 2017-12-29 2019-03-15 北京安天网络安全技术有限公司 Code reuse identification method and system based on graph theory
CN109670318A (en) * 2018-12-24 2019-04-23 中国科学院软件研究所 Vulnerability detection method based on cyclic verification of kernel control flow graph
CN109740347A (en) * 2018-11-23 2019-05-10 中国科学院信息工程研究所 Method for identifying and cracking fragile hash function of intelligent device firmware
CN110083534A (en) * 2019-04-19 2019-08-02 西安邮电大学 Software plagiarism detection method based on reduced shortest path birthmark
CN110414238A (en) * 2019-06-18 2019-11-05 中国科学院信息工程研究所 Search method and device for homologous binary code
CN110598417A (en) * 2019-09-05 2019-12-20 北京理工大学 Software vulnerability detection method based on graph mining
CN110674346A (en) * 2019-10-11 2020-01-10 北京达佳互联信息技术有限公司 Video processing method, device, equipment and storage medium
CN110943981A (en) * 2019-11-20 2020-03-31 中国人民解放军战略支援部队信息工程大学 Cross-architecture vulnerability mining method based on hierarchical learning
CN110968874A (en) * 2019-11-28 2020-04-07 腾讯科技(深圳)有限公司 Vulnerability detection method, device, server and storage medium
CN111046385A (en) * 2019-11-22 2020-04-21 北京达佳互联信息技术有限公司 Software type detection method and device, electronic equipment and storage medium
CN111310178A (en) * 2020-01-20 2020-06-19 武汉理工大学 Firmware vulnerability detection method and system under cross-platform scene
CN111914260A (en) * 2020-06-22 2020-11-10 西安交通大学 Binary program vulnerability detection method based on function difference
CN112540787A (en) * 2020-12-14 2021-03-23 北京知道未来信息技术有限公司 Program reverse analysis method and device and electronic equipment
CN112800425A (en) * 2021-02-03 2021-05-14 南京大学 Code analysis method and device based on graph calculation
CN114610606A (en) * 2022-02-25 2022-06-10 中国人民解放军国防科技大学 Binary system module similarity matching method and device based on arrival-fixed value analysis

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101315599A (en) * 2007-05-29 2008-12-03 北京航空航天大学 Method and device for detecting similarity of source codes
CN101398758A (en) * 2008-10-30 2009-04-01 北京航空航天大学 Detection method of code copy
CN101739337A (en) * 2009-12-14 2010-06-16 北京理工大学 Method for analyzing characteristic of software vulnerability sequence based on cluster
CN101814053A (en) * 2010-03-29 2010-08-25 中国人民解放军信息工程大学 Method for discovering binary code vulnerability based on function model
CN101968766A (en) * 2010-10-21 2011-02-09 上海交通大学 System for detecting software bug triggered during practical running of computer program
KR20150047241A (en) * 2013-10-24 2015-05-04 한양대학교 산학협력단 Method and apparatus for determing plagiarism of program using control flow graph
CN105045715A (en) * 2015-07-27 2015-11-11 电子科技大学 Programming mode and mode matching based bug clustering method
CN105160206A (en) * 2015-10-08 2015-12-16 中国科学院数学与系统科学研究院 Method and system for predicting protein interaction target point of drug

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108140091B (en) * 2015-10-09 2021-12-31 日本电信电话株式会社 Vulnerability discovery device, vulnerability discovery method, and storage medium
CN108140091A (en) * 2015-10-09 2018-06-08 日本电信电话株式会社 Vulnerability discovery device, vulnerability discovery method, and vulnerability discovery program
CN107944278A (en) * 2017-12-11 2018-04-20 北京奇虎科技有限公司 Kernel vulnerability detection method and device
CN107967152B (en) * 2017-12-12 2020-06-19 西安交通大学 Software local plagiarism evidence generation method based on minimum branch path function birthmarks
CN107967152A (en) * 2017-12-12 2018-04-27 西安交通大学 Software local plagiarism evidence generation method based on minimum branch path function birthmarks
CN109472145A (en) * 2017-12-29 2019-03-15 北京安天网络安全技术有限公司 Code reuse identification method and system based on graph theory
CN108268777B (en) * 2018-01-18 2020-06-30 中国人民大学 Similarity detection method for carrying out unknown vulnerability discovery by using patch information
CN108268777A (en) * 2018-01-18 2018-07-10 中国人民大学 Similarity detection method for carrying out unknown vulnerability discovery by using patch information
CN108491228A (en) * 2018-03-28 2018-09-04 清华大学 Binary vulnerability code clone detection method and system
CN108491228B (en) * 2018-03-28 2020-03-17 清华大学 Binary vulnerability code clone detection method and system
CN109740347A (en) * 2018-11-23 2019-05-10 中国科学院信息工程研究所 Method for identifying and cracking fragile hash function of intelligent device firmware
CN109740347B (en) * 2018-11-23 2020-07-10 中国科学院信息工程研究所 Method for identifying and cracking fragile hash function of intelligent device firmware
CN109670318A (en) * 2018-12-24 2019-04-23 中国科学院软件研究所 Vulnerability detection method based on cyclic verification of kernel control flow graph
CN110083534A (en) * 2019-04-19 2019-08-02 西安邮电大学 Software plagiarism detection method based on reduced shortest path birthmark
CN110414238A (en) * 2019-06-18 2019-11-05 中国科学院信息工程研究所 Search method and device for homologous binary code
CN110598417A (en) * 2019-09-05 2019-12-20 北京理工大学 Software vulnerability detection method based on graph mining
CN110598417B (en) * 2019-09-05 2021-02-12 北京理工大学 Software vulnerability detection method based on graph mining
CN110674346A (en) * 2019-10-11 2020-01-10 北京达佳互联信息技术有限公司 Video processing method, device, equipment and storage medium
CN110943981A (en) * 2019-11-20 2020-03-31 中国人民解放军战略支援部队信息工程大学 Cross-architecture vulnerability mining method based on hierarchical learning
CN110943981B (en) * 2019-11-20 2022-04-08 中国人民解放军战略支援部队信息工程大学 Cross-architecture vulnerability mining method based on hierarchical learning
CN111046385A (en) * 2019-11-22 2020-04-21 北京达佳互联信息技术有限公司 Software type detection method and device, electronic equipment and storage medium
CN111046385B (en) * 2019-11-22 2022-04-22 北京达佳互联信息技术有限公司 Software type detection method and device, electronic equipment and storage medium
CN110968874A (en) * 2019-11-28 2020-04-07 腾讯科技(深圳)有限公司 Vulnerability detection method, device, server and storage medium
CN110968874B (en) * 2019-11-28 2023-04-14 腾讯科技(深圳)有限公司 Vulnerability detection method, device, server and storage medium
CN111310178A (en) * 2020-01-20 2020-06-19 武汉理工大学 Firmware vulnerability detection method and system under cross-platform scene
CN111310178B (en) * 2020-01-20 2024-01-23 武汉理工大学 Firmware vulnerability detection method and system in cross-platform scene
CN111914260B (en) * 2020-06-22 2023-03-31 西安交通大学 Binary program vulnerability detection method based on function difference
CN111914260A (en) * 2020-06-22 2020-11-10 西安交通大学 Binary program vulnerability detection method based on function difference
CN112540787A (en) * 2020-12-14 2021-03-23 北京知道未来信息技术有限公司 Program reverse analysis method and device and electronic equipment
CN112800425A (en) * 2021-02-03 2021-05-14 南京大学 Code analysis method and device based on graph calculation
CN114610606A (en) * 2022-02-25 2022-06-10 中国人民解放军国防科技大学 Binary system module similarity matching method and device based on arrival-fixed value analysis
CN114610606B (en) * 2022-02-25 2023-03-03 中国人民解放军国防科技大学 Binary system module similarity matching method and device based on arrival-fixed value analysis

Also Published As

Publication number Publication date
CN107229563B (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN107229563A (en) Cross-architecture binary program vulnerability function association method
CN111639344B (en) Vulnerability detection method and device based on neural network
CN107516041B (en) WebShell detection method and system based on deep neural network
CN108446540B (en) Program code plagiarism type detection method and system based on source code multi-label graph neural network
CN111459799B (en) Software defect detection model establishing and detecting method and system based on Github
CN111460450B (en) Source code vulnerability detection method based on graph convolution network
CN110427755A (en) Method and device for identifying script files
Ganz et al. Explaining graph neural networks for vulnerability discovery
CN113536308B (en) Binary code tracing method for multi-granularity information fusion under software gene view angle
CN113297580B (en) Code semantic analysis-based electric power information system safety protection method and device
CN115146279A (en) Program vulnerability detection method, terminal device and storage medium
US11914507B2 (en) Software test apparatus and software test method
CN117454387A (en) Vulnerability code detection method based on multidimensional feature extraction
CN116578980A (en) Code analysis method and device based on neural network and electronic equipment
Assefa et al. Intelligent phishing website detection using deep learning
Tang et al. An attention-based automatic vulnerability detection approach with GGNN
CN117725592A (en) Intelligent contract vulnerability detection method based on directed graph annotation network
CN117370980A (en) Malicious code detection model generation and detection method, device, equipment and medium
Zhou et al. Deeptle: Learning code-level features to predict code performance before it runs
CN116975881A (en) LLVM (LLVM) -based vulnerability fine-granularity positioning method
CN116383832A (en) Intelligent contract vulnerability detection method based on graph neural network
Jiang et al. Software vulnerability detection method based on code attribute graph presentation and Bi-LSTM neural network extraction
CN115640577B (en) Vulnerability detection method and system for binary Internet of things firmware program
CN118585996B (en) Malicious mining software detection method based on large language model
CN114610606B (en) Binary system module similarity matching method and device based on arrival-fixed value analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant