CN107229563A - A cross-architecture method for associating vulnerable functions in binary programs - Google Patents
- Publication number
- CN107229563A (application number CN201610178368.6A)
- Authority
- CN
- China
- Prior art keywords
- function
- under test
- vulnerability
- similarity
- basic block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3604—Software analysis for verifying properties of programs
- G06F11/3616—Software analysis for verifying properties of programs using software metrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
- Stored Programmes (AREA)
Abstract
The invention discloses a cross-architecture method for associating vulnerable functions in binary programs. The method comprises: 1) reverse-analyzing the binary file of the binary program under analysis to obtain a library of functions under test, and obtaining from this library the function call graph, the control flow graphs, and the basic attributes of each function; 2) extracting the features of each function under test from the function call graph, the control flow graphs, and the basic function attributes, and computing the numerical similarity between each function under test and the vulnerable function from the extracted features and the features of the vulnerable function; 3) for each function under test, constructing weighted bipartite graphs for that function and the vulnerable function, and computing their overall similarity with a bipartite graph matching algorithm; 4) if the overall similarity between a function under test and the vulnerable function exceeds a preset decision threshold, judging the function under test to be a suspected vulnerable function, and otherwise judging it to be a normal function. The method is simple to implement and easy to deploy.
Description
Technical field
The present invention relates to binary program vulnerability mining and reverse analysis, and in particular to a cross-architecture method for associating vulnerable functions in binary programs. It belongs to the field of computer program detection technology.
Background
With the rapid development of global information technology and the widespread adoption of information systems, computer software has become an important component of the world economy, science and technology, the military, and social development. Practice has shown that most information security incidents are launched by attackers through software vulnerabilities. Security vulnerabilities are therefore a decisive factor in the security of information systems, and software vulnerabilities need to be analyzed and exploited. Depending on the object analyzed, vulnerability analysis can be divided into the source-code level and the binary level. Source-level vulnerability analysis works directly on programs written in high-level languages: analysts can use the rich, complete semantic information in the source code and apply a range of vulnerability analysis techniques to find coding errors and design flaws. In practice, however, a large amount of commercial software exists only as binary code, and its source code is hard to obtain. Binary program vulnerability analysis has therefore become an important branch of information security.
Function association techniques are based mainly on detecting binary code similarity. Early applications computed the similarity of two binary files compiled for the same architecture in order to associate their functions. Because both files target the same architecture, the disassembled code uses the same instruction set, so it can be treated as character strings and compared for similarity directly. In 2013, Arun Lakhotia proposed a semantic-template method for quickly locating similar code fragments. In 2014, Yaniv David used string edit distance to compute the similarity of basic blocks. Researchers found, however, that when different compiler optimization options are used, the assembly obtained by disassembling the same source code can differ greatly. Methods that depend heavily on the surface form of the assembly are therefore sensitive to optimization options, so researchers turned to semantic information, which depends less on that form, and began extracting the semantics of program fragments as features. In 2014, Jannik Pewny proposed a vulnerability association algorithm based on semantic signatures: it converts the instructions in a basic block into expressions stored as trees, computes similarity with tree edit distance, and was implemented as the prototype TEDEM. In the same year, Manuel Egele proposed a binary code similarity detection method based on dynamic instrumentation, which simulates the dynamic runtime environment of a function and uses the observed behavior as the function's feature for code retrieval. It repeatedly starts execution from basic blocks, beginning at the function entry and following execution paths, so that every basic block is executed at least once, and was implemented as the prototype BLEX. Later, more and more IoT vendors compiled third-party code libraries and deployed them on different CPU platforms, so the need to search for vulnerable functions in binaries compiled for any architecture keeps growing. Existing function association techniques cannot be applied directly to cross-architecture scenarios, either because of method limitations (for example, techniques that measure the similarity of assembly strings) or because of tool limitations (for example, the dynamic instrumentation tool PIN targets only x86 platforms). In 2015, Jannik Pewny published "Cross-Architecture Bug Search in Binary Executables" at S&P. This paper was the first to address the cross-architecture scenario, and it extracted basic-block semantic features across architectures (x86, ARM, MIPS) using an intermediate-language representation, numerical sampling, and min-hashing. Its accuracy, however, is unsatisfactory: when it was used to compare function similarity in OpenSSL firmware built for the ARM and MIPS architectures, the rank-1 accuracy reached only 32.4%. It is therefore necessary to study cross-architecture vulnerability association and to propose an association method with higher accuracy.
At present, there is no cross-architecture binary vulnerability association technique that is both simple to implement and highly accurate.
Summary of the invention
The object of the present invention is to provide a cross-architecture method for associating vulnerable functions in binary programs. The method mainly comprises: reverse-analyzing a binary file to obtain a library of functions under test, and computing the numerical similarity between each function under test and the vulnerable function; extracting the local structural information of the two functions being compared from the function call graphs to form two structure subgraphs; abstracting the two structure subgraphs, layer by layer, into weighted bipartite graphs, computing the maximum weight matching of each weighted bipartite graph with a bipartite matching algorithm, and using the weighted sum as the overall similarity of the two functions, by which candidates are ranked; and computing a decision threshold from the ROC curve, judging functions whose similarity exceeds the threshold to be suspected vulnerable functions for further analysis, and judging the rest to be normal functions that need no further processing.
The innovations of the present invention are the control flow graph reconstruction algorithm used when computing the numerical similarity and the structure matching algorithm used when computing the overall similarity. The invention combines the numerical and structural information of functions. Because feature extraction does not depend on a specific instruction set, it can associate functions across binary files built for different architectures with high accuracy, and it is simple to implement.
To achieve the above object, the present invention adopts the following technical solution:
A cross-architecture method for associating vulnerable functions in binary programs, mainly comprising the following three steps:
1) Compute the numerical similarity between each function under test and the vulnerable function. First reverse-analyze the binary file to obtain a library of functions under test. Extract three kinds of information: the call relations between functions (i.e., the function call graph), the control flow graph of each function, and the basic attributes of each function; quantify them to form each function's feature vector. Train an ensemble classifier using, as training samples, function sets that were compiled in-house for multiple platforms and retain their symbol tables. Compute the similarity of each feature between the function under test and the vulnerable function to form a similarity vector, feed it into the ensemble classifier for prediction, and obtain the numerical similarity.
2) Construct weighted bipartite graphs and compute the overall similarity with a bipartite graph algorithm. From the function call graphs, extract the local structural information of the two functions being compared to form two structure subgraphs; the number of layers extracted can be chosen as needed. Abstract the two structure subgraphs, layer by layer, into weighted bipartite graphs: the node set consists of the functions in the corresponding layers of the two subgraphs, each edge represents the similarity between two functions, and the edge weight is the numerical similarity computed in the previous step. Then solve the maximum weight matching of each layer with a bipartite matching algorithm, and use the weighted sum as the overall similarity between the function under test and the vulnerable function.
3) Decide using a decision threshold computed from the ROC curve. Draw the ROC curve from the overall similarities between the set of functions under test and the vulnerable function, and take the threshold at the peak of the Y-X curve as the decision threshold. Functions whose similarity exceeds the threshold are judged to be suspected vulnerable functions; the others are judged to be normal functions. If each point of the ROC curve is (x, y), the curve formed by the points (x, y - x) is the Y-X curve derived from the ROC curve, where the domain of x is M.
The present invention provides the following beneficial effects:
When computing the numerical similarity between a function under test and the vulnerable function, the invention mainly considers nine kinds of features: call relation, stack space, string, code size, path sequence, basic path, degree sequence, basic degree, and graph scale features. Together they reflect the characteristics of a function fairly completely, and their extraction does not depend on a specific instruction set, so the invention can associate vulnerabilities between binary files compiled for two different architectures. In addition, the features are extracted from IDA's analysis results by an IDA plug-in. Because IDA itself builds function control flow graphs differently when reverse-analyzing binaries of different architectures, the invention proposes a control flow graph reconstruction algorithm that recovers the true structure of the control flow graph to a certain extent and improves the accuracy of function feature extraction.
To fuse the numerical and structural information of functions, the invention extracts part of the function call graph and constructs weighted bipartite graphs whose maximum weight matching is computed. Assuming that function nodes closer to the queried function contribute more to the match, the function nodes are layered by their hop count from the queried function, and the Kuhn-Munkres algorithm is applied to the function nodes of each layer to find an optimal bipartite matching, which gives the similarity of that layer. Finally, the weighted sum of the per-layer similarities gives the overall function similarity. Because this computation is based on the call relations between functions, it takes into account how the similarity of other function pairs affects the pair being matched, which makes it more objective and accurate than methods that use numerical features alone.
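The hop-count layering described above can be sketched as a breadth-first search over the call graph. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `layer_by_hops`, the dictionary-of-sets call graph format, and the choice to treat call edges as undirected (so both callers and callees count as neighbours) are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def layer_by_hops(call_graph: Dict[str, Set[str]], root: str, max_depth: int) -> List[List[str]]:
    """Return layers[k] = functions exactly k hops from root, for k = 0..max_depth.

    Assumption: call edges are treated as undirected, so callers and callees
    are both neighbours; the patent does not state the direction.
    """
    # Undirected adjacency view of the directed call graph.
    adj: Dict[str, Set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            adj.setdefault(caller, set()).add(callee)
            adj.setdefault(callee, set()).add(caller)

    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand past the cut-off depth
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)

    layers: List[List[str]] = [[] for _ in range(max_depth + 1)]
    for fn, d in dist.items():
        layers[d].append(fn)
    return [sorted(layer) for layer in layers]
```

For example, with `{"main": {"parse", "run"}, "run": {"memcpy"}, "parse": {"strlen"}}` and root `run`, a depth of 2 yields `[["run"], ["main", "memcpy"], ["parse"]]`; `strlen` is three hops away and is cut off.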
Compared with existing techniques, the invention does not depend on a specific instruction set, can associate vulnerabilities across binary files of different architectures, and is simple to implement and easy to deploy.
Brief description of the drawings
Fig. 1 is a flow diagram of the method;
Fig. 2 illustrates how much the CFGs that IDA produces for the same function differ across architectures, where
(a) is the CFG of the mencap_main function of busybox-1.20.0 compiled for the ARM architecture, and
(b) is the CFG of the mencap_main function of busybox-1.20.0 compiled for the MIPS architecture;
Fig. 3 illustrates a reconstructed control flow graph;
Fig. 4 illustrates the layering of a structure subgraph;
Fig. 5 illustrates the construction of a weighted bipartite graph;
Fig. 6 illustrates determining the optimal threshold from the ROC curve.
Embodiments
A cross-architecture method for associating vulnerable functions in binary programs is implemented as follows:
1) Write an IDA plug-in to reverse-analyze the binary file and obtain the library of functions under test, the basic attributes of each function, the function call graph, and the control flow graphs.
2) Compute the numerical similarity between each function under test and the vulnerable function. The process consists of three steps: numerical feature extraction, feature similarity computation, and neural network prediction of the similarity.
In the numerical feature extraction stage, features are extracted from three sources: the basic attributes of the function, the function call graph, and the control flow graph. The nine kinds of features extracted for each function under test are call relation, string, stack space, code size, path sequence, basic path, degree sequence, basic degree, and graph scale features. Together these nine kinds of features reflect the characteristic properties of a function fairly completely.
Analyze the function call graph: for each function under test, count how many times it is called by other functions, and how many times it calls other functions both with and without duplicates removed. These counts form the call relation feature.
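The call relation counts above can be sketched as follows. This is an illustrative sketch, not the patent's plug-in code: the name `call_relation_features` and the input format (one `(caller, callee)` pair per call instruction, so a pair repeats when a function calls the same target several times) are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def call_relation_features(call_sites: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int, int]]:
    """Return {function: (times_called, calls_made, distinct_callees)}.

    call_sites holds one (caller, callee) pair per call instruction
    (an assumed export format).
    """
    called: Counter = Counter()
    calls: Counter = Counter()
    callees: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in call_sites:
        called[callee] += 1          # times this function is called by others
        calls[caller] += 1           # calls it makes, duplicates included
        callees[caller].add(callee)  # distinct call targets, duplicates removed
    names = set(called) | set(calls)
    return {f: (called[f], calls[f], len(callees.get(f, ()))) for f in names}
```

With the call sites `main→parse`, `main→parse`, `main→run`, `run→memcpy`, the sketch gives `main` the triple `(0, 3, 2)`: never called, three calls made, two distinct callees.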
Analyze the basic attributes of the function: compute the stack space, which forms the stack space feature; count the jump instructions, the total instructions, and the code size, which form the code size feature; and compute the number of strings referenced and the set of strings referenced, which form the string feature.
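A sketch of these basic-attribute features, assuming a hypothetical `FunctionInfo` record standing in for whatever the IDA plug-in exports. The per-architecture jump mnemonic tables are illustrative and deliberately incomplete, and counting calls (`bl`, `jal`) as jumps is an assumption; the patent does not define which instructions count.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Illustrative, incomplete branch/jump mnemonic tables (assumption).
# Calls such as bl/blx and jal/jalr are counted as jumps here.
JUMP_MNEMONICS = {
    "arm": {"b", "bl", "bx", "blx", "beq", "bne", "bgt", "blt", "bge", "ble"},
    "mips": {"j", "jal", "jr", "jalr", "beq", "bne", "beqz", "bnez", "bltz", "bgez"},
}


@dataclass
class FunctionInfo:
    """Hypothetical per-function record exported by an IDA plug-in."""
    arch: str              # "arm" or "mips"
    frame_size: int        # stack space reserved by the function
    mnemonics: List[str]   # one mnemonic per instruction
    size_bytes: int        # code size in bytes
    strings: List[str] = field(default_factory=list)  # string literals referenced


def basic_attribute_features(fn: FunctionInfo) -> Tuple[int, Tuple[int, int, int], Tuple[int, Set[str]]]:
    jumps = JUMP_MNEMONICS[fn.arch]
    stack_feature = fn.frame_size
    code_size_feature = (
        sum(1 for m in fn.mnemonics if m.lower() in jumps),  # jump instructions
        len(fn.mnemonics),                                   # total instructions
        fn.size_bytes,                                       # code size
    )
    string_feature = (len(fn.strings), set(fn.strings))      # count and set of strings
    return stack_feature, code_size_feature, string_feature
```

Keeping the mnemonic tables per architecture is what lets the same feature definition apply to ARM and MIPS binaries alike.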
Before the control flow graph is analyzed, note that the control flow graphs (CFGs) produced by IDA cannot be used for feature extraction directly. In some cases the CFGs of the same function under different architectures differ greatly. For example, the CFGs of the memcap_main function of busybox under the ARM and MIPS architectures are very different, as shown in Fig. 2. The reason is that the CPU instruction set of each platform is handled by the corresponding IDA processor module, and the processor modules do not use the same strategy to generate CFGs. For example, in the rmdir_main function of busybox, the bl instruction on the ARM platform splits basic blocks, whereas the jal instruction on the MIPS platform does not, although both are function call instructions. To unify the basic block partitioning rules of the CFGs, the CFGs need to be rebuilt. The reconstruction algorithm is as follows:
A) Identify the start and end addresses of all basic blocks of the function and the endpoint addresses of the original edges.
B) Sort all basic blocks by start address in ascending order, and count the in-degree and out-degree of each basic block.
C) Scan the basic blocks in ascending order of start address. If the out-degree of the n-th basic block is 0 and the in-degree of the (n+1)-th basic block is 0, merge the two into a new n-th basic block, delete the original n-th and (n+1)-th basic blocks, and change every edge whose endpoint is the start address of the original (n+1)-th basic block so that its endpoint is the start address of the n-th basic block. If the out-degree of the n-th basic block is 0 and the in-degree of the (n+1)-th basic block is not 0, add an edge from the n-th basic block to the (n+1)-th basic block, whose start point is the start address of the n-th basic block and whose end point is the start address of the (n+1)-th basic block.
D) The reconstruction ends when the scan reaches the last basic block.
The CFG reconstruction algorithm was implemented in Python. Its input parameter bbList is the list of start and end addresses of all basic blocks, edgeList is the list of all original edges from IDA's analysis, and startPoint is the function entry address. Its outputs are toDic, a dictionary of all edges of the rebuilt CFG, and bbDic, a dictionary of all basic blocks of the rebuilt CFG. The result of rebuilding the memcap_main function of busybox is shown in Fig. 3.
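The patent's Python source is not reproduced here, so the following is only a sketch of steps A to D under stated assumptions. It reuses the parameter names bbList, edgeList, startPoint, toDic and bbDic, but the data layout is assumed: blocks are `(start, end)` pairs, edges are `(source_start, destination_start)` pairs, toDic maps a source start address to its sorted destinations, and bbDic maps start to end. Using startPoint to keep the entry block from being merged into its predecessor is also an assumption.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source block start, destination block start)


def rebuild_cfg(bbList: List[Tuple[int, int]], edgeList: List[Edge], startPoint: int):
    """Sketch of CFG reconstruction steps A-D (not the patent's own code)."""
    blocks = sorted(bbList)  # step B: ascending order of start address
    edges = set(edgeList)
    i = 0
    while i < len(blocks) - 1:  # step C: scan from low to high addresses
        cur_start = blocks[i][0]
        nxt_start, nxt_end = blocks[i + 1]
        out_deg = sum(1 for src, _ in edges if src == cur_start)
        in_deg = sum(1 for _, dst in edges if dst == nxt_start)
        # Assumption: never merge into or add a fall-through to the entry block.
        if out_deg == 0 and nxt_start != startPoint:
            if in_deg == 0:
                # Merge block n+1 into block n and re-point its edges.
                blocks[i] = (cur_start, nxt_end)
                del blocks[i + 1]
                edges = {(cur_start if s == nxt_start else s,
                          cur_start if d == nxt_start else d) for s, d in edges}
                continue  # re-check the merged block against its new successor
            edges.add((cur_start, nxt_start))  # add the missing fall-through edge
        i += 1  # step D: stop once the last block is reached

    toDic: Dict[int, List[int]] = {}
    for src, dst in sorted(edges):
        toDic.setdefault(src, []).append(dst)
    bbDic = dict(blocks)
    return toDic, bbDic
```

As the rule is literally stated, any block with no outgoing edge (including one that ends in a return) gains a fall-through edge when the next block has incoming edges; a practical version might exclude return blocks.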
Analyze the function control flow graph: compute the in-degree and out-degree of each node (i.e., basic block) and construct the adjacency matrix of the CFG as a directed graph; then convert the control flow graph into an undirected graph, compute the degree of each node, and construct the adjacency matrix of the undirected CFG. Perform degree analysis on both adjacency matrices: compute the ascending in-degree sequence and the ascending out-degree sequence from the directed adjacency matrix, and the ascending degree sequence from the undirected adjacency matrix. These three sequences form the degree sequence feature.
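The three degree sequences can be sketched from a 0/1 adjacency matrix as follows. The name `degree_sequences` is hypothetical, and dropping self-loops in the undirected view is an assumption the patent does not address.

```python
from typing import List, Tuple


def degree_sequences(adj: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """adj[i][j] == 1 if the CFG has an edge from block i to block j.

    Returns the ascending in-degree, out-degree and undirected degree sequences.
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Undirected view: i and j are adjacent if an edge exists in either direction
    # (self-loops are dropped, an assumption of this sketch).
    undirected = [[1 if i != j and (adj[i][j] or adj[j][i]) else 0 for j in range(n)]
                  for i in range(n)]
    deg = [sum(row) for row in undirected]
    return sorted(in_deg), sorted(out_deg), sorted(deg)
```

For a diamond-shaped CFG with a back edge (0→1, 0→2, 1→3, 2→3, 3→1), the sketch yields in-degrees `[0, 1, 2, 2]`, out-degrees `[1, 1, 1, 2]`, and undirected degrees `[2, 2, 2, 2]`.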
From the ascending degree sequence, compute the maximum degree, the average degree, and the probability sequence of degrees, and compute the entropy of the graph from the degree probability sequence; these form the basic degree feature. Perform path analysis on the undirected CFG adjacency matrix: compute the shortest distance between every two nodes (i.e., basic blocks) with the Floyd algorithm or Dijkstra's algorithm to form the path sequence feature, and compute the average path length, diameter, and radius of the graph to form the basic path feature. Analyze the basic attributes of the directed CFG adjacency matrix: the number of nodes, the number of edges, the connectivity, the density, and the clustering coefficient of the graph form the CFG graph scale feature.
Through the above steps, nine kinds of features of each function are extracted in total: call relation, string, stack space, code size, path sequence, basic path, degree sequence, basic degree, and graph scale features.
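A sketch of the degree, path and scale statistics, using Floyd-Warshall (one of the two shortest-path options named above). Several details are assumptions the patent leaves open: base-2 entropy, skipping unreachable pairs when averaging path lengths, density defined as m / (n(n-1)) for a directed graph, clustering as the mean local clustering coefficient of the undirected view, and omitting the connectivity measure. The dictionary keys are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

INF = float("inf")


def floyd_warshall(und: List[List[int]]) -> List[List[float]]:
    """All-pairs shortest hop counts on an unweighted undirected adjacency matrix."""
    n = len(und)
    dist = [[0.0 if i == j else (1.0 if und[i][j] else INF) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def graph_features(directed: List[List[int]]) -> Dict[str, float]:
    n = len(directed)
    und = [[1 if i != j and (directed[i][j] or directed[j][i]) else 0 for j in range(n)]
           for i in range(n)]
    deg = [sum(row) for row in und]
    probs = [c / n for c in Counter(deg).values()]            # degree distribution p(k)
    entropy = -sum(p * math.log2(p) for p in probs)           # base-2 entropy (assumption)
    dist = floyd_warshall(und)
    finite = [dist[i][j] for i in range(n) for j in range(n) if i != j and dist[i][j] < INF]
    ecc = [max(d for d in row if d < INF) for row in dist]    # eccentricity within own component
    m = sum(sum(row) for row in directed)                     # directed edge count

    def local_clustering(i: int) -> float:
        nbrs = [j for j in range(n) if und[i][j]]
        k = len(nbrs)
        if k < 2:
            return 0.0
        links = sum(und[a][b] for x, a in enumerate(nbrs) for b in nbrs[x + 1:])
        return 2.0 * links / (k * (k - 1))

    return {
        "max_degree": max(deg),
        "mean_degree": sum(deg) / n,
        "degree_entropy": entropy,
        "avg_path_length": sum(finite) / len(finite) if finite else 0.0,
        "diameter": max(ecc),
        "radius": min(ecc),
        "nodes": n,
        "edges": m,
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "clustering": sum(local_clustering(i) for i in range(n)) / n,
    }
```

The path sequence feature itself would be the (sorted) finite entries of `dist`; the sketch only reduces them to summary statistics.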
In the feature similarity computation stage, the similarity of each feature of the two functions being compared is computed according to how that feature is represented: a numeric similarity measure for numeric features, a sequence similarity measure based on string edit distance for sequences, and a set similarity measure based on Jaccard similarity for sets. Together these similarities form the input vector of the ensemble classifier.
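The three kinds of similarity measure can be sketched as below. The edit-distance and Jaccard forms are standard; the numeric formula `1 - |a - b| / max(|a|, |b|)` and the normalization of edit distance by the longer sequence are assumptions, since the patent names only the families of measures.

```python
from typing import Hashable, Sequence, Set


def numeric_similarity(a: float, b: float) -> float:
    """Assumed formula: 1 - |a - b| / max(|a|, |b|), defined as 1.0 when both are 0."""
    largest = max(abs(a), abs(b))
    return 1.0 if largest == 0 else 1.0 - abs(a - b) / largest


def edit_distance(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, 1):
        cur = [i]
        for j, yj in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                # delete xi
                           cur[j - 1] + 1,             # insert yj
                           prev[j - 1] + (xi != yj)))  # substitute or match
        prev = cur
    return prev[-1]


def sequence_similarity(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Edit distance normalized by the longer sequence (normalization is an assumption)."""
    longest = max(len(x), len(y))
    return 1.0 if longest == 0 else 1.0 - edit_distance(x, y) / longest


def jaccard_similarity(a: Set[Hashable], b: Set[Hashable]) -> float:
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)
```

Because `edit_distance` accepts any sequence, the same function serves both strings and the ascending degree or path sequences.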
In the stage where the ensemble classifier predicts the similarity, the ensemble classifier is first trained using self-compiled, multi-platform function sets that retain their symbol tables as training samples. Specifically: compile the same source code with different compilers and different optimization options for different architectures to obtain several binary executables. Reverse-analyze each binary executable to obtain a function library and extract the multidimensional features of each function. Based on these features, compute the similarity between every pair of functions from different function libraries as the input vector of the ensemble classifier. If the two function names are identical, the label is 1 and the pair is a positive sample; if they differ, the label is 0 and the pair is a negative sample. Set up several initial classifiers. Draw 80% of the samples from the original training set with replacement, repeatedly, to construct several independent, identically distributed training subsets, one per classifier. Train each classifier on its training subset and adjust its parameters according to the prediction results until the predictions meet the requirements; classifier training is then complete. The trained ensemble classifier is then used to predict the numerical similarity: extract features from the vulnerable function and from each function under test, compute the similarity vector, and use it as the test sample. Each classifier in the trained ensemble produces a prediction, and the weighted average of these predictions is taken as the final prediction, i.e., the numerical similarity.
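The bagging scheme above (bootstrap subsets of 80% drawn with replacement, one classifier per subset, weighted average of predictions) can be sketched as follows. The patent does not fix the base classifier, so the tiny logistic regression here is a stand-in; `LogisticModel`, `train_bagging`, `predict_similarity`, the learning rate, epoch count and equal default weights are all hypothetical choices.

```python
import math
import random
from typing import List, Optional, Sequence


class LogisticModel:
    """Tiny logistic-regression base learner (stand-in; no base learner is named)."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-60.0, min(60.0, z))  # keep math.exp within range
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X: List[Sequence[float]], y: List[int],
            lr: float = 0.5, epochs: int = 200) -> "LogisticModel":
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                err = self.prob(xi) - yi  # gradient of log-loss w.r.t. z
                self.b -= lr * err
                self.w = [wj - lr * err * xj for wj, xj in zip(self.w, xi)]
        return self


def train_bagging(X: List[Sequence[float]], y: List[int], n_models: int = 5,
                  frac: float = 0.8, seed: int = 0) -> List[LogisticModel]:
    """Train each model on frac * len(X) samples drawn with replacement."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(X)))
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(k)]  # bootstrap sample
        models.append(LogisticModel(len(X[0])).fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_similarity(models: List[LogisticModel], x: Sequence[float],
                       weights: Optional[List[float]] = None) -> float:
    """Weighted average of the models' probabilities (equal weights by default)."""
    weights = weights or [1.0] * len(models)
    return sum(w * m.prob(x) for w, m in zip(weights, models)) / sum(weights)
```

Each input `x` is a per-feature similarity vector for one function pair, and the averaged output plays the role of the numerical similarity.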
For example, suppose training samples are needed for the MIPS-O2 → ARM-O2 matching pattern.
Step 1: compile the openssl source code for the MIPS architecture with the -O2 optimization option and name the resulting binary openssl-MIPS-O2; compile the openssl source code for the ARM architecture with the -O2 optimization option and name the resulting binary openssl-ARM-O2.
Step 2: reverse-analyze the two binary files to obtain two function libraries. Suppose the function library of openssl-MIPS-O2 has m functions, named X1-MIPS-O2, X2-MIPS-O2, ..., Xm-MIPS-O2, and the function library of openssl-ARM-O2 has n functions, named Y1-ARM-O2, Y2-ARM-O2, ..., Yn-ARM-O2. Compute the features of all functions in the two libraries, giving m+n feature records in total.
Step 3: compute the similarity vectors between the functions of the two libraries, giving m × n similarity vectors. If Xi = Yj, the function Xi-MIPS-O2 in the openssl-MIPS-O2 library and the function Yj-ARM-O2 in the openssl-ARM-O2 library are considered the same function, and the pair is labeled 1 as a positive sample; otherwise it is a negative sample.
Step 4: to balance positive and negative samples and to speed up computation, compute pairwise similarities and labels between 100 functions of openssl-MIPS-O2 and 100 functions of openssl-ARM-O2 at a time, which yields 100 positive samples and 9900 negative samples. Keep all the positive samples, and randomly select 100 of the 9900 negative samples as negative samples.
This yields min(m, n) positive samples and the same number of negative samples, which form the original training set for the MIPS-O2 → ARM-O2 matching pattern.
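Steps 3 and 4 can be sketched as a batched, balanced sampler. This is an illustration under assumptions: libraries are dicts from function name to feature vector, `similarity` is any function producing a similarity vector (for example, built from the per-feature measures described earlier), and only names present in both libraries are paired so that each batch of 100 yields 100 positives. `build_training_set` and its parameters are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (similarity vector, label)


def build_training_set(lib_a: Dict[str, Sequence[float]], lib_b: Dict[str, Sequence[float]],
                       similarity: Callable[[Sequence[float], Sequence[float]], List[float]],
                       batch: int = 100, seed: int = 0) -> List[Sample]:
    rng = random.Random(seed)
    shared = sorted(set(lib_a) & set(lib_b))  # assumption: pair only names found in both
    rng.shuffle(shared)
    samples: List[Sample] = []
    for start in range(0, len(shared), batch):
        names = shared[start:start + batch]
        positives: List[Sample] = []
        negatives: List[Sample] = []
        for name_a in names:
            for name_b in names:
                vec = similarity(lib_a[name_a], lib_b[name_b])
                if name_a == name_b:
                    positives.append((vec, 1))  # same function name: positive
                else:
                    negatives.append((vec, 0))  # different names: negative
        samples.extend(positives)
        # Keep as many randomly chosen negatives as there are positives.
        samples.extend(rng.sample(negatives, min(len(positives), len(negatives))))
    return samples
```

With `batch=100`, each full batch produces 100 × 100 pairs, of which 100 are positive and 9900 negative, and 100 of the negatives are kept, matching step 4.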
3) Construct weighted bipartite graphs and compute the overall similarity with a bipartite matching algorithm (for example, the Kuhn-Munkres algorithm). The steps are as follows:
A) From the function call graphs, extract the local structural information of the functions being compared to form two structure subgraphs; the number of layers extracted can be chosen according to experimental results.
B) Layer each extracted structure subgraph by hop count from the function being compared, and assign each layer a weight according to its importance to that function, as shown in Fig. 4. (If the structure subgraph comes from the function call graph of the binary file containing the vulnerable function, the function being compared is the vulnerable function; if it comes from the function call graph of the binary file containing the function under test, the function being compared is the function under test.)
C) Abstract each pair of corresponding layers of the two subgraphs into a weighted complete bipartite graph: the node set consists of the functions in the corresponding layers, the edge set represents the similarity relation between each function in one layer and each function in the other, and each edge weight is the numerical similarity of the corresponding pair of functions, as shown in Fig. 5. This yields multiple weighted bipartite graphs.
D) For each weighted bipartite graph, solve the maximum weight matching of that layer with a bipartite matching algorithm, and use it as the similarity of that layer.
E) Take the weighted sum of the per-layer similarities as the overall similarity of the functions being compared.
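Steps C to E can be sketched with a compact Kuhn-Munkres (Hungarian) implementation that maximizes total weight by negating the costs. The names `hungarian_max` and `layered_similarity`, the rule that an empty layer contributes nothing, and the normalization of each layer's matched weight by the larger layer size are assumptions; the patent says only that the per-layer matchings are summed with weights.

```python
from typing import Callable, Dict, List, Sequence


def hungarian_max(weight: List[List[float]]) -> Dict[int, int]:
    """Maximum-weight assignment via Kuhn-Munkres; returns {row: col}.

    Rectangular matrices are handled by transposing so that rows <= cols.
    """
    if not weight or not weight[0]:
        return {}
    transposed = len(weight) > len(weight[0])
    w = [list(col) for col in zip(*weight)] if transposed else weight
    n, m = len(w), len(w[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)  # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -w[i0 - 1][j - 1] - u[i0] - v[j]  # negated: maximize weight
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update the dual potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j]}
    return {c: r for r, c in match.items()} if transposed else match


def layered_similarity(layers_a: List[List[str]], layers_b: List[List[str]],
                       sim: Callable[[str, str], float], layer_weights: Sequence[float]) -> float:
    """Weighted sum of per-layer maximum-weight matchings (normalization assumed)."""
    total = 0.0
    for la, lb, wk in zip(layers_a, layers_b, layer_weights):
        if not la or not lb:
            continue  # an empty layer contributes nothing (assumption)
        mat = [[sim(a, b) for b in lb] for a in la]
        match = hungarian_max(mat)
        matched = sum(mat[r][c] for r, c in match.items())
        total += wk * matched / max(len(la), len(lb))
    return total
```

For example, with layer weights `[0.6, 0.4]`, a root pair of similarity 1.0, and a second layer whose best matching pairs similarities 0.9 and 0.8, the overall similarity is 0.6 × 1.0 + 0.4 × (1.7 / 2) = 0.94.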
4) Decide using the decision threshold computed from the ROC curve. Draw the ROC curve from the overall similarities between the set of functions under test and the vulnerable function. The horizontal axis of the ROC curve is the false positive rate, i.e., FP/(FP+TN), and the vertical axis is the true positive rate, i.e., TP/(TP+FN). The ROC curve shows how the false positive rate and the true positive rate change as the threshold varies, and it can be used to compare the performance of classifiers. Ideally, the best classifier lies at the upper-left corner, meaning that it achieves a high true positive rate at a very low false positive rate: it detects the truly vulnerable functions while rarely misjudging normal functions as vulnerable. Points on the ROC curve closer to the upper-left corner correspond to thresholds with fewer errors. The optimal threshold is the point that minimizes the sum of the false positive rate and the false negative rate on the training set; since the false negative rate equals 1 - Y, this is the point with the largest value of Y-X, as shown in Fig. 6. The threshold at the peak of the Y-X curve is therefore taken as the decision threshold: functions whose similarity exceeds it are judged to be suspected vulnerable functions, and the others are judged to be normal functions.
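Choosing the threshold at the peak of the Y-X curve amounts to maximizing TPR - FPR (Youden's J statistic). A minimal sketch, assuming a function is flagged when its similarity is strictly greater than the threshold, as the text states; `youden_threshold` is a hypothetical name and every distinct score is tried as a candidate threshold.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, TPR - FPR) at the peak of the Y-X curve.

    A function is flagged as suspected vulnerable when score > threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    best_t, best_j = float("-inf"), 0.0  # threshold -inf flags everything: J = 0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s > t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s > t and l == 0)
        j = tp / pos - fp / neg  # Y - X at this threshold
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

For scores `[0.95, 0.9, 0.7, 0.6, 0.4, 0.2]` with labels `[1, 1, 1, 0, 1, 0]`, the peak is at threshold 0.6, where TPR = 0.75 and FPR = 0, so J = 0.75.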
In summary, the invention discloses a cross-architecture technique for associating vulnerable functions in binary programs. The application scenarios and embodiments described above do not limit the invention; any person skilled in the art may make various changes and refinements without departing from the spirit and scope of the invention. The scope of protection of the invention is therefore defined by the claims.
Claims (5)
1. A cross-architecture method for associating vulnerable functions in binary programs, comprising the steps of:
1) reverse-analyzing the binary file of the binary program under analysis to obtain a library of functions under test, and obtaining, from the library of functions under test, a function call graph, control flow graphs, and basic function attributes;
2) extracting the features of each function under test from the function call graph, the control flow graphs, and the basic function attributes, and computing the numerical similarity between each function under test and the vulnerable function from the extracted features and the features of the vulnerable function;
3) for each function under test, constructing weighted bipartite graphs of that function and the vulnerable function, and computing the overall similarity of that function and the vulnerable function with a bipartite graph algorithm;
4) if the overall similarity between a function under test and the vulnerable function exceeds a set decision threshold, judging that function under test to be a suspected vulnerable function, and otherwise judging it to be a normal function.
2. The method of claim 1, wherein the numerical similarity is computed by:
21) compiling the same source code into multiple binary executables for different architectures, reverse-analyzing each binary executable to obtain a function library, and extracting the features of each function;
22) based on the extracted features, computing the similarity between every two functions from different function libraries as an input vector of an ensemble classifier; if the two function names are identical, the label is 1 and the corresponding input vector is a positive sample, and otherwise it is a negative sample, yielding an original training set; wherein the ensemble classifier comprises multiple classifiers;
23) repeatedly drawing multiple samples from the original training set with replacement to construct several independent, identically distributed training subsets, which serve as the training samples of the classifiers in the ensemble classifier;
24) inputting each training subset into its corresponding classifier for training; then, based on the features of the vulnerable function and of each function under test, making predictions for the vulnerable function and the function under test with the trained classifiers, and taking the weighted average of the resulting predictions as the numerical similarity.
3. The method according to claim 1 or 2, characterized in that the control flow graph obtained in step 1) is reconstructed as follows:
A) Identify, in the function's control flow graph, the start and end addresses of all basic blocks of the function and the endpoint addresses of the original edges;
B) Sort all basic blocks in ascending order of start address, and count the in-degree and out-degree of each basic block;
C) Scan the basic blocks in ascending order of start address. If the out-degree of the n-th basic block is 0 and the in-degree of the (n+1)-th basic block is 0, merge the two basic blocks into a new n-th basic block, delete the original n-th and (n+1)-th basic blocks, and change every edge whose endpoint address is the start address of the original (n+1)-th basic block so that its endpoint address becomes the start address of the n-th basic block. If the out-degree of the n-th basic block is 0 and the in-degree of the (n+1)-th basic block is not 0, add an edge from the n-th basic block to the (n+1)-th basic block; one endpoint of this edge is the start address of the n-th basic block and the other is the start address of the (n+1)-th basic block.
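A minimal Python sketch of the reconstruction in steps A to C, assuming blocks are given as `(start, end)` address pairs and edges as `(source start, target start)` pairs. It follows the claim literally (for example, it does not special-case blocks that end in a return instruction). Recomputing degrees after every change, rather than only once in step B, and re-examining a merged block against its new successor are interpretation choices, not details stated in the claim.

```python
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (start address, end address)
Edge = Tuple[int, int]   # (source block start, target block start)


def rebuild_cfg(blocks: List[Block],
                edges: Set[Edge]) -> Tuple[List[Block], Set[Edge]]:
    blocks = sorted(blocks)  # step B: ascending start address
    edges = set(edges)

    def out_degree(start: int) -> int:
        return sum(1 for src, _ in edges if src == start)

    def in_degree(start: int) -> int:
        return sum(1 for _, dst in edges if dst == start)

    n = 0
    while n + 1 < len(blocks):
        start, _ = blocks[n]
        next_start, next_end = blocks[n + 1]
        if out_degree(start) == 0:
            if in_degree(next_start) == 0:
                # Merge block n+1 into block n and redirect every edge
                # endpoint that used block n+1's start address.
                blocks[n] = (start, next_end)
                del blocks[n + 1]
                edges = {(start if s == next_start else s,
                          start if d == next_start else d) for s, d in edges}
                continue  # re-check the merged block against its new successor
            # Otherwise make the implicit fall-through edge explicit.
            edges.add((start, next_start))
        n += 1
    return blocks, edges


if __name__ == "__main__":
    blocks, edges = rebuild_cfg(
        [(0x30, 0x3F), (0x10, 0x1F), (0x40, 0x4F), (0x20, 0x2F)],
        {(0x20, 0x40)})
    print(blocks)  # -> [(16, 47), (48, 63), (64, 79)]
```

In the example, the block at 0x10 has no outgoing edge and the block at 0x20 has no incoming edge, so they are merged; the block at 0x30 then receives an explicit fall-through edge to 0x40, which already has an incoming edge.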
4. The method according to claim 1, characterized in that the overall similarity is calculated as follows:
A) Extract partial structural information of the function under test from its function call graph to form a structural subgraph, and extract partial structural information of the vulnerability function from the function call graph in which it resides to form a structural subgraph;
B) Divide each extracted structural subgraph into layers by hop count from the function being compared, and assign each layer a weight according to its importance to the function being compared. Each pair of corresponding layers of the two structural subgraphs is thereby abstracted into a weighted bipartite graph: the node sets are the functions in the respective layers, the edge set is the similarity relation between any two functions drawn from the two node sets, and each edge weight is the numerical similarity of the corresponding pair of functions;
C) Apply a bipartite matching algorithm to each weighted bipartite graph, layer by layer, to find the maximum-weight matching, and use it as that layer's similarity;
D) Take the weighted sum of the per-layer similarities as the overall similarity of the function being compared.
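Steps A to D can be sketched as below. The hop limit, the undirected treatment of call edges, the layer weights and the toy similarity table are hypothetical; the exact bitmask dynamic program stands in for whatever bipartite matching algorithm the patent uses and is only practical for small layers. For larger layers, the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` with `maximize=True`) is the usual substitute.

```python
from collections import deque
from functools import lru_cache
from typing import Callable, Dict, List, Sequence

CallGraph = Dict[str, List[str]]  # caller -> callees


def hop_layers(graph: CallGraph, root: str, max_hops: int) -> List[List[str]]:
    # Steps A/B: cut out the subgraph within max_hops of root and group its
    # nodes by hop distance. Call edges are treated as undirected here
    # (an assumption), so both callers and callees are included.
    adj: Dict[str, set] = {}
    for caller, callees in graph.items():
        for callee in callees:
            adj.setdefault(caller, set()).add(callee)
            adj.setdefault(callee, set()).add(caller)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in sorted(adj.get(u, ())):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers: List[List[str]] = [[] for _ in range(max_hops + 1)]
    for node, d in dist.items():
        layers[d].append(node)
    return [sorted(layer) for layer in layers]


def max_weight_matching(left: List[str], right: List[str],
                        sim: Callable[[str, str], float]) -> float:
    # Step C: exact maximum-weight bipartite matching via bitmask DP
    # (exponential in len(right)); nodes may also stay unmatched.
    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> float:
        if i == len(left):
            return 0.0
        result = best(i + 1, used)  # leave left[i] unmatched
        for j, r in enumerate(right):
            if not used & (1 << j):
                result = max(result,
                             sim(left[i], r) + best(i + 1, used | (1 << j)))
        return result

    return best(0, 0)


def overall_similarity(graph_a: CallGraph, root_a: str,
                       graph_b: CallGraph, root_b: str,
                       sim: Callable[[str, str], float],
                       layer_weights: Sequence[float]) -> float:
    # Step D: weighted sum of the per-layer matching scores.
    hops = len(layer_weights) - 1
    layers_a = hop_layers(graph_a, root_a, hops)
    layers_b = hop_layers(graph_b, root_b, hops)
    return sum(w * max_weight_matching(la, lb, sim)
               for w, la, lb in zip(layer_weights, layers_a, layers_b))


if __name__ == "__main__":
    # Hypothetical numerical similarities (e.g. from the claim 2 classifier).
    table = {("vuln", "target"): 0.9, ("memcpy", "memcpy_arm"): 0.8,
             ("log", "log_arm"): 0.7, ("main", "start"): 0.6}
    sim = lambda a, b: table.get((a, b), 0.1)
    cg_x86 = {"main": ["vuln"], "vuln": ["memcpy", "log"]}
    cg_arm = {"start": ["target"], "target": ["memcpy_arm", "log_arm"]}
    print(overall_similarity(cg_x86, "vuln", cg_arm, "target", sim, [1.0, 0.5]))
```

Here layer 0 holds only the two compared functions, and layer 1 holds their direct callers and callees, weighted half as much.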
5. The method according to claim 1, characterized in that the decision threshold is determined as follows: draw an ROC curve from the obtained overall similarities between the functions under test and the vulnerability function, and take the threshold corresponding to the peak of the Y-X curve as the decision threshold.
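Reading "Y-X" as the ROC curve's y-axis value minus its x-axis value, i.e. true positive rate minus false positive rate (Youden's J statistic), threshold selection can be sketched as below. This interpretation, the labelled validation scores, and the function name are assumptions; following claim 1, step 4, a function is flagged only when its similarity is strictly greater than the threshold.

```python
from typing import Sequence, Tuple


def pick_threshold(scores: Sequence[float],
                   labels: Sequence[int]) -> Tuple[float, float]:
    # labels: 1 = truly a vulnerability function, 0 = normal function.
    # Returns (threshold, max TPR - FPR) over the distinct observed scores.
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both vulnerable (1) and normal (0) samples")
    best_t, best_j = float("nan"), -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        j = tp / pos - fp / neg  # Y - X at this ROC operating point
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    print(pick_threshold([0.95, 0.9, 0.7, 0.6, 0.5, 0.2],
                         [1, 1, 1, 0, 1, 0]))  # -> (0.6, 0.75)
```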
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610178368.6A CN107229563B (en) | 2016-03-25 | 2016-03-25 | Cross-architecture binary program vulnerability function association method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107229563A | 2017-10-03 |
CN107229563B | 2020-07-10 |
Family ID: 59932522
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610178368.6A Active CN107229563B (en) | 2016-03-25 | 2016-03-25 | Cross-architecture binary program vulnerability function association method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107229563B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101315599A (en) * | 2007-05-29 | 2008-12-03 | 北京航空航天大学 | Method and device for detecting similarity of source codes |
CN101398758A (en) * | 2008-10-30 | 2009-04-01 | 北京航空航天大学 | Detection method of code copy |
CN101739337A (en) * | 2009-12-14 | 2010-06-16 | 北京理工大学 | Method for analyzing characteristic of software vulnerability sequence based on cluster |
CN101814053A (en) * | 2010-03-29 | 2010-08-25 | 中国人民解放军信息工程大学 | Method for discovering binary code vulnerability based on function model |
CN101968766A (en) * | 2010-10-21 | 2011-02-09 | 上海交通大学 | System for detecting software bug triggered during practical running of computer program |
KR20150047241A (en) * | 2013-10-24 | 2015-05-04 | 한양대학교 산학협력단 | Method and apparatus for determing plagiarism of program using control flow graph |
CN105045715A (en) * | 2015-07-27 | 2015-11-11 | 电子科技大学 | Programming mode and mode matching based bug clustering method |
CN105160206A (en) * | 2015-10-08 | 2015-12-16 | 中国科学院数学与系统科学研究院 | Method and system for predicting protein interaction target point of drug |
- 2016-03-25: Application CN201610178368.6A filed in CN; published as CN107229563B (status: active)
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108140091B (en) * | 2015-10-09 | 2021-12-31 | 日本电信电话株式会社 | Vulnerability discovery device, vulnerability discovery method, and storage medium |
CN108140091A (en) * | 2015-10-09 | 2018-06-08 | 日本电信电话株式会社 | Loophole finds that device, loophole find that method and loophole find program |
CN107944278A (en) * | 2017-12-11 | 2018-04-20 | 北京奇虎科技有限公司 | A kind of kernel leak detection method and device |
CN107967152B (en) * | 2017-12-12 | 2020-06-19 | 西安交通大学 | Software local plagiarism evidence generation method based on minimum branch path function birthmarks |
CN107967152A (en) * | 2017-12-12 | 2018-04-27 | 西安交通大学 | Software based on minimum individual path function birthmark locally plagiarizes evidence generation method |
CN109472145A (en) * | 2017-12-29 | 2019-03-15 | 北京安天网络安全技术有限公司 | A kind of code reuse recognition methods and system based on graph theory |
CN108268777B (en) * | 2018-01-18 | 2020-06-30 | 中国人民大学 | Similarity detection method for carrying out unknown vulnerability discovery by using patch information |
CN108268777A (en) * | 2018-01-18 | 2018-07-10 | 中国人民大学 | A kind of similarity detection method that unknown loophole discovery is carried out using patch information |
CN108491228A (en) * | 2018-03-28 | 2018-09-04 | 清华大学 | A kind of binary vulnerability Code Clones detection method and system |
CN108491228B (en) * | 2018-03-28 | 2020-03-17 | 清华大学 | Binary vulnerability code clone detection method and system |
CN109740347A (en) * | 2018-11-23 | 2019-05-10 | 中国科学院信息工程研究所 | A kind of identification of the fragile hash function for smart machine firmware and crack method |
CN109740347B (en) * | 2018-11-23 | 2020-07-10 | 中国科学院信息工程研究所 | Method for identifying and cracking fragile hash function of intelligent device firmware |
CN109670318A (en) * | 2018-12-24 | 2019-04-23 | 中国科学院软件研究所 | A kind of leak detection method based on the circulation verifying of nuclear control flow graph |
CN110083534A (en) * | 2019-04-19 | 2019-08-02 | 西安邮电大学 | It is a kind of based on the software plagiarism detection method for about subtracting shortest path birthmark |
CN110414238A (en) * | 2019-06-18 | 2019-11-05 | 中国科学院信息工程研究所 | The search method and device of homologous binary code |
CN110598417A (en) * | 2019-09-05 | 2019-12-20 | 北京理工大学 | Software vulnerability detection method based on graph mining |
CN110598417B (en) * | 2019-09-05 | 2021-02-12 | 北京理工大学 | Software vulnerability detection method based on graph mining |
CN110674346A (en) * | 2019-10-11 | 2020-01-10 | 北京达佳互联信息技术有限公司 | Video processing method, device, equipment and storage medium |
CN110943981A (en) * | 2019-11-20 | 2020-03-31 | 中国人民解放军战略支援部队信息工程大学 | Cross-architecture vulnerability mining method based on hierarchical learning |
CN110943981B (en) * | 2019-11-20 | 2022-04-08 | 中国人民解放军战略支援部队信息工程大学 | Cross-architecture vulnerability mining method based on hierarchical learning |
CN111046385A (en) * | 2019-11-22 | 2020-04-21 | 北京达佳互联信息技术有限公司 | Software type detection method and device, electronic equipment and storage medium |
CN111046385B (en) * | 2019-11-22 | 2022-04-22 | 北京达佳互联信息技术有限公司 | Software type detection method and device, electronic equipment and storage medium |
CN110968874A (en) * | 2019-11-28 | 2020-04-07 | 腾讯科技(深圳)有限公司 | Vulnerability detection method, device, server and storage medium |
CN110968874B (en) * | 2019-11-28 | 2023-04-14 | 腾讯科技(深圳)有限公司 | Vulnerability detection method, device, server and storage medium |
CN111310178A (en) * | 2020-01-20 | 2020-06-19 | 武汉理工大学 | Firmware vulnerability detection method and system under cross-platform scene |
CN111310178B (en) * | 2020-01-20 | 2024-01-23 | 武汉理工大学 | Firmware vulnerability detection method and system in cross-platform scene |
CN111914260B (en) * | 2020-06-22 | 2023-03-31 | 西安交通大学 | Binary program vulnerability detection method based on function difference |
CN111914260A (en) * | 2020-06-22 | 2020-11-10 | 西安交通大学 | Binary program vulnerability detection method based on function difference |
CN112540787A (en) * | 2020-12-14 | 2021-03-23 | 北京知道未来信息技术有限公司 | Program reverse analysis method and device and electronic equipment |
CN112800425A (en) * | 2021-02-03 | 2021-05-14 | 南京大学 | Code analysis method and device based on graph calculation |
CN114610606A (en) * | 2022-02-25 | 2022-06-10 | 中国人民解放军国防科技大学 | Binary system module similarity matching method and device based on arrival-fixed value analysis |
CN114610606B (en) * | 2022-02-25 | 2023-03-03 | 中国人民解放军国防科技大学 | Binary system module similarity matching method and device based on arrival-fixed value analysis |
Also Published As
Publication number | Publication date |
---|---|
CN107229563B (en) | 2020-07-10 |
Similar Documents
Publication | Title |
---|---|
CN107229563A (en) | A kind of binary program leak function correlating method across framework |
CN111639344B (en) | Vulnerability detection method and device based on neural network | |
CN107516041B (en) | WebShell detection method and system based on deep neural network | |
CN108446540B (en) | Program code plagiarism type detection method and system based on source code multi-label graph neural network | |
CN111459799B (en) | Software defect detection model establishing and detecting method and system based on Github | |
CN111460450B (en) | Source code vulnerability detection method based on graph convolution network | |
CN110427755A (en) | A kind of method and device identifying script file | |
Ganz et al. | Explaining graph neural networks for vulnerability discovery | |
CN113536308B (en) | Binary code tracing method for multi-granularity information fusion under software gene view angle | |
CN113297580B (en) | Code semantic analysis-based electric power information system safety protection method and device | |
CN115146279A (en) | Program vulnerability detection method, terminal device and storage medium | |
US11914507B2 (en) | Software test apparatus and software test method | |
CN117454387A (en) | Vulnerability code detection method based on multidimensional feature extraction | |
CN116578980A (en) | Code analysis method and device based on neural network and electronic equipment | |
Assefa et al. | Intelligent phishing website detection using deep learning | |
Tang et al. | An attention-based automatic vulnerability detection approach with GGNN | |
CN117725592A (en) | Intelligent contract vulnerability detection method based on directed graph annotation network | |
CN117370980A (en) | Malicious code detection model generation and detection method, device, equipment and medium | |
Zhou et al. | Deeptle: Learning code-level features to predict code performance before it runs | |
CN116975881A (en) | LLVM (LLVM) -based vulnerability fine-granularity positioning method | |
CN116383832A (en) | Intelligent contract vulnerability detection method based on graph neural network | |
Jiang et al. | Software vulnerability detection method based on code attribute graph presentation and Bi-LSTM neural network extraction | |
CN115640577B (en) | Vulnerability detection method and system for binary Internet of things firmware program | |
CN118585996B (en) | Malicious mining software detection method based on large language model | |
CN114610606B (en) | Binary system module similarity matching method and device based on arrival-fixed value analysis |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||