CN106649218A - Quick binary file comparing method based on SimHash algorithm - Google Patents
Quick binary file comparing method based on SimHash algorithm
- Publication number
- CN106649218A (application number CN201611009372.6A)
- Authority
- CN
- China
- Prior art keywords
- binary file
- function
- simhash
- keyword
- basic block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a binary file comparison method based on the SimHash algorithm. The method comprises the following steps: first, a plug-in written with the extension functionality of IDA Pro extracts information from a binary file, the information comprising the assembly instruction sequence, the control flow graph and the call graph of the binary file; second, the extracted binary file information is preprocessed; third, keywords are defined on the preprocessed binary file information; fourth, weights are assigned to the extracted keywords; fifth, the SimHash fingerprint feature of each function is calculated from the extracted keywords and their weights, and is stored; sixth, based on the similar results returned by a query, exact matching is performed with a classical algorithm based on structural matching. The method has the advantages of good generality, high efficiency and high accuracy.
Description
Technical field
The present invention relates generally to the technical field of computer system security, and in particular to a fast binary file comparison method based on the SimHash algorithm.
Background art
With the rapid development of computer technology and the widespread use of the Internet, software has grown ever larger as its functionality has diversified. Increasingly rich functionality gives users a better experience, but it also introduces many security problems. Moreover, software of any significant size inevitably uses third-party components, and these often come without source code, for example the dynamic link library files of Microsoft systems. To inspect the code of such software, reverse engineering is almost the only option.
In reverse engineering, the main task of static analysis is to recover, wholly or partly, the function and data information that has been erased from a binary file. This lays the foundation for later work such as system analysis across versions, locating the key points of attacks on devices, and control exploitation. However, because software today exists in many versions and at large scale, this work can less and less be done by hand. The demand for automated reverse engineering has therefore arisen, and binary file comparison is one of the main techniques it relies on.
At present there are roughly four kinds of binary file comparison methods: comparison of the binary byte content of the files, comparison of the assembly instructions obtained by disassembling the files, similarity-graph comparison based on assembly instructions, and structural graph comparison based on assembly instructions. In the order listed, each method overcomes defects of its predecessors, but all of them still fall short for large, complex binary files.
Apart from methods that compare binary content directly, most existing binary comparison methods follow a three-step flow: extract features, traverse the function set to find matching functions, and iteratively refine the matching result. For two binary files A and B, let N(A) denote the number of functions in A, N(B) the number of functions in B, t the time needed to compare two feature signatures, and n the number of loop iterations. The time complexity of this flow is then roughly O(n * N(A) * N(B) * t). Although successive improvements have gradually shrunk the set to be compared, the overall time complexity has not improved significantly. For some large software, the function set after disassembly often contains hundreds of thousands or even millions of functions, so conventional methods need on the order of 10^10 to 10^12 comparisons (ignoring feature-matching time and the number of iterations). Such time efficiency is unacceptable. A new method is needed that shrinks the comparison set and matches only those functions that may be similar.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a binary file comparison method based on the SimHash algorithm that has good generality, high efficiency and high accuracy.
To solve the above technical problem, the present invention adopts the following technical solution:
A binary file comparison method based on the SimHash algorithm, comprising the steps of:
S1: Using the extension functionality of IDA Pro, write a plug-in to extract information from the binary file; the information includes the assembly instruction sequence, the control flow graph and the call graph of the binary file;
S2: Preprocess the extracted binary file information;
S3: Define keywords on the preprocessed binary file information;
S4: Assign weights to the extracted keywords;
S5: Using the extracted keywords and their weights, calculate the SimHash fingerprint feature of each function, and store the fingerprint features;
S6: Based on the similar results returned by the query, perform exact matching with a classical algorithm based on structural matching.
As a further improvement of the present invention: in step S3, the definition of keywords considers single instructions, basic blocks and basic paths. The opcode and operands of a single instruction are taken as keywords; the SPP value of each basic block is calculated with the SPP algorithm; and the SPP value of a basic path is used as a keyword.
As a further improvement of the present invention: in step S4, the order of keyword importance is basic path > basic block > single instruction. For a single instruction, the weight of its opcode and operands is 1; for a basic block, the weight is its SPP value; for a basic path, the weight is the number of instructions the path passes through.
As a further improvement of the present invention: the detailed process of step S3 is:
S301: Merge split blocks in the control flow graph;
S302: Normalize (fuzz) the registers in the instruction sequence;
S303: Redirect the address information in the instruction sequence.
As a further improvement of the present invention: in step S301, when a basic block has only one child block in the CFG and that child block has only one parent block, the two basic blocks are defined as split blocks. Let B be the set of all basic blocks in function F, p(a) the set of parent nodes of basic block a in the CFG, c(a) the set of child nodes of basic block a in the CFG, and e(a, b) the edge in the CFG that starts at a and ends at b. The flow is:
i. Traverse the basic blocks in set B; if the traversal is complete, jump to step v; otherwise take a basic block b from set B and perform step ii;
ii. If the size of the child node set c(b) of basic block b is 1, perform step iii; otherwise return to step i;
iii. Let a be the only element of c(b); if the size of the parent node set p(a) of basic block a is 1, perform step iv; otherwise return to step i;
iv. Merge b and a into a new basic block c; remove the edges e(b, a), {e(x, b) | x ∈ p(b)} and {e(a, x) | x ∈ c(a)}; create the new edges {e(x, c) | x ∈ p(b)} and {e(c, x) | x ∈ c(a)}; add basic block c to set B; return to step i;
v. End.
As a further improvement of the present invention: the flow of step S303 is:
i. For jump instructions in the code segment, ignore the offset address following the instruction when extracting feature values, and use the hash value of the destination address block as the feature;
ii. For data pointers into the data segment, extract the opcode of the data value that the pointer refers to as the feature.
As a further improvement of the present invention: the flow of step S5 is:
S501: Calculate the SimHash fingerprint feature of each function;
S502: Store the SimHash fingerprint features of the functions;
S503: Query the SimHash fingerprint features of the functions.
As a further improvement of the present invention: the flow of step S501 is:
i. Set up a 32-dimensional vector V and a 32-bit binary number S, and initialize both to 0;
ii. Let K be the keyword set, s(k) = hash(k) for k ∈ K the 32-bit hash value of keyword k, and v(k) the weight of keyword k;
iii. For each keyword k ∈ K and for i = 0 to 31: if the i-th bit of s(k) is 1, add v(k) to the i-th element of V; otherwise subtract v(k);
iv. After all keywords have been processed according to step iii, set the i-th bit of S to 1 if the i-th element of V is greater than 0, and to 0 otherwise; the final S is output as the fingerprint feature of the function.
As a further improvement of the present invention: the flow of step S502 is:
i. Traverse the function set F of the binary file and, for each function f ∈ F, calculate its fingerprint S(f);
ii. Build an 8-bit index table with index range 0 to 255;
iii. Split the 32-bit binary representation of each fingerprint S(f) into four 8-bit segments (bits 31~24, 23~16, 15~8 and 7~0), and store the complete 32-bit fingerprint in the index entry denoted by each of the four segments.
As a further improvement of the present invention: the flow of step S503 is:
i. Traverse the function set F of the file to be matched and, for each function f ∈ F, calculate its fingerprint S(f);
ii. Split the 32-bit binary representation of each fingerprint S(f) into four 8-bit segments (bits 31~24, 23~16, 15~8 and 7~0), denoted S1, S2, S3 and S4;
iii. Use S1, S2, S3 and S4 as index entries to query the existing knowledge base of the reference binary file.
Compared with the prior art, the advantages of the present invention are:
1. The binary file comparison method based on the SimHash algorithm has good generality. It focuses on the fundamental problem of time efficiency, which makes the comparison of large software feasible. In addition, when extracting features from a function, it pays more attention to the function's structural features and relationships and uses the SimHash algorithm to evaluate the different features of the function together, which makes comparison feasible for complex software that uses techniques such as code obfuscation.
2. The method compares binary files accurately. It first applies the SPP algorithm to basic blocks and basic paths to extract their keywords, which solves the problem of instructions being reordered inside them. Then, when the SimHash fingerprint of a function is extracted, the order independence of SimHash and the structural nature of the keywords further eliminate misjudgments caused by minor changes and instruction reordering, improving the accuracy of function matching.
3. The method is efficient. Suppose binary files A and B contain 2^x and 2^y functions respectively. The characteristic fingerprint (32 bits) of each function in A is divided into four segments (8 bits each), and these four segments are used as indexes to look up index entries in the fingerprint index table of B. Each index lookup then returns on the order of 2^(y-8) candidate results, assuming the fingerprints of B are spread evenly over the 256 index entries. The comparison cost falls from 2^(x+y) to 4 * 2^(x+y-8) = 2^(x+y-6). For large programs this greatly reduces comparison time and achieves fast querying.
Description of the drawings
Fig. 1 is a schematic flow chart of the method of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
As shown in Fig. 1, the binary file comparison method of the present invention based on the SimHash algorithm (similarity hashing) comprises the steps of:
S1: Using the extension functionality of IDA Pro, write a plug-in to extract information from the binary file; the information includes the assembly instruction sequence, the control flow graph, the call graph and similar information of the binary file. IDA Pro is the Interactive Disassembler Professional.
S2: Preprocess the extracted binary file information.
S3: Define keywords on the preprocessed binary file information.
In a concrete application example, the definition of keywords mainly considers the following aspects: single instructions, basic blocks and basic paths.
a) Single instruction. As the elementary unit by which a function realizes its functionality, the single instruction is one of the essential features of a function. The opcode and operands of a single instruction are taken as keywords;
b) Basic block. As a node of the control flow graph, the basic block reflects the internal structure of the function. The present invention uses the SPP algorithm to calculate the SPP value of each basic block. Because the SPP algorithm is order-independent and repeatable, it solves the problem of instruction sequences differing because of instruction reordering. The SPP value can therefore be used as a keyword;
c) Basic path. A basic path is a route by which the function's functionality is realized. Functions with essentially the same functionality have essentially the same basic paths, so the SPP value of a basic path can be used as a keyword.
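The order independence attributed to SPP values in item b) can be illustrated with a short sketch. It assumes that SPP denotes a small-primes-product signature, in which each mnemonic is mapped to a prime and the primes are multiplied; the patent text does not spell the acronym out, so this reading, the `SppTable` class and the 64-bit modulus are all assumptions made for illustration.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (fine for a small mnemonic table)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


class SppTable:
    """Hypothetical helper: gives each distinct mnemonic its own small prime."""

    def __init__(self):
        self._gen = primes()
        self._prime_of = {}

    def prime(self, mnemonic):
        # Assign the next unused prime the first time a mnemonic is seen.
        if mnemonic not in self._prime_of:
            self._prime_of[mnemonic] = next(self._gen)
        return self._prime_of[mnemonic]

    def spp(self, mnemonics, modulus=2**64):
        """Product of the primes of all mnemonics, reduced modulo `modulus`."""
        value = 1
        for m in mnemonics:
            value = (value * self.prime(m)) % modulus
        return value


table = SppTable()
block_a = ["mov", "add", "cmp", "jz"]
block_b = ["add", "mov", "cmp", "jz"]  # same instructions, reordered
block_c = ["mov", "sub", "cmp", "jz"]  # one instruction differs
print(table.spp(block_a) == table.spp(block_b))  # reordering leaves the value unchanged
print(table.spp(block_a) == table.spp(block_c))  # a different instruction changes it
```

Because multiplication is commutative, any permutation of the same instructions yields the same value, which is the property the text relies on.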
S4: Assign weights to the extracted keywords.
In a concrete application example, the order of keyword importance is basic path > basic block > single instruction. For a single instruction, the weight of its opcode and operands is 1. For a basic block, the weight is its SPP value. For a basic path, the weight is the number of instructions the path passes through.
S5: Using the extracted keywords and their weights, calculate the SimHash fingerprint feature of each function, and store the fingerprint features.
After fingerprint features have been calculated and stored for all functions in the binary file to be matched and in the reference binary file, the fingerprint features are used to query similar functions for each function to be matched.
S6: Based on the similar results returned by the query, perform exact matching with a classical algorithm based on structural matching.
As a preferred embodiment, the detailed process of step S3 of the present invention is:
S301: Merge split blocks in the control flow graph. Split blocks are defined as follows: "when a basic block has only one child block in the CFG and that child block has only one parent block, the two basic blocks are defined as split blocks", where the CFG is the control flow graph of the function. Let B be the set of all basic blocks in function F, p(a) the set of parent nodes of basic block a in the CFG, c(a) the set of child nodes of basic block a in the CFG, and e(a, b) the edge in the CFG that starts at a and ends at b. The merging algorithm is as follows:
i. Traverse the basic blocks in set B. If the traversal is complete, jump to step v. Otherwise take a basic block b from set B and perform step ii.
ii. If the size of the child node set c(b) of basic block b is 1, perform step iii; otherwise return to step i.
iii. Let a be the only element of c(b). If the size of the parent node set p(a) of basic block a is 1, perform step iv; otherwise return to step i.
iv. Merge b and a into a new basic block c. Remove the edges e(b, a), {e(x, b) | x ∈ p(b)} and {e(a, x) | x ∈ c(a)}, and create the new edges {e(x, c) | x ∈ p(b)} and {e(c, x) | x ∈ c(a)}. Add basic block c to set B and return to step i.
v. The algorithm ends.
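The merging procedure can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the CFG is assumed to be given as a dictionary from each block to the set of its children, the function name `merge_split_blocks` is hypothetical, and for brevity the merged block keeps the name of b instead of receiving a new name c.

```python
def merge_split_blocks(succ):
    """Merge every (parent, only child) pair where the child has one parent.

    succ maps each basic block to the set of its child blocks.
    Returns (new_succ, members); members[block] lists the original blocks
    folded into that block, in execution order.
    """
    succ = {b: set(cs) for b, cs in succ.items()}
    pred = {b: set() for b in succ}
    for b, cs in succ.items():
        for child in cs:
            pred[child].add(b)
    members = {b: [b] for b in succ}

    changed = True
    while changed:  # step i: rescan until no pair can be merged
        changed = False
        for b in list(succ):
            if len(succ[b]) != 1:  # step ii: b needs exactly one child
                continue
            (a,) = succ[b]
            if a == b or len(pred[a]) != 1:  # step iii: a needs exactly one parent
                continue
            # Step iv: fold a into b. b keeps its incoming edges and
            # inherits the outgoing edges of a.
            succ[b] = succ.pop(a)
            members[b] += members.pop(a)
            del pred[a]
            for x in succ[b]:
                pred[x].discard(a)
                pred[x].add(b)
            changed = True
            break  # restart the traversal (return to step i)
    return succ, members


# A -> B -> {C, D} -> E -> F: A/B and E/F are split blocks.
cfg = {"A": {"B"}, "B": {"C", "D"}, "C": {"E"}, "D": {"E"}, "E": {"F"}, "F": set()}
merged, members = merge_split_blocks(cfg)
print(merged)
print(members)
```

In the example, A and B collapse into one block, as do E and F, while C and D stay separate because E has two parents.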
S302: Normalize (fuzz) the registers in the instruction sequence. Which general-purpose register is used often depends on compiler optimization options, so the registers are normalized: EAX, EBX, ECX and EDX are treated as equivalent (EAX = EBX = ECX = EDX), and the same is done for the 16-bit registers (AX, BX, CX, DX), the high-byte registers (AH, BH, CH, DH) and the low-byte registers (AL, BL, CL, DL).
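A minimal sketch of this register normalization, assuming instructions are available as disassembly text: the placeholder tokens `REG32`, `REG16`, `REG8H` and `REG8L` and the function name `normalize_registers` are hypothetical, since the patent only states that the registers within each group are treated as equal.

```python
import re

# Hypothetical placeholder tokens, one per register group named in the text.
_REGISTER_CLASSES = {
    "REG32": ("eax", "ebx", "ecx", "edx"),
    "REG16": ("ax", "bx", "cx", "dx"),
    "REG8H": ("ah", "bh", "ch", "dh"),
    "REG8L": ("al", "bl", "cl", "dl"),
}
_TOKEN_OF = {reg: token for token, regs in _REGISTER_CLASSES.items() for reg in regs}
# \b keeps "ax" from matching inside "eax" and leaves registers such as esi alone.
_REGISTER_RE = re.compile(r"\b(" + "|".join(_TOKEN_OF) + r")\b", re.IGNORECASE)


def normalize_registers(instruction):
    """Replace each general-purpose register with the token of its group."""
    return _REGISTER_RE.sub(lambda m: _TOKEN_OF[m.group(1).lower()], instruction)


print(normalize_registers("mov eax, [ebx+4]"))
print(normalize_registers("add al, ch"))
```

After normalization, two instruction sequences that differ only in which of the four general-purpose registers the compiler chose produce identical keywords.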
S303: Redirect the address information in the instruction sequence.
The concrete operations are as follows:
i. For jump instructions in the code segment, ignore the offset address following the instruction when extracting feature values, and use the hash value of the destination address block as the feature.
ii. For data pointers into the data segment, extract the opcode of the data value that the pointer refers to as the feature.
As a preferred embodiment, the detailed process of step S5 of the present invention is:
S501: Calculate the SimHash fingerprint feature of each function:
i. Set up a 32-dimensional vector V and a 32-bit binary number S, and initialize both to 0.
ii. Let K be the keyword set, s(k) = hash(k) for k ∈ K the 32-bit hash value of keyword k, and v(k) the weight of keyword k.
iii. For each keyword k ∈ K and for i = 0 to 31: if the i-th bit of s(k) is 1, add v(k) to the i-th element of V; otherwise subtract v(k).
iv. After all keywords have been processed according to step iii, set the i-th bit of S to 1 if the i-th element of V is greater than 0, and to 0 otherwise. The final S is output as the fingerprint feature of the function.
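The fingerprint calculation of S501 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the patent does not name the hash function, so `hash32` (MD5 truncated to 32 bits) is an arbitrary stand-in, and the function names and sample keywords are hypothetical.

```python
import hashlib


def hash32(keyword):
    """32-bit hash of a keyword (assumption: first 4 bytes of its MD5 digest)."""
    digest = hashlib.md5(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def simhash32(weighted_keywords):
    """Compute a 32-bit SimHash from an iterable of (keyword, weight) pairs."""
    v = [0] * 32  # the 32-dimensional vector V, initialized to 0
    for keyword, weight in weighted_keywords:
        h = hash32(keyword)
        for i in range(32):
            # Add the weight where bit i of the hash is 1, subtract it otherwise.
            if (h >> i) & 1:
                v[i] += weight
            else:
                v[i] -= weight
    s = 0  # the 32-bit fingerprint S
    for i in range(32):
        if v[i] > 0:
            s |= 1 << i
    return s


keywords = [("mov", 1), ("REG32", 1), ("block:210", 210), ("path:770", 12)]
print(f"{simhash32(keywords):032b}")
```

Since V only accumulates sums, the result does not depend on the order in which keywords are processed, and a keyword with a large weight dominates the bits of the fingerprint.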
S502: Store the SimHash fingerprint features of the functions:
i. Traverse the function set F of the binary file and, for each function f ∈ F, calculate its fingerprint S(f).
ii. Build an 8-bit index table with index range 0 to 255.
iii. Split the 32-bit binary representation of each fingerprint S(f) into four 8-bit segments (bits 31~24, 23~16, 15~8 and 7~0), and store the complete 32-bit fingerprint in the index entry denoted by each of the four segments.
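A sketch of the storage step, under one stated assumption: the text describes a single 8-bit index with range 0 to 255, and this sketch keeps one such table per segment position so that a segment is only compared with the segment at the same bit positions. The names `segments` and `build_index` and the sample fingerprints are hypothetical.

```python
from collections import defaultdict


def segments(fingerprint):
    """Split a 32-bit fingerprint into bits 31-24, 23-16, 15-8 and 7-0."""
    return [(fingerprint >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def build_index(fingerprints):
    """Index fingerprints by each of their four 8-bit segments.

    fingerprints maps a function name to its 32-bit SimHash fingerprint.
    Returns four tables (one per segment position); each maps an 8-bit
    value in 0..255 to the (name, fingerprint) entries filed under it.
    """
    tables = [defaultdict(list) for _ in range(4)]
    for name, fp in fingerprints.items():
        for position, value in enumerate(segments(fp)):
            # The complete fingerprint is stored under every segment value.
            tables[position][value].append((name, fp))
    return tables


fps = {"f": 0x12345678, "g": 0x12FF56FF}
tables = build_index(fps)
print(tables[0][0x12])  # both functions share the top segment 0x12
```

Each fingerprint is therefore stored four times, which trades memory for the ability to find candidates with a single table lookup per segment.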
S503: Query the SimHash fingerprint features of the functions:
i. Traverse the function set F of the file to be matched and, for each function f ∈ F, calculate its fingerprint S(f).
ii. Split the 32-bit binary representation of each fingerprint S(f) into four 8-bit segments (bits 31~24, 23~16, 15~8 and 7~0), denoted S1, S2, S3 and S4.
iii. Use S1, S2, S3 and S4 as index entries to query the existing knowledge base of the reference binary file.
For each function entry returned by the query, compute the Hamming distance between its complete SimHash fingerprint and the complete SimHash fingerprint of the function to be matched. If the Hamming distance is less than or equal to 3, the two functions are considered similar.
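The query and the Hamming-distance filter can be sketched as follows. The sketch is self-contained and illustrative only: the index layout (a dictionary keyed by segment position and 8-bit value), the function names and the sample fingerprints are assumptions, not details given in the patent.

```python
def segments(fp):
    """Split a 32-bit fingerprint into bits 31-24, 23-16, 15-8 and 7-0."""
    return [(fp >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def query(index, fp, max_distance=3):
    """Return names of indexed functions within max_distance bits of fp.

    index maps (segment position, 8-bit value) to a list of
    (name, fingerprint) entries.
    """
    seen, similar = set(), []
    for key in enumerate(segments(fp)):  # key = (position, segment value)
        for name, candidate in index.get(key, []):
            if name in seen:
                continue  # the same function can be hit via several segments
            seen.add(name)
            if hamming(fp, candidate) <= max_distance:
                similar.append(name)
    return similar


# Build a tiny reference index by hand.
reference = {"memcpy_like": 0xA1B2C3D4, "unrelated": 0x0F0F0F0F}
index = {}
for name, fp in reference.items():
    for key in enumerate(segments(fp)):
        index.setdefault(key, []).append((name, fp))

target = 0xA1B2C3D4 ^ 0b101  # differs from memcpy_like in 2 bits
result = query(index, target)
print(result)
```

With a threshold of 3 and four segments, two fingerprints that differ in at most 3 bits must agree exactly in at least one segment, so the segment lookup cannot miss a similar function; it only spares the comparison with functions that share no segment.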
Current mainstream binary file comparison tools do not support the comparison of large-scale software, and their accuracy is low for software protected by code obfuscation. The main reason is that large-scale software has many functions and changes in complex ways, so time efficiency and matching accuracy are hard to achieve together. The method proposed by the present invention focuses on the fundamental problem of time efficiency, which makes the comparison of large software feasible. In addition, when extracting features from a function, it pays more attention to the function's structural features and relationships and uses the SimHash algorithm to evaluate the different features of the function together, which makes comparison feasible for complex software that uses techniques such as code obfuscation.
In conventional methods, signature extraction, whether from function signatures or from the instructions of basic blocks, must accommodate the effects the compiler has on the code. When basic blocks are matched with structural signatures, a match sometimes cannot be found because, for example, the instruction sequence has been reordered, and this leads to misjudgments. Likewise, a function signature based on structural comparison that considers only structural features, such as the number of internal basic blocks, call instructions and jump instructions, and ignores the instruction content may match two functions that have the same structure but different semantics, for example a maximum function and a minimum function. The method proposed by the present invention first applies the SPP algorithm to basic blocks and basic paths to extract their keywords, which solves the problem of instructions being reordered inside them. Then, when the SimHash fingerprint of a function is extracted, the order independence of SimHash and the structural nature of the keywords further eliminate misjudgments caused by minor changes and instruction reordering, improving the accuracy of function matching.
Traditional binary file comparison methods determine matching objects by blind one-by-one comparison. Suppose two binary files A and B contain 2^x and 2^y functions respectively; the number of comparisons of a conventional method is then at least 2^(x+y). This consumes a great deal of time, much of it on useless comparisons. With the method proposed by the present invention, the characteristic fingerprint (32 bits) of each function in A is divided into four segments (8 bits each), and these four segments are used as indexes to look up index entries in the fingerprint index table of B. Each index lookup then returns on the order of 2^(y-8) candidate results, assuming the fingerprints of B are spread evenly over the 256 index entries. The comparison cost falls from 2^(x+y) to 4 * 2^(x+y-8) = 2^(x+y-6). For large programs this greatly reduces comparison time and achieves fast querying.
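The saving stated above can be checked with a short count. The derivation assumes that the 2^y fingerprints of B are spread evenly over the 256 entries of an 8-bit index, so that one lookup returns about 2^y / 2^8 candidates:

```latex
\begin{align*}
\text{pairwise comparison:}\quad & 2^{x}\cdot 2^{y} = 2^{x+y} \\
\text{indexed lookup:}\quad & 2^{x}\cdot 4\cdot\frac{2^{y}}{2^{8}}
  = 2^{2}\cdot 2^{x+y-8} = 2^{x+y-6} \\
\text{ratio:}\quad & \frac{2^{x+y}}{2^{x+y-6}} = 2^{6} = 64
\end{align*}
```

For example, with x = y = 20 (about a million functions in each file), the count drops from 2^40 to 2^34 fingerprint comparisons, a factor of 64.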
The above is only a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiment; all technical solutions that fall under the idea of the present invention belong to its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and refinements made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.
Claims (10)
1. A binary file comparison method based on the SimHash algorithm, characterized in that the steps are:
S1: Using the extension functionality of IDA Pro, write a plug-in to extract information from the binary file; the information includes the assembly instruction sequence, the control flow graph and the call graph of the binary file;
S2: Preprocess the extracted binary file information;
S3: Define keywords on the preprocessed binary file information;
S4: Assign weights to the extracted keywords;
S5: Using the extracted keywords and their weights, calculate the SimHash fingerprint feature of each function, and store the fingerprint features;
S6: Based on the similar results returned by the query, perform exact matching with a classical algorithm based on structural matching.
2. The binary file comparison method based on the SimHash algorithm according to claim 1, characterized in that, in step S3, the definition of keywords considers single instructions, basic blocks and basic paths; the opcode and operands of a single instruction are taken as keywords, the SPP value of each basic block is calculated with the SPP algorithm, and the SPP value of a basic path is used as a keyword.
3. The binary file comparison method based on the SimHash algorithm according to claim 1, characterized in that, in step S4, the order of keyword importance is basic path > basic block > single instruction; for a single instruction, the weight of its opcode and operands is 1; for a basic block, the weight is its SPP value; for a basic path, the weight is the number of instructions the path passes through.
4. The binary file comparison method based on the SimHash algorithm according to claim 1, 2 or 3, characterized in that the detailed process of step S3 is:
S301: Merge split blocks in the control flow graph;
S302: Normalize (fuzz) the registers in the instruction sequence;
S303: Redirect the address information in the instruction sequence.
5. The binary file comparison method based on the SimHash algorithm according to claim 4, characterized in that, in step S301, when a basic block has only one child block in the CFG and that child block has only one parent block, the two basic blocks are defined as split blocks; let B be the set of all basic blocks in function F, p(a) the set of parent nodes of basic block a in the CFG, c(a) the set of child nodes of basic block a in the CFG, and e(a, b) the edge in the CFG that starts at a and ends at b; the flow is:
i. Traverse the basic blocks in set B; if the traversal is complete, jump to step v; otherwise take a basic block b from set B and perform step ii;
ii. If the size of the child node set c(b) of basic block b is 1, perform step iii; otherwise return to step i;
iii. Let a be the only element of c(b); if the size of the parent node set p(a) of basic block a is 1, perform step iv; otherwise return to step i;
iv. Merge b and a into a new basic block c; remove the edges e(b, a), {e(x, b) | x ∈ p(b)} and {e(a, x) | x ∈ c(a)}; create the new edges {e(x, c) | x ∈ p(b)} and {e(c, x) | x ∈ c(a)}; add basic block c to set B; return to step i;
v. End.
6. The binary file comparison method based on the SimHash algorithm according to claim 5, characterized in that the flow of step S303 is:
i. For jump instructions in the code segment, ignore the offset address following the instruction when extracting feature values, and use the hash value of the destination address block as the feature;
ii. For data pointers into the data segment, extract the opcode of the data value that the pointer refers to as the feature.
7. The binary file comparison method based on the SimHash algorithm according to claim 1, 2 or 3, characterized in that the flow of step S5 is:
S501: Calculate the SimHash fingerprint feature of each function;
S502: Store the SimHash fingerprint features of the functions;
S503: Query the SimHash fingerprint features of the functions.
8. The binary file comparison method based on the SimHash algorithm according to claim 7, characterized in that the flow of step S501 is:
i. Set up a 32-dimensional vector V and a 32-bit binary number S, and initialize both to 0;
ii. Let K be the keyword set, s(k) = hash(k) for k ∈ K the 32-bit hash value of keyword k, and v(k) the weight of keyword k;
iii. For each keyword k ∈ K and for i = 0 to 31: if the i-th bit of s(k) is 1, add v(k) to the i-th element of V; otherwise subtract v(k);
iv. After all keywords have been processed according to step iii, set the i-th bit of S to 1 if the i-th element of V is greater than 0, and to 0 otherwise; the final S is output as the fingerprint feature of the function.
9. The binary file comparison method based on the SimHash algorithm according to claim 8, characterized in that the flow of step S502 is:
i. Traverse the function set F of the binary file and, for each function f ∈ F, calculate its fingerprint S(f);
ii. Build an 8-bit index table with index range 0 to 255;
iii. Split the 32-bit binary representation of each fingerprint S(f) into four 8-bit segments (bits 31~24, 23~16, 15~8 and 7~0), and store the complete 32-bit fingerprint in the index entry denoted by each of the four segments.
10. The binary file comparison method based on the SimHash algorithm according to claim 8, characterized in that the flow of step S503 is:
i. Traverse the function set F of the file to be matched and, for each function f ∈ F, calculate its fingerprint S(f);
ii. Split the 32-bit binary representation of each fingerprint S(f) into four 8-bit segments (bits 31~24, 23~16, 15~8 and 7~0), denoted S1, S2, S3 and S4;
iii. Use S1, S2, S3 and S4 as index entries to query the existing knowledge base of the reference binary file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611009372.6A CN106649218A (en) | 2016-11-16 | 2016-11-16 | Quick binary file comparing method based on SimHash algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611009372.6A CN106649218A (en) | 2016-11-16 | 2016-11-16 | Quick binary file comparing method based on SimHash algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106649218A true CN106649218A (en) | 2017-05-10 |
Family
ID=58807152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611009372.6A Pending CN106649218A (en) | 2016-11-16 | 2016-11-16 | Quick binary file comparing method based on SimHash algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649218A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718506A (en) * | 2016-01-04 | 2016-06-29 | 胡新伟 | Duplicate-checking comparison method for science and technology projects |
CN106055602A (en) * | 2016-05-24 | 2016-10-26 | 腾讯科技(深圳)有限公司 | File verification method and apparatus |
Non-Patent Citations (2)
Title |
---|
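Liu Chunhong et al.: "Structured similarity computation for binary file homology detection", Journal of Beijing University of Posts and Telecommunications * |
Zhang Guangqing et al.: "Optimized fast search method for massive similar documents based on simhash", Command Information System and Technology * |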
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704501A (en) * | 2017-08-28 | 2018-02-16 | 中国科学院信息工程研究所 | A kind of method and system for identifying homologous binary file |
CN107704501B (en) * | 2017-08-28 | 2020-04-24 | 中国科学院信息工程研究所 | Method and system for identifying homologous binary file |
CN108280197A (en) * | 2018-01-29 | 2018-07-13 | 中国科学院信息工程研究所 | A kind of method and system of the homologous binary file of identification |
CN108280197B (en) * | 2018-01-29 | 2020-09-11 | 中国科学院信息工程研究所 | Method and system for identifying homologous binary file |
CN109670317A (en) * | 2018-12-24 | 2019-04-23 | 中国科学院软件研究所 | A kind of internet of things equipment inheritance bug excavation method based on atom controlling stream graph |
CN112100318A (en) * | 2020-11-12 | 2020-12-18 | 北京智慧星光信息技术有限公司 | Multi-dimensional information merging method, device, equipment and storage medium |
CN115016843A (en) * | 2022-05-23 | 2022-09-06 | 北京计算机技术及应用研究所 | High-precision binary code similarity comparison method |
CN115016843B (en) * | 2022-05-23 | 2024-03-26 | 北京计算机技术及应用研究所 | High-precision binary code similarity comparison method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chua et al. | Neural nets can learn function type signatures from binaries | |
CN109697162B (en) | Software defect automatic detection method based on open source code library | |
CN108446540B (en) | Program code plagiarism type detection method and system based on source code multi-label graph neural network | |
CN106649218A (en) | Quick binary file comparing method based on SimHash algorithm | |
CN112733137B (en) | Binary code similarity analysis method for vulnerability detection | |
CN111459799B (en) | Software defect detection model establishing and detecting method and system based on Github | |
WO2020215563A1 (en) | Training sample generation method and device for text classification, and computer apparatus | |
CN109344230B (en) | Code library file generation, code search, coupling, optimization and migration method | |
CN106407809A (en) | A Linux platform malicious software detection method | |
CN104142822A (en) | Source code flow analysis using information retrieval | |
Yu et al. | A feature selection approach based on a similarity measure for software defect prediction | |
CN113326187A (en) | Data-driven intelligent detection method and system for memory leakage | |
CN103534696A (en) | Exploiting query click logs for domain detection in spoken language understanding | |
JP5780036B2 (en) | Extraction program, extraction method and extraction apparatus | |
JP6588661B2 (en) | Information retrieval accuracy evaluation method, system, apparatus, and computer-readable storage medium | |
CN113886832A (en) | Intelligent contract vulnerability detection method, system, computer equipment and storage medium | |
CN115373737B (en) | Code clone detection method based on feature fusion | |
CN110554952B (en) | Search-based hierarchical regression test data generation method | |
CN110928550A (en) | Method for eliminating redundancy of GCC abstract syntax tree based on keyword Trie tree | |
CN112328710B (en) | Entity information processing method, device, electronic equipment and storage medium | |
Li et al. | A Deep Learning Based Approach to Detect Code Clones | |
CN114398069B (en) | Method and system for identifying accurate version of public component library based on cross fingerprint analysis | |
CN113536077B (en) | Mobile APP specific event content detection method and device | |
JP5514682B2 (en) | Batch processing program analysis method and apparatus | |
Yang et al. | RouAlign: Cross-Version Function Alignment and Routine Recovery with Graphlet Edge Embedding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |