CN106126235A - Method for constructing a reused-code library, and method and system for fast tracing of reused code

Method for constructing a reused-code library, and method and system for fast tracing of reused code

Info

Publication number
CN106126235A
Authority
CN
China
Prior art keywords
function
code
block
similar
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610474461.1A
Other languages
Chinese (zh)
Other versions
CN106126235B (en)
Inventor
张永铮
乔延臣
云晓春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201610474461.1A
Publication of CN106126235A
Application granted
Publication of CN106126235B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/36 Software reuse
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a method for constructing a reused-code library, and a method and system for fast tracing of reused code. The system comprises a preprocessing module, a code library construction module, and a function tracing module. The preprocessing module obtains the assembly code of each sample and extracts the functions in each assembly code; it then splits each function into several code blocks according to the function's jump instructions and jump addresses, and computes the simhash value of each code block. The code library construction module builds a three-level inverted index in which simhash values index the corresponding code blocks, code blocks index the functions containing them, and functions index the samples containing them. The function tracing module retrieves from the code library the code blocks similar to those of the function to be traced and the potential similar functions corresponding to each similar code block, and then determines from the jump relations between the similar code blocks whether each potential similar function is similar to the function to be traced. The invention increases the degree of automation of homology judgment.

Description

Method for constructing a reused-code library, and method and system for fast tracing of reused code
Technical field
The present invention relates to the fields of reverse analysis and malicious code analysis, and in particular to a method for constructing a reused-code library based on simhash and inverted indexes, a fast tracing method, and a system.
Background art
Code is generally reused with the function as the basic unit. Even under heavy compiler optimization, most functions are preserved as whole units, so tracing and similarity judgment performed per function match real reuse scenarios better. Homology judgment for malicious code relies mainly on a malware author reusing personally written code across different malware; for example, the homology judgments for Sasser and Netsky, and for Flame and Gauss, were all based on special functions that they share. However, to speed up development, malware authors also frequently reuse public or semi-public code written by others; Chthonic, for example, is malware developed by modifying the Zeus source code. It has also been reported that a sample used in the Equation APT (Advanced Persistent Threat) attack was identified as belonging to the Zlob family, which shows that APT groups also reuse open-source code. In addition, to meet runtime requirements, a compiler typically inserts a large amount of code during compilation. In tests, when C code containing only one function was compiled, the VC6.0 compiler on Windows inserted 103 functions and the GCC 4.7.2 compiler on Linux inserted 18 functions. Different compilers insert different functions at different positions, and identifying these functions takes considerable experience and skill. Reused functions greatly interfere with malicious code analysis and homology judgment. Their identification currently relies mainly on the experience of malware analysts, which makes homology judgment inefficient. Identifying reused functions quickly would greatly improve efficiency and increase the credibility of homology conclusions.
The basis of reused-function tracing is similar-function judgment: if a function in some binary sample has a similar function elsewhere, that function is a reused function. Existing similar-function judgment techniques achieve high precision and recall, but they are slow and do not scale to tracing reused functions across massive amounts of code. A small modification to a function's source code, or a difference in compile options or code position, changes the instruction order, registers, jump locations, and so on in the disassembled code, so tracing with ordinary hashing yields very low recall. The jump structure among a function's code blocks is an important feature for similarity judgment, but extracting jump relations and comparing structure graphs takes considerable time. This is a major reason why current similarity judgment struggles to achieve precision, recall, and speed at the same time.
Summary of the invention
To address the technical problems of the prior art and to enable fast tracing and judgment of reused code, the present invention discloses a method for constructing a reused-code library, a fast tracing method, and a system, all based on simhash and inverted indexes.
The invention works per function and uses simhash and inverted indexing to trace similar functions quickly in massive amounts of code. First, existing samples that are either not packed or already unpacked are disassembled to obtain their assembly code. Each function in the assembly code is divided into multiple code blocks according to its jump instructions, and the simhash value of each code block is computed. A three-level inverted index is then built between simhash values and code blocks, code blocks and functions, and functions and samples. To trace a function, similar code blocks are found quickly from the simhash values of its code blocks, potential similar functions are found through the inverted index, and the samples containing them are located.
The invention discloses a simhash- and inverted-index-based method for constructing a code library used for tracing reused code. Based on this code library, the tracing method disclosed by the invention can quickly trace and locate similar functions and the samples containing them. The construction method comprises four steps:
(1) disassemble each executable program sample to obtain its assembly code;
(2) extract the functions in the assembly code according to call instructions and call addresses, and split each function into multiple code blocks according to its jump instructions and jump addresses;
(3) compute the simhash value of each code block using the simhash algorithm (proposed in 2002 in "Similarity estimation techniques from rounding algorithms");
(4) build a three-level inverted index: simhash values index the corresponding code blocks, code blocks index the functions containing them, and functions index the samples containing them.
The invention discloses a fast tracing method for reused code based on simhash and inverted indexes. For a function to be traced, the method can quickly trace, in a massive code library, the functions similar to it and the samples containing them. If no similar function is traced, the function can be regarded as original (not reused) with respect to the code library. The method comprises five steps:
(1) split the function to be traced into multiple code blocks according to its jump instructions and jump addresses, and compute the simhash value of each code block;
(2) search the code library for simhash values whose Hamming distance to a code block's simhash value is within 3, then use the inverted index to retrieve the code blocks corresponding to each such simhash value as similar code blocks;
(3) find potential similar functions through the similar code blocks and the inverted index, assign each potential similar function a weight according to the number of code blocks it has in common (as similar blocks) with the function to be traced, and keep the functions whose weight exceeds a given threshold. For example, suppose the three code blocks of function A retrieve two potential similar functions, B and C. If B has one code block similar to a block of A, the weight of B is 1/3; if C has two code blocks similar to blocks of A, the weight of C is 2/3. With a threshold of 1/2, only C is considered a candidate similar to A;
(4) finally, decide similarity by comparing the jump relations between the similar code blocks of the functions kept in (3). In the present invention, two functions are considered similar only when the jump relations between their similar code blocks are identical. For example, suppose functions A and B have two similar code blocks (1, 2). If block 1 jumps to block 2 in function A but block 2 jumps to block 1 in function B, then A and B are not similar;
(5) locate the samples containing the similar functions through the inverted index.
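The weighting example in step (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `candidate_weights` and `select_candidates` and the input layout (one set of matching library functions per code block of the traced function) are hypothetical.

```python
from fractions import Fraction

def candidate_weights(similar_functions_per_block):
    """Weight each library function by the share of the traced function's
    code blocks for which it contains a similar block.

    similar_functions_per_block: one set per code block of the traced
    function, holding the names of library functions with a similar block.
    """
    total_blocks = len(similar_functions_per_block)
    hits = {}
    for functions in similar_functions_per_block:
        for name in functions:
            hits[name] = hits.get(name, 0) + 1
    return {name: Fraction(count, total_blocks) for name, count in hits.items()}

def select_candidates(weights, threshold=Fraction(1, 2)):
    """Keep only the functions whose weight exceeds the threshold."""
    return sorted(name for name, weight in weights.items() if weight > threshold)

# Function A has three code blocks; B matches one of them, C matches two.
per_block = [{"B", "C"}, {"C"}, set()]
weights = candidate_weights(per_block)
selected = select_candidates(weights)
```

With the data above, B receives weight 1/3 and C receives 2/3, so only C passes a threshold of 1/2, as in the text's example.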
The present invention also discloses a fast tracing system for reused code based on simhash and inverted indexes, consisting mainly of three modules: a preprocessing module, a code library construction module, and a function tracing module.
Compared with the prior art, the positive effects of the present invention are as follows:
The invention can quickly trace, among a large number of samples, the function code similar to a given function and the samples containing it, with high precision and recall. Tools such as code search engines can be built on this method to help reverse analysts work more efficiently and to increase the degree of automation of homology judgment.
Experiments show that the invention achieves high precision and recall and a fast tracing speed:
(1) a code library was built from all PE files under the "Program Files" and "Windows" folders of a 32-bit WinXP system;
(2) C source code containing only a main function with a single printf statement was compiled with VC6.0 into a Release executable, and the file was disassembled. IDA Pro automatically identifies and removes library functions, so in addition to main, 19 compiler-inserted functions remained in the code;
(3) because WinXP contains many files compiled with VC6.0, it was presumed that the 19 compiler-inserted functions had some probability of being traced to similar functions in the code library built from WinXP files. The 19 functions were therefore traced: 16 of them had similar functions, while the other three, sub_401010, sub_4057BC, and sub_402AD1, were not traced to any similar function. The experiment was run on a Dell PowerEdge R410 server with a 16-core Intel(R) Xeon(R) CPU E562 and 16 GB of memory, and the average tracing time per function was about 0.149 seconds.
The following table lists some of the similar functions that were traced:
Brief description of the drawings
Fig. 1 is a flow chart of the code library construction method based on simhash and inverted indexes;
Fig. 2 is a structure diagram of the three-level inverted index;
Fig. 3 is a flow chart of function tracing;
Fig. 4 is an architecture diagram of the fast reused-code tracing system based on simhash and inverted indexes.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments.
Fig. 1 shows the flow of the code library construction method provided by the invention. The specific steps are as follows:
(1) Unpack the packed samples
1) use the PEiD packer-detection tool to identify the packer used by each sample;
2) unpack each sample with the unpacking tool that matches its packer;
3) discard the samples that cannot be unpacked because they use special packers.
All remaining samples are unpacked samples.
(2) Obtain the assembly code of each sample with a disassembly tool
The present invention uses IDA Pro as an example.
(3) Extract the functions in the assembly code
In the disassembled code, "proc near" marks the beginning of a function and "endp" marks its end. Functions are extracted from the assembly code according to these two markers.
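As a rough illustration of the marker-based extraction just described, the sketch below scans a listing for "proc near" and "endp" lines. It assumes a plain listing in which each marker line starts with the function name and carries no address prefix; real disassembler output may differ, and the name `extract_functions` is hypothetical.

```python
import re

# Marker lines are assumed to look like "sub_401000 proc near" / "sub_401000 endp".
PROC_START = re.compile(r"^\s*(\S+)\s+proc\s+near\b")
PROC_END = re.compile(r"^\s*(\S+)\s+endp\b")

def extract_functions(listing):
    """Return {function name: [instruction lines]} from an assembly listing."""
    functions = {}
    name, body = None, []
    for line in listing.splitlines():
        start = PROC_START.match(line)
        if start:
            # A new function begins; start collecting its body.
            name, body = start.group(1), []
            continue
        if name is not None and PROC_END.match(line):
            # The current function ends; store what was collected.
            functions[name] = body
            name = None
            continue
        if name is not None and line.strip():
            body.append(line.strip())
    return functions

sample_listing = """
sub_401000 proc near
    push ebp
    mov ebp, esp
    retn
sub_401000 endp

sub_401010 proc near
    xor eax, eax
    retn
sub_401010 endp
"""
extracted = extract_functions(sample_listing)
```

Lines outside any proc/endp pair are ignored, which is a simplification chosen for this sketch.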
(4) Extract the code blocks in each function
According to the jump instructions in the function (jnz, jz, and so on) and the addresses they point to, the function code is divided into multiple code blocks. Each code block either contains no jump instruction or ends with one.
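The block-splitting rule above can be sketched as follows. The sketch assumes that jump targets appear in the listing as label lines ending in ":" and that the last token of a jump instruction is its target label; the set `JUMPS` is a small illustrative subset of x86 jump mnemonics, and `split_into_blocks` is a hypothetical name.

```python
# Illustrative subset of x86 jump mnemonics (an assumption of this sketch).
JUMPS = {"jmp", "jz", "jnz", "je", "jne", "ja", "jae", "jb", "jbe",
         "jg", "jge", "jl", "jle"}

def split_into_blocks(lines):
    """Split a function body into code blocks.

    A block ends after a jump instruction, and a new block starts at every
    label that is the target of some jump in the function.
    """
    # First pass: collect every label that some jump points to.
    targets = set()
    for line in lines:
        parts = line.split()
        if parts and parts[0] in JUMPS:
            targets.add(parts[-1])

    # Second pass: cut the instruction stream at jumps and at jump targets.
    blocks, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            if line[:-1] in targets and current:
                blocks.append(current)
                current = []
            continue  # label lines are not instructions
        current.append(line)
        if line.split()[0] in JUMPS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

body = ["cmp eax, 0", "jz short loc_A", "mov ebx, 1",
        "loc_A:", "add ebx, 2", "retn"]
blocks = split_into_blocks(body)
```

Each resulting block either ends with a jump or contains none, which is the property the text requires.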
(5) Normalize the code
A slight change to the source code can substantially change the registers, immediates, and memory addresses in the assembly code. To ignore the effect of such differences, the assembly code is normalized according to the following rules:
● registers such as eax, ax, and al are normalized to REG32, REG16, and REG8 respectively, according to their bit width;
● memory operands such as [eax] and [edi+4] are represented as MEM;
● immediates such as 0 and 5A4Dh are represented as VAL;
● call instructions that call external system library functions are left unchanged, while calls to internal functions such as "call sub_105A8" are normalized to "call INNER";
● jump instructions such as "jz short loc_4023E7" are normalized to "jz loc".
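The normalization rules listed above could be approximated with the Python sketch below. It is a simplified illustration: the register sets cover only the general-purpose x86 registers, the label REG8 for 8-bit registers and the "sub_" prefix test for internal calls are assumptions of this sketch, and the helper names are hypothetical.

```python
import re

REG32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
REG16 = {"ax", "bx", "cx", "dx", "si", "di", "bp", "sp"}
REG8 = {"al", "ah", "bl", "bh", "cl", "ch", "dl", "dh"}
JUMPS = {"jmp", "jz", "jnz", "je", "jne", "ja", "jae", "jb", "jbe",
         "jg", "jge", "jl", "jle"}
# Decimal literals (0) or hex literals with an "h" suffix (5A4Dh).
IMMEDIATE = re.compile(r"[0-9][0-9A-Fa-f]*h|[0-9]+")

def normalize_operand(operand):
    """Map one operand to MEM, REG32, REG16, REG8, or VAL where a rule applies."""
    operand = operand.strip()
    lowered = operand.lower()
    if "[" in operand:
        return "MEM"
    if lowered in REG32:
        return "REG32"
    if lowered in REG16:
        return "REG16"
    if lowered in REG8:
        return "REG8"
    if IMMEDIATE.fullmatch(operand):
        return "VAL"
    return operand

def normalize(instruction):
    """Normalize a single non-empty assembly instruction."""
    parts = instruction.strip().split(None, 1)
    mnemonic = parts[0].lower()
    if len(parts) == 1:
        return mnemonic
    operands = parts[1]
    if mnemonic in JUMPS:
        return mnemonic + " loc"
    if mnemonic == "call":
        # Internal functions are assumed to carry the "sub_" prefix;
        # calls to external library functions are kept as they are.
        if operands.strip().startswith("sub_"):
            return "call INNER"
        return instruction.strip()
    normalized = [normalize_operand(op) for op in operands.split(",")]
    return mnemonic + " " + ", ".join(normalized)
```

For instance, `normalize("mov eax, [edi+4]")` yields `"mov REG32, MEM"`, and `normalize("jz short loc_4023E7")` yields `"jz loc"`.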
(6) Compute the simhash value of each code block
Simhash is a fuzzy hash that is commonly used to detect texts and web pages that differ only slightly.
1) create a 64-element vector and initialize it to 0;
2) tokenize the normalized code block; the tokens are the 2-grams of the normalized instruction sequence;
3) assign each token a weight, using the token's frequency as the base weight. Because the APIs a function calls largely determine its behavior, call instructions play an important role in a code block, so the weight of every token that contains a call instruction is doubled;
4) hash each token with the MD5 algorithm and take 64 bits of the MD5 value as the token's hash;
5) weighted merge: for each bit of a token's hash, if the bit is 1, add the token's weight to the corresponding vector element; otherwise subtract the token's weight;
6) dimension reduction: for each vector element, set the corresponding bit to 1 if the element is greater than 0 and to 0 otherwise, forming a 64-bit simhash value.
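The six sub-steps above map directly onto a short Python function. This is a sketch under assumptions: the text does not say which 64 bits of the MD5 value are taken, so the first 8 bytes of the digest are used here; the handling of a block with a single instruction (treated as one token) is also an assumption, and `simhash64` and `hamming` are hypothetical names.

```python
import hashlib
from collections import Counter

def simhash64(instructions):
    """Compute a 64-bit simhash over 2-grams of normalized instructions."""
    # Tokens are 2-grams; a one-instruction block becomes a single token
    # (an assumption of this sketch).
    tokens = [tuple(instructions[i:i + 2]) for i in range(len(instructions) - 1)]
    if not tokens and instructions:
        tokens = [tuple(instructions)]

    vector = [0] * 64
    for token, frequency in Counter(tokens).items():
        # Base weight is the token frequency, doubled for tokens with a call.
        has_call = any(ins.startswith("call") for ins in token)
        weight = frequency * 2 if has_call else frequency
        digest = hashlib.md5("\n".join(token).encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits (assumed)
        for bit in range(64):
            if (token_hash >> bit) & 1:
                vector[bit] += weight
            else:
                vector[bit] -= weight

    # Dimension reduction: positive elements become 1 bits.
    value = 0
    for bit in range(64):
        if vector[bit] > 0:
            value |= 1 << bit
    return value

def hamming(a, b):
    """Number of differing bits between two simhash values."""
    return bin(a ^ b).count("1")
```

Two blocks that share most of their 2-grams tend to produce simhash values with a small Hamming distance, which is what the later lookup relies on.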
(7) Build the three-level inverted index
After the previous six steps, the functions contained in the massive sample set and the code blocks contained in those functions have been extracted, and the simhash value of each code block has been computed. A three-level inverted index is built over these elements, as shown in Fig. 2:
1) simhash values index the code blocks that have that simhash value. Simhash collisions are relatively likely, so several code blocks may share the same simhash value;
2) code blocks index the functions that contain them. Normalization increases the likelihood that code blocks coincide, so dissimilar functions may also contain identical normalized code blocks;
3) functions index the samples that contain them.
The result is a code library for function tracing, built over the massive sample set using simhash and inverted indexes.
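One way to picture the three-level inverted index is as three dictionaries chained together. The sketch below is illustrative only: the class name `CodeLibrary`, its method names, and the use of plain strings as block, function, and sample identifiers are all assumptions.

```python
from collections import defaultdict

class CodeLibrary:
    """Three chained inverted indexes: simhash -> blocks -> functions -> samples."""

    def __init__(self):
        self.simhash_to_blocks = defaultdict(set)
        self.block_to_functions = defaultdict(set)
        self.function_to_samples = defaultdict(set)

    def add(self, sample, function, block, simhash):
        """Register one code block of one function found in one sample."""
        self.simhash_to_blocks[simhash].add(block)
        self.block_to_functions[block].add(function)
        self.function_to_samples[function].add(sample)

    def samples_for_simhash(self, simhash):
        """Walk all three levels from a simhash value down to sample names."""
        samples = set()
        for block in self.simhash_to_blocks.get(simhash, ()):
            for function in self.block_to_functions.get(block, ()):
                samples |= self.function_to_samples.get(function, set())
        return samples

library = CodeLibrary()
library.add("a.exe", "f1", "blk1", 0b1010)
library.add("b.dll", "f1", "blk1", 0b1010)
library.add("b.dll", "f2", "blk2", 0b1010)
```

Here two different blocks collide on the same simhash value, and one function appears in two samples, so each level of the index holds a set rather than a single entry.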
Fig. 3 shows the flow of fast function tracing on the constructed code library. The specific steps are as follows:
(1) split the code of the function P to be traced into code blocks according to its jump instructions, n blocks in total, and compute the simhash value of each code block:
$P \to \{sh_1, sh_2, \ldots, sh_n\}$ [Formula 1]
(2) using the simhash multi-table indexing method, quickly retrieve for each $sh_i$, $i \in [1, n]$, the similar simhash values within Hamming distance 3, forming the similar simhash sets:
$shSet_i = \{\, sh : d(sh_i, sh) \le 3 \,\},\ i \in [1, n]$ [Formula 2]
where $d(sh_i, sh)$ denotes the Hamming distance between $sh_i$ and $sh$. If all similar simhash sets are empty, no function in the library has a code block similar to a block of this function; otherwise, continue with the next step;
(3) according to the inverted index from simhash values to code blocks, retrieve the code blocks L whose simhash value belongs to $shSet_i$, forming the similar code block sets:
$LSet_i = \{\, L : simhash(L) \in shSet_i \,\},\ i \in [1, n]$ [Formula 3]
where $simhash(L)$ denotes the simhash value of code block L;
(4) according to the inverted index from code blocks to the functions containing them, retrieve the functions P′ that contain the similar code blocks, forming a function set;
(5) for each retrieved function P′, compute its code overlap with the function P to be traced, that is, the share of P taken up by the code that P′ and P have in common as similar code, denoted iSim(P, P′). In its definition, len(P) is the total number of instructions in function P and len(L_i) is the number of instructions in code block L_i. Experiments showed that a function is similar to the function to be traced only when the code overlap is not less than 0.5, so the final set of potential similar functions is:
$PSet = \{\, P' : iSim(P, P') \ge 0.5 \,\}$ [Formula 6]
If the potential similar function set PSet is empty, no function has enough code similar to the function to be traced, so similarity is hard to assert, and the function to be traced is regarded as original with respect to the existing code set.
(6) if a potential similar function and the function to be traced have only one similar code block in common, they are judged similar; if they have multiple similar code blocks in common, they are judged similar only when the jump relations between those blocks are identical.
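The Hamming-distance lookup in step (2) of the flow above is described only as a "multi-table" index. One common way to realize such an index, shown here as an assumption rather than as the patent's exact scheme, is to split each 64-bit value into four 16-bit bands: two values that differ in at most 3 bits must agree exactly on at least one band, so exact band lookups yield a candidate set that is then filtered by true Hamming distance. The names `SimhashIndex`, `bands`, and `hamming` are hypothetical.

```python
from collections import defaultdict

BANDS = 4
BAND_BITS = 16
BAND_MASK = (1 << BAND_BITS) - 1

def bands(value):
    """Split a 64-bit value into (band index, 16-bit band value) keys."""
    return [(i, (value >> (i * BAND_BITS)) & BAND_MASK) for i in range(BANDS)]

def hamming(a, b):
    """Number of differing bits between two values."""
    return bin(a ^ b).count("1")

class SimhashIndex:
    """One lookup table per band; a query probes all four tables."""

    def __init__(self):
        self.tables = defaultdict(set)

    def add(self, value):
        for key in bands(value):
            self.tables[key].add(value)

    def query(self, value, max_distance=3):
        # With 4 bands, the candidate set is guaranteed complete only for
        # max_distance <= 3 (at most 3 bands can contain a flipped bit).
        candidates = set()
        for key in bands(value):
            candidates |= self.tables.get(key, set())
        return {c for c in candidates if hamming(c, value) <= max_distance}

base = 0x0123456789ABCDEF
near = base ^ 0b111                         # 3 bits flipped in one band
spread = base ^ (1 | 1 << 16 | 1 << 32)     # 3 bits flipped in three bands
far = base ^ 0b1111                         # 4 bits flipped
index = SimhashIndex()
for value in (near, spread, far):
    index.add(value)
found = index.query(base)
```

In this example the query returns `near` and `spread` but not `far`, whose distance to `base` is 4.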
The fast reused-code tracing system based on simhash and inverted indexes disclosed by the invention can be used to trace similar functions quickly and to determine whether a function is reused and which samples reuse it. This helps improve the efficiency of malicious code analysis, the detection of reuse relationships, and malicious code homology judgment. The system consists mainly of three modules: a preprocessing module, a code library construction module, and a function tracing module.
The system structure is shown in Fig. 4. The system is implemented in the following steps:
(1) Preprocessing
1) for each sample, unpack and disassemble it to obtain its assembly code;
2) for each assembly code, extract all functions according to the special markers;
3) for each function, divide it into multiple code blocks according to its jump instructions and jump relations;
4) for each code block, normalize the code and compute the simhash value of the code block.
(2) Code library construction
Based on the massive sample set and on the functions, code blocks, and simhash values obtained by preprocessing, build a code library containing the three-level inverted index: simhash values index the code blocks that have that simhash value, code blocks index the functions containing them, and functions index the samples containing them.
(3) Function tracing
The preprocessing module divides the function to be traced into multiple code blocks and computes their simhash values. The system first retrieves similar code blocks in the code library based on these simhash values, then obtains potential similar functions through the inverted index from code blocks to functions, and finally decides which functions are similar based on the code overlap and the jump relations between similar code blocks.
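The final decision described above combines the code overlap iSim(P, P′) with the jump-relation check. The overlap formula itself is not reproduced in this text; one reading consistent with the definitions of len(P) and len(L_i) is the summed instruction count of the blocks of P that have a similar block in P′, divided by len(P). The sketch below uses that reading as an assumption, and also assumes a one-to-one mapping between similar blocks; all names are hypothetical.

```python
def overlap_score(block_lengths, matched_blocks):
    """Assumed reading of iSim(P, P'): instructions of P lying in blocks
    that have a similar block in P', divided by all instructions of P.

    block_lengths: {block id in P: instruction count}
    matched_blocks: ids of the blocks of P with a similar block in P'
    """
    total = sum(block_lengths.values())
    matched = sum(block_lengths[block] for block in matched_blocks)
    return matched / total if total else 0.0

def same_jump_relations(query_edges, candidate_edges, mapping):
    """Compare jump edges between matched blocks of P and of P'.

    query_edges / candidate_edges: sets of (from block, to block) pairs
    mapping: {block id in P: id of its similar block in P'}
    """
    if len(mapping) <= 1:
        return True  # a single shared block is accepted without an edge check
    matched_candidates = set(mapping.values())
    translated = {(mapping[a], mapping[b]) for a, b in query_edges
                  if a in mapping and b in mapping}
    restricted = {(a, b) for a, b in candidate_edges
                  if a in matched_candidates and b in matched_candidates}
    return translated == restricted

def is_similar(block_lengths, query_edges, candidate_edges, mapping, threshold=0.5):
    """Apply the overlap threshold first, then the jump-relation check."""
    if overlap_score(block_lengths, mapping.keys()) < threshold:
        return False
    return same_jump_relations(query_edges, candidate_edges, mapping)
```

For the earlier example with blocks 1 and 2, `same_jump_relations({(1, 2)}, {(2, 1)}, {1: 1, 2: 2})` is false because the jump direction is reversed, so the two functions would not be judged similar.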

Claims (10)

1. A method for constructing a reused-code library, comprising the steps of:
1) obtaining the assembly code of each executable program sample;
2) extracting the functions in each assembly code, splitting each function into several code blocks according to the jump instructions and jump addresses in that function, and saving the code blocks in the code library;
3) computing the simhash value of each code block;
4) building a three-level inverted index between simhash values and code blocks, code blocks and functions, and functions and samples.
2. The method of claim 1, wherein the three-level inverted index is: simhash values index the corresponding code blocks, code blocks index the functions containing them, and functions index the samples containing them.
3. The method of claim 1 or 2, wherein the functions in the assembly code are extracted according to call instructions and call addresses.
4. A method for fast tracing of reused code, comprising the steps of:
1) splitting the function to be traced into several code blocks according to its jump instructions and jump addresses, and computing the simhash value of each code block;
2) for each code block, searching the code library for code blocks whose simhash value is within a set Hamming distance of the simhash value of that code block, and taking them as the similar code blocks of that code block;
3) searching the code library for the potential similar functions corresponding to each similar code block, assigning each potential similar function a weight according to the number of similar code blocks between it and the function to be traced, and selecting the potential similar functions whose weight exceeds a set threshold as similar functions;
4) determining from the jump relations between the similar code blocks whether each similar function is similar to the function to be traced; if so, the similar function is reused code.
5. The method of claim 4, wherein the code library is searched for the sample containing the similar function determined in step 4), and the similar function and the sample containing it are returned.
6. The method of claim 4, wherein the code library comprises multiple code blocks and their simhash values, together with a three-level inverted index in which simhash values index the corresponding code blocks, code blocks index the functions containing them, and functions index the samples containing them.
7. The method of claim 4, 5, or 6, wherein the simhash value is computed as follows:
71) create an N-element vector and initialize it to 0;
72) tokenize the normalized code block, the tokens being the 2-grams of the normalized instruction sequence;
73) assign each token a weight, using the token's frequency as the base weight and doubling the weight of every token that contains a call instruction;
74) hash each token with the MD5 algorithm and take N bits of the MD5 value as the token's hash;
75) for each bit of a token's hash, if the bit is 1, add the token's weight to the corresponding vector element; otherwise subtract the token's weight;
76) for each vector element, set the corresponding bit to 1 if the element is greater than 0 and to 0 otherwise, forming an N-bit simhash value.
8. The method of claim 4, 5, or 6, wherein in step 3), the functions corresponding to each similar code block are first searched for in the code library, and whether each such function is a potential similar function is then determined according to its code overlap with the function to be traced.
9. The method of claim 4, 5, or 6, wherein in step 4), whether a similar function is similar to the function to be traced is determined from the jump relations between similar code blocks as follows: if the similar function and the function to be traced have only one similar code block in common, the similar function is judged similar to the function to be traced; if they have multiple similar code blocks in common, the similar function is judged similar to the function to be traced when the jump relations between those similar code blocks in the similar function are identical to the jump relations between the corresponding code blocks in the function to be traced.
10. A system for fast tracing of reused code, comprising a preprocessing module, a code library construction module, and a function tracing module, wherein:
the preprocessing module is configured to obtain the assembly code of each sample and extract the functions in each assembly code, and to split each function into several code blocks according to the jump instructions and jump addresses in that function and compute the simhash value of each code block;
the code library construction module is configured to store the functions, the code blocks, the samples containing them, and the simhash values of the code blocks, and to build a three-level inverted index in which simhash values index the corresponding code blocks, code blocks index the functions containing them, and functions index the samples containing them;
the function tracing module is configured to retrieve from the code library, using the code blocks obtained by splitting the function to be traced and their simhash values, the code blocks similar to those of the function to be traced; to search the code library for the potential similar functions corresponding to each similar code block; to assign each potential similar function a weight according to the number of similar code blocks between it and the function to be traced; to select the potential similar functions whose weight exceeds a set threshold as similar functions; and to determine from the jump relations between the similar code blocks whether each similar function is similar to the function to be traced, the similar function being reused code if so.
CN201610474461.1A 2016-06-24 2016-06-24 A kind of multiplexing code base construction method, the quick source tracing method of multiplexing code and system Expired - Fee Related CN106126235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610474461.1A CN106126235B (en) 2016-06-24 2016-06-24 A kind of multiplexing code base construction method, the quick source tracing method of multiplexing code and system

Publications (2)

Publication Number Publication Date
CN106126235A true CN106126235A (en) 2016-11-16
CN106126235B CN106126235B (en) 2019-05-07

Family

ID=57266110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610474461.1A Expired - Fee Related CN106126235B (en) 2016-06-24 2016-06-24 A kind of multiplexing code base construction method, the quick source tracing method of multiplexing code and system

Country Status (1)

Country Link
CN (1) CN106126235B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101425008A (en) * 2007-11-01 2009-05-06 北京航空航天大学 Method for measuring similarity of source code based on edition distance
CN102063446A (en) * 2009-11-13 2011-05-18 中国移动通信集团四川有限公司 Method for creating inverted index and inverted indexing device
CN103646080A (en) * 2013-12-12 2014-03-19 北京京东尚科信息技术有限公司 Microblog duplication-eliminating method and system based on reverse-order index

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
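Wu Chong (吴冲): "Duplicate Code Detection Based on Abstract Syntax Trees", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Series *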

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018099151A1 (en) * 2016-11-30 2018-06-07 上海寒武纪信息科技有限公司 Instruction generation process multiplexing method and device
WO2018099490A1 (en) * 2016-11-30 2018-06-07 上海寒武纪信息科技有限公司 Method and device for reusing instruction generation process, and processing device
CN107590385A (en) * 2017-09-15 2018-01-16 湖南大学 Hardware-assisted code reuse attack resisting defense system and method
CN107590385B (en) * 2017-09-15 2020-03-17 湖南大学 Hardware-assisted code reuse attack resisting defense system and method
CN107885503B (en) * 2017-11-11 2021-01-08 湖南大学 Iterative compilation optimization method based on program characteristic analysis
CN107885503A (en) * 2017-11-11 2018-04-06 湖南大学 Iterative compilation optimization method based on program characteristic analysis
CN108763486A (en) * 2018-05-30 2018-11-06 湖南写邦科技有限公司 Paper duplicate checking method, terminal and storage medium based on terminal
CN109445844A (en) * 2018-11-05 2019-03-08 浙江网新恒天软件有限公司 Code Clones detection method based on cryptographic Hash, electronic equipment, storage medium
CN109815996A (en) * 2019-01-07 2019-05-28 北京首钢自动化信息技术有限公司 Scene self-adaptation method and device based on recurrent neural network
CN109815996B (en) * 2019-01-07 2021-05-04 北京首钢自动化信息技术有限公司 Scene self-adaptation method and device based on recurrent neural network
CN110647666A (en) * 2019-09-03 2020-01-03 平安科技(深圳)有限公司 Intelligent matching method and device for template and formula and computer readable storage medium
CN110569629A (en) * 2019-09-10 2019-12-13 北京计算机技术及应用研究所 Binary code file tracing method
CN111290784B (en) * 2020-01-21 2021-08-24 北京航空航天大学 Program source code similarity detection method suitable for large-scale samples
CN111290784A (en) * 2020-01-21 2020-06-16 北京航空航天大学 Program source code similarity detection method suitable for large-scale samples
CN111241497A (en) * 2020-02-13 2020-06-05 北京高质系统科技有限公司 Open source code tracing detection method based on software multiplexing feature learning
CN113360134A (en) * 2020-03-06 2021-09-07 武汉斗鱼网络科技有限公司 Method, device, equipment and storage medium for generating security verification program
CN113360134B (en) * 2020-03-06 2022-06-17 武汉斗鱼网络科技有限公司 Method, device, equipment and storage medium for generating security verification program
US20220129417A1 (en) * 2020-10-22 2022-04-28 Google Llc Code Similarity Search
CN112257068A (en) * 2020-11-17 2021-01-22 南方电网科学研究院有限责任公司 Program similarity detection method and device, electronic equipment and storage medium
CN112799939A (en) * 2021-01-22 2021-05-14 网易(杭州)网络有限公司 Incremental code coverage rate testing method and device, storage medium and electronic equipment
WO2022166410A1 (en) * 2021-02-02 2022-08-11 华为技术有限公司 Method and apparatus for jumping between functions having symbols with different names, and computer readable storage medium
CN113703773B (en) * 2021-08-26 2022-07-19 北京计算机技术及应用研究所 NLP-based binary code similarity comparison method
CN113703773A (en) * 2021-08-26 2021-11-26 北京计算机技术及应用研究所 NLP-based binary code similarity comparison method
CN113590192A (en) * 2021-09-26 2021-11-02 北京迪力科技有限责任公司 Quality analysis method and related equipment
CN113722238B (en) * 2021-11-01 2022-04-26 北京大学 Method and system for realizing rapid open source component detection of source code file
CN113722238A (en) * 2021-11-01 2021-11-30 北京大学 Method and system for realizing rapid open source component detection of source code file
CN114995880A (en) * 2022-05-23 2022-09-02 北京计算机技术及应用研究所 Binary code similarity comparison method based on SimHash
CN114995880B (en) * 2022-05-23 2024-04-05 北京计算机技术及应用研究所 Binary code similarity comparison method based on SimHash

Also Published As

Publication number Publication date
CN106126235B (en) 2019-05-07

Similar Documents

Publication Publication Date Title
CN106126235A (en) A kind of multiplexing code library construction method, the quick source tracing method of multiplexing code and system
CN105426539B (en) Dictionary-based Lucene Chinese word segmentation method
CN110727766B (en) Sensitive word detection method
Wang et al. Efficient approximate entity extraction with edit distance constraints
CN104700033A (en) Virus detection method and virus detection device
RU2016113791A (en) METHOD AND DEVICE FOR CONSTRUCTION OF PATTERN AND METHOD AND DEVICE FOR IDENTIFICATION OF INFORMATION
CN106033416A (en) A string processing method and device
US9128923B2 (en) Orthographical variant detection apparatus and orthographical variant detection method
CN111159697B (en) Key detection method and device and electronic equipment
CN105808709A (en) Quick retrieval method and device of face recognition
CN105045715B (en) Vulnerability clustering method based on programming patterns and pattern matching
CN117077153B (en) Static application security detection false alarm discrimination method based on large-scale language model
CN108985065A (en) Method and system for firmware vulnerability detection using an improved Mahalanobis distance calculation
CN109976806B (en) Java statement block clone detection method based on byte code sequence matching
CN113297580B (en) Code semantic analysis-based electric power information system safety protection method and device
CN102298681B (en) Software identification method based on data stream sliced sheet
CN105760762A (en) Unknown malicious code detection method for embedded processor
CN106874762A (en) Android malicious code detecting method based on API dependence graphs
CN109543846B (en) DBSCAN mine water inrush spectrum identification method based on improved MVO
CN115438340A (en) Mining behavior identification method and system based on morpheme characteristics
US11386340B2 (en) Method and apparatus for performing block retrieval on block to be processed of urine sediment image
CN110362813A (en) BM25-based search relevance measurement method, storage medium, device and system
CN107656863A (en) Data flow testing method based on key point guidance and test system thereof
CN113139379B (en) Information identification method and system
CN104424332A (en) Unambiguous Japanese name list building method and name identification method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190507