CN111914260A

CN111914260A - Binary program vulnerability detection method based on function difference

Info

Publication number: CN111914260A
Application number: CN202010574987.3A
Authority: CN
Inventors: 晋武侠; 徐一飞; 刘烃
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-06-22
Filing date: 2020-06-22
Publication date: 2020-11-10
Anticipated expiration: 2040-06-22
Also published as: CN111914260B

Abstract

The invention discloses a binary program vulnerability detection method based on function difference. The method extracts patch features for a known vulnerable function, matches those features in a suspected target function, identifies whether the corresponding patch has been applied, and thereby judges whether the known vulnerability is present. First, the vulnerability-related function is determined, and the binary code of the vulnerable function and of the repaired function is collected and disassembled. Second, differential analysis is used to determine the changes between the two versions of the same function and to generate patch features. Finally, a suspected target function is screened out of the target program, the local key area in the target function is located and represented, feature matching is performed through similarity calculation to detect whether the target function contains the vulnerability, and vulnerability detection of the target program is completed according to the per-function detection results. Given a known vulnerability to search for, the method aims to quickly and accurately detect whether the target program contains it, and addresses the high false alarm rate of existing vulnerability detection methods based on function matching.

Description

Binary program vulnerability detection method based on function difference

Technical Field

The invention belongs to the technical field of binary program analysis and vulnerability detection, and particularly relates to a binary program vulnerability detection method based on function difference.

Background

Known vulnerabilities are those for which patches have been issued. As component-based development matures, third-party class libraries greatly improve development efficiency. However, developers may not know every component of their software in detail; they tend to focus on implementing functional logic, and may use old versions of class libraries or fail to update components in time. If vulnerabilities that have already been discovered and reported exist in those libraries or components, they continue to affect the developed programs and form a potential safety hazard. Meanwhile, as the software industry has developed, commercial and closed-source software has become more common, binary code has become one of the main forms in which software exists, and using or depending on such programs is not rare. Therefore, vulnerability detection must be able to handle cases where the source code and version information of a program cannot be obtained.

Most existing binary vulnerability detection methods are binary program similarity detection techniques at function granularity. They take the function containing the known vulnerability as the search target, search for similar functions in the file under test, and judge any program containing a similar function to be vulnerable. This has two problems. First, when a vulnerable function is highly similar to a non-vulnerable function, false alarms arise easily, and the non-vulnerable function is judged to be vulnerable. Second, as versions change, the same function may undergo multiple changes across versions (such as functional updates), which introduce changes unrelated to the vulnerability (these may be regarded as noise). Such changes affect the overall similarity of the function, interfere with the vulnerability judgment, and lead to erroneous results.

Disclosure of Invention

In order to solve the problem of high false alarm rate of the conventional method, the invention provides a binary program vulnerability detection method based on function difference, which is used for accurately judging whether a known vulnerability exists in a binary program or whether a corresponding patch exists in the binary program by constructing patch characteristics.

In order to achieve the purpose, the invention adopts the technical scheme that:

A binary program vulnerability detection method based on function difference comprises the following steps:

S1: constructing the feature extraction objects, namely the vulnerability function VF and the repaired function PF;

S2: generating patch features for patch identification by using a binary function differential analysis technique;

S3: taking the vulnerability function as the object, and screening out of the target program a function similar to the vulnerability function, based on a binary function similarity detection technique, to serve as the suspected target function TF;

S4: performing patch identification on the target function TF according to the similarity between the effective path sets generated for the target function TF and the patch features; if the patch can be identified in the target function, determining that the vulnerability does not exist in the function, and otherwise determining that the function still contains the vulnerability; this step comprises determining the local key area related to the patch in the target function TF by using a binary function differential analysis technique, generating and reducing effective paths, and performing patch identification in the target function TF;

S5: judging whether vulnerability-related functions remain to be analyzed; if the vulnerability affects only one function, continuing to the next step; if the vulnerability affects a plurality of functions and related functions remain to be analyzed, returning to step S1 to continue iterating;

S6: judging whether the target binary program contains the vulnerability according to the judgment results of all functions in the target binary program that are related to the vulnerability to be searched; if the vulnerability affects only one function, the judgment for the vulnerability is the same as the judgment for that function; if the vulnerability affects multiple functions, the program is considered to still contain the vulnerability when more than one function is determined to contain it, or when the number of functions determined to contain it is greater than or equal to the number of functions determined to be repaired; otherwise, the vulnerability is considered to be repaired.
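The program-level rule of step S6 can be illustrated with a minimal Python sketch. The function name and the boolean encoding of the per-function verdicts are assumptions made for illustration; only the decision rule itself is taken from the step description.

```python
def program_contains_vulnerability(function_verdicts):
    """Aggregate per-function verdicts as described in step S6.

    function_verdicts: list of booleans, one per vulnerability-related
    function found in the target program; True means the function was
    judged to still contain the vulnerability (hypothetical encoding).
    """
    if not function_verdicts:
        raise ValueError("at least one vulnerability-related function is required")
    # One affected function: the program verdict equals the function verdict.
    if len(function_verdicts) == 1:
        return function_verdicts[0]
    vulnerable = sum(1 for v in function_verdicts if v)
    repaired = len(function_verdicts) - vulnerable
    # Several affected functions: vulnerable if more than one function is
    # vulnerable, or if vulnerable functions are at least as many as repaired ones.
    return vulnerable > 1 or vulnerable >= repaired
```

With two affected functions that are both judged vulnerable, as in the CVE-2014-0160 embodiment, the sketch returns True.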

A further improvement of the invention is as follows: in step S1, the functions related to the vulnerability are determined from information about the known vulnerability to be searched; the binary code of the last vulnerable version VF and of the first repaired version PF of each related function is collected and disassembled with a disassembly tool; and a control flow graph expressed in assembly code is constructed for each of VF and PF as the feature extraction object.

In step S2, differential analysis is performed on the input vulnerability function and its repaired function by using a binary differential analysis technique. Based on the result, the boundary basic blocks (BBB) of all changed basic blocks (CBB) in the two functions are located, and the patch features are constructed from them. A changed basic block is a basic block that was added, deleted or modified between the two functions; a boundary basic block is a neighbor node of a changed basic block, but may be a changed basic block itself.

In step S2, connecting the changed basic blocks and the boundary basic blocks forms a plurality of local control flow graphs in the function, and traversing these local control flow graphs yields the set of all effective paths in the function. An effective path VT is a continuous basic block sequence that starts and ends with a boundary basic block, includes at least one changed basic block, and contains no loop; if a loop exists, it is flattened. The effective path set generated from the vulnerability function is T₁, and the effective path set of the repaired function is T₂. The boundary basic blocks and the effective path sets are taken as the patch features.
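The traversal that produces effective paths can be sketched as follows. This is a minimal sketch, not the patented implementation: the successor-map representation, the function name, and the choice to handle loops by simply never revisiting a block are assumptions.

```python
def effective_paths(successors, changed, boundary):
    """Enumerate loop-free paths that start and end at a boundary basic
    block and pass only through changed basic blocks in between.

    successors: dict mapping a block name to its successor block names
    changed:    set of changed basic blocks (CBB)
    boundary:   set of boundary basic blocks (BBB)
    """
    paths = []

    def walk(path):
        for nxt in successors.get(path[-1], ()):
            if nxt in path:
                continue  # assumption: a loop is handled by not revisiting a block
            if nxt in boundary:
                # A path is kept only if at least one changed block lies
                # between the two boundary blocks.
                if len(path) > 1:
                    paths.append(path + [nxt])
            elif nxt in changed:
                walk(path + [nxt])

    for start in sorted(boundary):
        walk([start])
    return paths


# Local control flow graph formed by blocks E, F and G, with F changed.
local_cfg = {"E": ["F"], "F": ["G"]}
print(effective_paths(local_cfg, changed={"F"}, boundary={"E", "G"}))
```

For the three-block local graph E, F, G described in the embodiment, the only path found is E, F, G.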

In step S2, for each effective path, first removing the jump instruction between the basic blocks, connecting the basic blocks in the path into an instruction sequence, and then normalizing the instructions in the effective path, including:

1) address standardization: replacing the specific address with "address";

2) memory standardization: memory addressing is replaced with "mem";

3) register normalization: the specific register is replaced with "reg".
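The three normalization rules and the removal of jump instructions can be sketched in Python. This is a minimal sketch under stated assumptions: instructions are lowercase x86 strings in Intel syntax, every hexadecimal literal is treated as an address, the register list is abbreviated, and every instruction whose mnemonic starts with `j` is dropped. None of these details come from the source, which does not fix an instruction format.

```python
import re

# Assumption: Intel-syntax memory operands, e.g. "dword ptr [rbp-0x8]" or "[eax+4]".
_MEM = re.compile(r"(?:(?:byte|word|dword|qword)\s+ptr\s+)?\[[^\]]*\]")
# Assumption: any remaining hexadecimal literal is a concrete address.
_ADDR = re.compile(r"\b0x[0-9a-fA-F]+\b")
# Assumption: a small subset of x86/x86-64 general-purpose register names.
_REG = re.compile(
    r"\b(?:[re]?(?:ax|bx|cx|dx|si|di|sp|bp)|[abcd][lh]|r(?:8|9|1[0-5])[dwb]?)\b"
)
# Assumption: x86 mnemonics beginning with "j" are jumps (jmp, je, jne, ...).
_JUMP = re.compile(r"j[a-z]+\b")


def normalize(instruction):
    """Apply the memory, address and register rules to one instruction."""
    # Memory operands first, because they contain registers and offsets.
    instruction = _MEM.sub("mem", instruction)
    instruction = _ADDR.sub("address", instruction)
    return _REG.sub("reg", instruction)


def path_to_sequence(blocks):
    """Concatenate the basic blocks of one path, dropping jump instructions."""
    return [
        normalize(instruction)
        for block in blocks
        for instruction in block
        if not _JUMP.match(instruction)
    ]
```

For example, `normalize("mov eax, dword ptr [rbp-0x8]")` yields `mov reg, mem`, and `normalize("call 0x401000")` yields `call address`.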

In step S4, it is determined whether a target function has been patched, and if the target function has been patched, it is determined that a corresponding bug in the target function has been fixed, otherwise, it is determined that the bug still exists in the target function, and step S4 specifically includes:

S401: generating the effective paths of the target function. First, the target function TF is differentially analyzed against the vulnerability function VF and against the repaired function PF by using a binary differential analysis method. For the differential analysis result of TF and VF, a basic block matching algorithm is used to match, in the target function TF, the boundary basic blocks BBB of the PF in the patch features. One or more local control flow graphs can be constructed by connecting the CBBs in TF with the matched boundary basic blocks in TF that are adjacent to them; these local control flow graphs are considered to embody the patch or vulnerability behavior and are called local key areas. Traversing them generates the effective path set T₃. Similarly, the effective path set T₄ is generated by connecting the CBBs in VF with the corresponding BBBs in VF. For the differential analysis result of TF and PF, a basic block matching algorithm is used to match, in the target function TF, the boundary basic blocks BBB of the VF in the patch features; one or more local control flow graphs can be constructed by connecting the CBBs in TF with the matched boundary basic blocks in TF that are adjacent to them, and traversing these local control flow graphs generates the effective path set T₅. Similarly, the effective path set T₆ is generated by connecting the CBBs in PF with the corresponding BBBs in PF. After the effective path sets are generated, effective paths connected end to end are merged. Then, for each effective path, the jump instructions between basic blocks are removed, the basic blocks in the path are connected into an instruction sequence, and the instructions in the effective path are normalized, including:

1) address standardization: replacing the specific address with "address";

2) memory standardization: memory addressing is replaced with "mem";

3) register normalization: the specific register is replaced with "reg".

S402: reduction of the target function paths. For each effective path t in T₄, if T₁ contains no path that shares a common CBB with t, then t is removed from T₄, forming the reduced effective path set T₄₁. Similarly, T₆ is compared with T₂ and the irrelevant paths are removed from T₆, forming the reduced effective path set T₆₂.
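The reduction of step S402 can be sketched as a filter over paths. This is a minimal sketch: paths are assumed to be lists of basic block names, `changed` is assumed to be the set of CBBs of the function that both path sets belong to, and the function name is hypothetical.

```python
def reduce_paths(candidates, reference, changed):
    """Keep only candidate paths that share a changed basic block (CBB)
    with at least one path of the reference set.

    candidates: path set to reduce (T4 or T6 in the description)
    reference:  patch-feature path set to compare against (T1 or T2)
    changed:    set of changed basic blocks (assumed representation)
    """
    # Collect every CBB that occurs on some reference path.
    reference_cbbs = set()
    for path in reference:
        reference_cbbs.update(block for block in path if block in changed)
    # A candidate survives if it contains at least one of those CBBs.
    return [t for t in candidates if any(block in reference_cbbs for block in t)]
```

Calling `reduce_paths(T4, T1, changed)` would give T₄₁ and `reduce_paths(T6, T2, changed)` would give T₆₂ under these assumptions.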

S403: decision algorithm. The relationship among the functions is deduced from the similarity of the differences among them, to judge whether the target function has been patched and thereby complete vulnerability detection, where Sim(T, T') denotes the similarity between path sets. There are three cases:

Case 1: T₁ and T₂ are both non-empty. If the target function has been repaired, the difference T₃ between TF and VF should be more pronounced than the difference T₅ between TF and PF, and T₃ should be more similar to T₂ in the patch features; if the target function still contains the vulnerability, the difference T₅ between TF and PF should be more pronounced than the difference T₃ between TF and VF, and T₅ should be more similar to T₁ in the patch features. Therefore, in this case, if Sim(T₃, T₂) > Sim(T₅, T₁), the target function is considered to be repaired; otherwise, it is considered to still contain the vulnerability;

Case 2: T₁ is empty and T₂ is not empty. T₁ being empty means that the patch adds some new code. If the target function has been patched, T₃ will be more similar to T₂ and T₆₂ should be empty; if the target function still contains the vulnerability, T₆₂ and T₂ should be more similar and T₃ should be empty. Therefore, in this case, if Sim(T₂, T₃) > Sim(T₂, T₆₂), the target function is considered to be repaired; otherwise, it is considered to still contain the vulnerability;

Case 3: T₂ is empty and T₁ is not empty. T₂ being empty means that the patch deletes some code. Similar to case 2, if Sim(T₁, T₄₁) > Sim(T₁, T₅), the target function is considered to be repaired; otherwise, it is considered to still contain the vulnerability.
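The three cases can be written as one decision function. This is a minimal sketch: the function and parameter names are assumptions, the set similarity is passed in as a callable, and `toy_sim` is a hypothetical Jaccard-style stand-in used only to exercise the logic, not the Sim of the invention.

```python
def is_patched(T1, T2, T3, T5, T41, T62, sim):
    """Return True if the target function is judged to be repaired.

    T1, T2:   patch-feature path sets of VF and PF
    T3, T5:   path sets in TF from the TF/VF and TF/PF differences
    T41, T62: reduced path sets from step S402
    sim:      callable giving the similarity of two path sets
    """
    if T1 and T2:   # case 1: the patch modifies existing code
        return sim(T3, T2) > sim(T5, T1)
    if T2:          # case 2: T1 is empty, the patch only adds code
        return sim(T2, T3) > sim(T2, T62)
    if T1:          # case 3: T2 is empty, the patch only deletes code
        return sim(T1, T41) > sim(T1, T5)
    raise ValueError("T1 and T2 are both empty: no patch feature available")


def toy_sim(a, b):
    """Hypothetical stand-in similarity (Jaccard over whole paths)."""
    a, b = set(map(tuple, a)), set(map(tuple, b))
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Situation of the embodiment: T3 is empty and T5 coincides with T1.
T1 = [["E", "F", "G"]]
T2 = [["H'", "I'", "J'"]]
print(is_patched(T1, T2, T3=[], T5=T1, T41=[], T62=[], sim=toy_sim))  # False
```

Passing the similarity as a parameter keeps the case analysis independent of how Sim is actually computed.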

A further improvement of the invention is as follows: in step S403, to calculate the similarity score between each pair of path sets, the similarity between paths is calculated first; it is obtained by comparing the instruction sequences of the two paths, and the specific calculation formula is as follows:

wherein t₁ and t₂ are the two paths for which the similarity score is to be calculated, edit(t₁, t₂) is the edit distance between the two paths, and len(t₁) and len(t₂) are the lengths of paths t₁ and t₂, respectively;

after the similarity of each pair of paths is calculated, the final similarity between the two sets is calculated according to the following formula:

wherein T₁ and T₂ are the two path sets for which the similarity score is to be calculated, t₁ and t₂ are paths from the sets T₁ and T₂ respectively, Sim(t₁, t₂) is the similarity between the two paths, |T₁| and |T₂| are the numbers of paths in the sets T₁ and T₂ respectively, and len(T₁) and len(T₂) are the total path lengths in the sets T₁ and T₂ respectively.
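The formulas themselves are not reproduced in this text. As an illustration of the path-level quantities only, the following sketch computes an edit distance over instruction sequences and normalizes it by the longer path length, that is, 1 - edit(t₁, t₂) / max(len(t₁), len(t₂)). This normalization is an assumption chosen because it uses exactly the quantities named above; it is not confirmed by the source, and no set-level formula is attempted here.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (lists of normalized
    instructions, or plain strings)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x by y, or keep
            ))
        previous = current
    return previous[-1]


def path_similarity(t1, t2):
    """ASSUMED normalization: 1 - edit(t1, t2) / max(len(t1), len(t2))."""
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0  # two empty paths are treated as identical
    return 1.0 - edit_distance(t1, t2) / longest
```

Under this assumed normalization, identical paths score 1.0 and paths with no instruction in common score 0.0.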

Compared with the prior art, the invention has the beneficial effects that:

(1) a binary program vulnerability detection method based on function difference is provided; patch features are constructed to accurately judge whether the corresponding patch exists in the binary program and hence whether the known vulnerability exists, which addresses the high false alarm rate of existing vulnerability detection methods based on function matching and improves detection accuracy;

(2) the binary program can be directly analyzed without depending on program source codes;

(3) only a small number of basic blocks are used for feature generation and patch identification, so that the speed and accuracy of patch identification are improved, and the method has the capability of analyzing large programs in a real scene;

(4) the boundary basic block is utilized to position a local key area in the target function to form a local control flow graph, so that the influence caused by modification irrelevant to the vulnerability in the target function is reduced, and the anti-interference capability is improved;

(5) whether a target object has a known vulnerability can be verified at both function and binary program granularity.

Drawings

FIG. 1 is an overall flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of the difference result between VF and PF; FIG. 2(a) and FIG. 2(b) show the control flow graphs of the function dtls1_process_heartbeat() in OpenSSL 1.0.1f and OpenSSL 1.0.1g, respectively;

FIG. 3 is a schematic diagram of the difference result between TF and VF; FIG. 3(a) and FIG. 3(b) show the control flow graphs of the function dtls1_process_heartbeat() in OpenSSL 1.0.1e and OpenSSL 1.0.1f, respectively;

FIG. 4 is a schematic diagram of the difference result between TF and PF; FIG. 4(a) and FIG. 4(b) show the control flow graphs of the function dtls1_process_heartbeat() in OpenSSL 1.0.1e and OpenSSL 1.0.1g, respectively.

Detailed Description

In order to make the objects, features and advantages of the present invention more apparent and understandable, embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples.

Taking the known vulnerability CVE-2014-0160 as an example, the binary program of the class library OpenSSL 1.0.1e is taken as the target binary program in which the vulnerability is to be detected.

As shown in fig. 1, a binary program vulnerability detection method based on function difference includes the following steps:

Step S1: according to related information such as patches, the functions related to the known vulnerability CVE-2014-0160 are determined to be dtls1_process_heartbeat() and tls1_process_heartbeat(), and dtls1_process_heartbeat() is selected for analysis. The binary code of the last vulnerable version of the function, namely dtls1_process_heartbeat() in OpenSSL 1.0.1f, and of the first repaired version, namely dtls1_process_heartbeat() in OpenSSL 1.0.1g, is collected. The two functions are disassembled with a disassembly tool to obtain the control flow graphs VF and PF expressed in assembly code.

Step S2: differential analysis is performed on the input vulnerability function VF and the repaired function PF; the binary differential analysis technique is not specifically limited. Based on the differential analysis result, the boundary basic blocks BBB of all changed basic blocks CBB in the two functions can be located. FIG. 2(a) and FIG. 2(b) show the control flow graphs of the function dtls1_process_heartbeat() in OpenSSL 1.0.1f and OpenSSL 1.0.1g, respectively. FIG. 2 shows the result of the differential analysis of the two functions, where basic blocks A, C, D, F and basic blocks A′, C′, D′, E′, F′, G′, I′, L′, M′, N′ are CBBs, and basic blocks B, E, K, G and basic blocks B′, H′, P′, J′, U′ are BBBs;

Connecting the changed basic blocks and the boundary basic blocks forms a plurality of local control flow graphs in the function. For example, in FIG. 2(a), the basic blocks E, F and G form a local control flow graph, and traversing it yields the effective path 'E->F->G'. Constructing and traversing all local control flow graphs yields all effective paths in the function, which form the effective path set T₁ as follows:

A->B; B->C->D->E;

A->C->D->E; B->C->D->G;

A->C->D->G; B->C->K;

A->C->K; E->F->G;

Likewise, as shown in FIG. 2(b), the effective path set T₂ can be constructed as follows:

A′->B′; A′->C′->L′->U′;

A′->C′->D′->M′->U′; A′->C′->D′->E′->F′->N′->U′;

A′->C′->D′->E′->F′->G′->H′; A′->C′->D′->E′->F′->G′->J′;

A′->C′->D′->E′->P′; B′->C′->L′->U′;

B′->C′->D′->M′->U′; B′->C′->D′->E′->F′->N′->U′;

B′->C′->D′->E′->F′->G′->H′; B′->C′->D′->E′->F′->G′->J′;

B′->C′->D′->E′->P′; H′->I′->J′;

then, for each effective path, the jump instructions between basic blocks are removed, the basic blocks in the path are connected into an instruction sequence, and the instructions in all effective paths are normalized according to the normalization rules; for example, an instruction sequence segment of the effective path 'A->B' changes before and after normalization as follows:

after the effective path sets are generated, effective paths connected end to end are merged and optimized; for example, the effective paths 'A->B' and 'B->C->D->E' can be merged into 'A->B->C->D->E';

and finally, taking the boundary basic block and the effective path set as patch features.
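The merging of effective paths connected end to end can be sketched for a single pair of paths. This is a minimal sketch with a hypothetical function name; the text does not state which pairs are chosen for merging when several candidates exist, so only the pairwise operation is shown.

```python
def merge_end_to_end(first, second):
    """Merge two paths when the last block of `first` is the first block
    of `second`; return None when they are not connected end to end."""
    if first and second and first[-1] == second[0]:
        # The shared block appears once in the merged path.
        return first + second[1:]
    return None


print(merge_end_to_end(["A", "B"], ["B", "C", "D", "E"]))
# ['A', 'B', 'C', 'D', 'E']
```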

Step S3: taking the vulnerability function as the object, a function similar to it is searched for in the target program OpenSSL 1.0.1e based on a binary function similarity detection technique; the function dtls1_process_heartbeat() in OpenSSL 1.0.1e is obtained as the suspected target function, which reduces the search space. The binary function similarity detection technique is not specifically limited.

Step S4: in the step, vulnerability detection is carried out on the suspected target function, and the specific steps are as follows:

Step S401: first, a binary differential analysis method is used to perform differential analysis on the target function TF and the vulnerability function VF. FIG. 3 shows the analysis result of TF and VF: the function dtls1_process_heartbeat() in OpenSSL 1.0.1e is the same as that in OpenSSL 1.0.1f, so no changed basic block exists. Then, a basic block matching algorithm is used to match the boundary basic blocks BBB of the PF in the patch features in the target function TF, which matches basic blocks B, E, K, G and J in FIG. 3(a). At this point, the CBBs and BBBs cannot form a local control flow graph, so no effective path is generated and T₃ is an empty set. Similarly, no basic block has changed in VF, the CBBs and BBBs cannot form a local control flow graph, no effective path is generated, and T₄ is an empty set;

then, a binary differential analysis method is used to perform differential analysis on the target function TF and the repaired function PF. FIG. 4 shows the analysis result of TF and PF: basic blocks A, C, D and F and basic blocks A′, C′, D′, E′, F′, G′, I′, L′, M′ and N′ are changed basic blocks. Then, a basic block matching algorithm is used to match the boundary basic blocks BBB of the VF in the patch features in the target function TF, which matches basic blocks B, E, K and G in FIG. 4(a). Local control flow graphs can be formed by connecting the CBBs and BBBs, and traversing them generates the effective path set T₅ as follows:

A->B; B->C->D->E;

A->C->D->E; B->C->D->G;

A->C->D->G; B->C->K;

A->C->K; E->F->G;

similarly, the effective path set T₆ is generated by connecting the CBBs in PF with the corresponding BBBs in PF, as follows:

A′->B′; A′->C′->L′->U′;

A′->C′->D′->M′->U′; A′->C′->D′->E′->F′->N′->U′;

A′->C′->D′->E′->P′; B′->C′->L′->U′;

B′->C′->D′->E′->P′; H′->I′->J′;

after generating an effective path set, merging and optimizing the effective paths connected end to end;

finally, performing instruction connection on the effective paths in all the path sets, and performing standardization processing according to a standardization rule;

Step S402: in this example, T₄ is empty and needs no reduction; for each effective path t in T₆, T₂ contains a path that includes the same CBB, so no irrelevant path is removed, and the result forms the effective path set T₆₂;

Step S403: this example meets the condition of case 1, so the judgment is made according to case 1: the similarity between paths and the similarity between path sets are calculated according to the formulas, finally giving Sim(T₃, T₂) < Sim(T₅, T₁); therefore, the target function TF is determined to still contain the vulnerability.

Step S5: because an unanalyzed related function tls1_process_heartbeat() still exists, the method returns to step S1 and performs vulnerability detection in the target program with the function tls1_process_heartbeat() as the object; when this step is reached again after tls1_process_heartbeat() has been analyzed, no unanalyzed related function remains, and the method proceeds to step S6.

Step S6: through the above steps, functions highly similar to dtls1_process_heartbeat() and tls1_process_heartbeat() are found in the binary program of OpenSSL 1.0.1e and both are judged to contain the vulnerability; the binary program OpenSSL 1.0.1e is therefore determined to contain the vulnerability CVE-2014-0160, and the algorithm ends.

To summarize: compared with existing methods, the method of the invention constructs features based on function differential analysis for a given input, can judge whether the related functions in the target binary program have been patched, realizes detection of known vulnerabilities, reduces the false alarm rate, and improves accuracy.

Claims

1. A binary program vulnerability detection method based on function difference is characterized by comprising the following steps:

2. The binary program vulnerability detection method based on function difference according to claim 1, wherein in step S1, the functions related to the vulnerability are determined from information about the known vulnerability to be searched; the binary code of the last vulnerable version VF and of the first repaired version PF of each related function is collected and disassembled with a disassembly tool; and a control flow graph expressed in assembly code is constructed for each of VF and PF as the feature extraction object.

3. The method for binary program vulnerability detection based on function difference as claimed in claim 1, wherein in step S2, the binary difference analysis technique is used to perform difference analysis on the input vulnerability function and its repaired function, based on the obtained difference analysis result of the two functions, the boundary basic block BBB of all changed basic blocks CBB in the two functions is located, and patch features are constructed accordingly, wherein the changed basic blocks refer to basic blocks added, deleted or modified between functions; the boundary basic block refers to a neighbor node of the change basic block, but may be a change basic block itself.

4. The method according to claim 3, wherein in step S2, a plurality of local control flow graphs are formed in the function by connecting the changed basic blocks and the boundary basic blocks, and the set of all effective paths in the function is obtained by traversing these local control flow graphs, wherein an effective path VT is a continuous basic block sequence that starts and ends with a boundary basic block, includes at least one changed basic block, and contains no loop; if a loop exists, the loop is flattened; the effective path set generated from the vulnerability function is T₁, and the effective path set of the repaired function is T₂; and the boundary basic blocks and the effective path sets are taken as the patch features.

5. The method according to claim 4, wherein in step S2, for each valid path, the step of removing jump instructions between basic blocks, connecting the basic blocks in the path into an instruction sequence, and then normalizing the instructions in the valid path comprises:

1) address standardization: replacing the specific address with "address";

2) memory standardization: memory addressing is replaced with "mem";

3) register normalization: the specific register is replaced with "reg".

6. The binary program vulnerability detection method based on function difference according to claim 1, wherein in step S4, it is determined whether the target function has been patched; if the target function has been patched, the corresponding vulnerability in the target function is determined to have been repaired; otherwise, the vulnerability is determined to still exist in the target function; and step S4 specifically includes:

S401: generating the effective paths of the target function: first, the target function TF is differentially analyzed against the vulnerability function VF and against the repaired function PF by using a binary differential analysis method; for the differential analysis result of TF and VF, a basic block matching algorithm is used to match, in the target function TF, the boundary basic blocks BBB of the PF in the patch features; one or more local control flow graphs can be constructed by connecting the CBBs in TF with the matched boundary basic blocks in TF that are adjacent to them; these local control flow graphs are considered to embody the patch or vulnerability behavior and are called local key areas, and traversing them generates the effective path set T₃; similarly, the effective path set T₄ is generated by connecting the CBBs in VF with the corresponding BBBs in VF; for the differential analysis result of TF and PF, a basic block matching algorithm is used to match, in the target function TF, the boundary basic blocks BBB of the VF in the patch features; one or more local control flow graphs can be constructed by connecting the CBBs in TF with the matched boundary basic blocks in TF that are adjacent to them, and traversing these local control flow graphs generates the effective path set T₅; similarly, the effective path set T₆ is generated by connecting the CBBs in PF with the corresponding BBBs in PF; after the effective path sets are generated, effective paths connected end to end are merged; then, for each effective path, the jump instructions between basic blocks are removed, the basic blocks in the path are connected into an instruction sequence, and the instructions in the effective path are normalized, including:

1) address standardization: replacing the specific address with "address";

2) memory standardization: memory addressing is replaced with "mem";

3) register normalization: the specific register is replaced with "reg".

S402: reduction of the target function paths: for each effective path t in T₄, if T₁ contains no path that shares a common CBB with t, then t is removed from T₄, forming the reduced effective path set T₄₁; similarly, T₆ is compared with T₂ and the irrelevant paths are removed from T₆, forming the reduced effective path set T₆₂.

Case 1: T₁ and T₂ are both non-empty: if the target function has been repaired, the difference T₃ between TF and VF should be more pronounced than the difference T₅ between TF and PF, and T₃ should be more similar to T₂ in the patch features; if the target function still contains the vulnerability, the difference T₅ between TF and PF should be more pronounced than the difference T₃ between TF and VF, and T₅ should be more similar to T₁ in the patch features; therefore, in this case, if Sim(T₃, T₂) > Sim(T₅, T₁), the target function is considered to be repaired; otherwise, it is considered to still contain the vulnerability;

7. The method according to claim 6, wherein in step S403, in order to calculate the similarity score between each pair of path sets, firstly, the similarity between the paths needs to be calculated, and the similarity score is calculated by comparing the instruction sequences of the two paths, and the specific calculation formula is as follows:

wherein t₁ and t₂ are the two paths for which the similarity score is to be calculated, edit(t₁, t₂) is the edit distance between the two paths, and len(t₁) and len(t₂) are the lengths of paths t₁ and t₂, respectively;