KR102318714B1

KR102318714B1 - Computer program for detecting software vulnerability based on binary code clone

Info

Publication number: KR102318714B1
Application number: KR1020200034311A
Authority: KR
Inventors: 이희조; 장하진; 양경석
Original assignee: 고려대학교 산학협력단
Priority date: 2020-01-31
Filing date: 2020-03-20
Publication date: 2021-10-28
Also published as: KR20210098297A

Abstract

In one embodiment of the present disclosure for solving the above problem, a computer program stored in a computer-readable storage medium and executable by one or more processors is disclosed. The computer program causes the one or more processors to perform the following operations for determining a vulnerability based on the binary code of software: generating a vulnerability signature by preprocessing vulnerability information of the software; generating a target function fingerprint by preprocessing target binary code; and determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint.

Description

COMPUTER PROGRAM FOR DETECTING SOFTWARE VULNERABILITY BASED ON BINARY CODE CLONE

The present disclosure relates to a method of detecting vulnerabilities in software, and more particularly to a computer program that quickly and accurately detects code clones of known vulnerable code in a software binary in order to determine whether a vulnerability is present.

A library is software that packages logic frequently used when writing code so that it can be reused. It implements specific functionality and exposes an interface through which it can be combined with other software. A library may be created for its author's own use, or published so that an unspecified number of people can use it. With a library, the desired functionality is available by calling the API (Application Program Interface) that the library provides, without implementing the underlying logic directly.

Open source software (OSS) is software whose source code is public and which anyone may reuse, modify, and redistribute without special restriction, as long as they comply with the open source license.

Using libraries and open source software as described above reduces development time and cost, but it also raises the risk of code clones. A code clone is identical or similar code that is duplicated across several pieces of software or within a single piece of software, and it arises mainly from the reuse of external code. Such code clones can cause problems in software. For example, when a vulnerability is found in a particular piece of code, the developer must identify and fix every code clone corresponding to that code.

In other words, because libraries and open source software are intended to be widely used, distributing them with a vulnerability can turn them into a channel that propagates the vulnerability to a large number of software products. In addition, even after a vulnerability in a particular piece of software has been patched, it may take a long time for the patch to reach other software that depends on it. Furthermore, a single body of source code can be compiled into many different binary forms depending on the compilation environment (that is, various functional patches are possible), whereas vulnerability patches are often small compared with functional patches. In this setting, it can be difficult to determine whether a small code change observed in a binary was caused by a change in the compilation environment or by a vulnerability patch. It can likewise be difficult to determine whether a vulnerable portion exists in a binary compiled from cloned source code.

Accordingly, conventional techniques exist that detect code clones by computing the similarity of two binary functions. Specifically, a conventional binary code clone detection method may detect code clones by comparing all or part of the original binary code with the binary code suspected of being a clone, or by comparing key statements or tokens extracted through syntactic analysis.

However, conventional code clone detection methods only detect the presence of a code clone by computing the similarity of a pair of binary functions. They have difficulty determining whether a vulnerability exists in that code, and their scalability may be limited. In addition, because conventional methods perform string comparison on binary code, they consume relatively large computing resources and may require a great deal of processing time.

Accordingly, there may be a need in the art for a method of detecting known vulnerabilities in software binaries in a scalable manner.

Korean Patent Registration 2014-0001951

The present disclosure has been devised in response to the background art described above, and aims to provide a computer program that quickly and accurately detects code clones of known vulnerable code in a software binary in order to determine whether a vulnerability is present.

In one embodiment of the present disclosure for solving the above problem, a computer program for determining a vulnerability based on the binary code of software is disclosed. When executed on one or more processors, the computer program causes the one or more processors to perform the following operations for determining a vulnerability based on the binary code of software: generating a vulnerability signature by preprocessing vulnerability information of the software; generating a target function fingerprint by preprocessing target binary code; and determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint.

Alternatively, the vulnerability information may include vulnerability patch information in source code form, binary code related to the vulnerability, and binary code in which the vulnerability has been patched.

Alternatively, the operation of generating a vulnerability signature by preprocessing the vulnerability information of the software may include: identifying binary code related to a change based on the vulnerability information; generating a vulnerable function fingerprint (Vulnerability Function Fingerprint) and a patched function fingerprint (Patched Function Fingerprint) based on the identified binary code related to the change; and generating a vulnerability signature that includes the vulnerable function fingerprint and the patched function fingerprint.

Alternatively, the operation of generating the vulnerable function fingerprint and the patched function fingerprint based on the identified binary code related to the change may include: obtaining an executable function and attribute information of the binary code based on the identified binary code related to the change; obtaining attribute information of the function based on the executable function; dividing the function into one or more strands based on the independent code flows within the function; and generating the vulnerable function fingerprint and the patched function fingerprint by mapping the attribute information of the function and the attribute information of the binary to each of the one or more strands.
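The relationship between strands, function fingerprints, and the vulnerability signature described above can be pictured with a small data model. The following is only a sketch: the class names, the field names, and the example attribute keys (`name`, `arch`) are hypothetical and are not taken from the disclosure, which does not specify a concrete representation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StrandRecord:
    """One strand together with the attribute information mapped onto it."""
    instructions: List[str]         # instructions belonging to the strand
    function_attrs: Dict[str, str]  # hypothetical keys, e.g. {"name": "f"}
    binary_attrs: Dict[str, str]    # hypothetical keys, e.g. {"arch": "x86_64"}


@dataclass
class FunctionFingerprint:
    """A function represented as its attribute-annotated strands."""
    strands: List[StrandRecord]


@dataclass
class VulnerabilitySignature:
    """Pairs the fingerprint of the vulnerable function with the patched one."""
    vulnerable: FunctionFingerprint
    patched: FunctionFingerprint


def build_fingerprint(strands, function_attrs, binary_attrs):
    """Map the function and binary attributes onto every strand."""
    records = [
        StrandRecord(list(s), dict(function_attrs), dict(binary_attrs))
        for s in strands
    ]
    return FunctionFingerprint(records)
```

Under this sketch a target function fingerprint would reuse `FunctionFingerprint`, so that the same comparison routine can be applied to both sides of the signature.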

Alternatively, a strand is the set of instructions required to compute the value of a single variable, and may be the unit on which the similarity comparison is based.
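One common way to obtain "the instructions required to compute the value of a single variable" is a backward slice over a straight-line block. The sketch below assumes that technique; the disclosure does not state which algorithm it uses. It also assumes a simplified instruction model in which each instruction is a `(dest, sources)` pair; real binary code would first need to be lifted to such a form.

```python
from typing import List, Sequence, Tuple

# Hypothetical instruction model: (destination variable, source variables).
Instruction = Tuple[str, Tuple[str, ...]]


def extract_strands(block: Sequence[Instruction]) -> List[List[int]]:
    """Split a straight-line block into strands via backward slicing.

    Returns a list of strands, each a sorted list of instruction indices.
    """
    unused = set(range(len(block)))
    strands: List[List[int]] = []
    while unused:
        last = max(unused)            # start from the latest uncovered instruction
        unused.discard(last)
        strand = [last]
        needed = set(block[last][1])  # variables whose definitions we still need
        for i in range(last - 1, -1, -1):
            dest, srcs = block[i]
            if dest in needed:
                strand.append(i)
                # The definition is satisfied; its inputs become needed instead.
                needed = (needed - {dest}) | set(srcs)
                unused.discard(i)
        strands.append(sorted(strand))
    return strands


if __name__ == "__main__":
    block = [
        ("a", ("x",)),  # 0: a = x
        ("b", ("y",)),  # 1: b = y
        ("c", ("a",)),  # 2: c = a + 1
        ("d", ("b",)),  # 3: d = b * 2
    ]
    print(extract_strands(block))  # [[1, 3], [0, 2]]
```

In the example the two computations never share a variable, so the block splits into two independent strands, one for `d` and one for `c`.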

Alternatively, the operation of generating a target function fingerprint by preprocessing the target binary code may include: obtaining an executable target function and attribute information of the target binary code based on the target binary code; obtaining attribute information of the target function based on the target function; dividing the target function into one or more strands based on the independent code flows within the target function; and generating the target function fingerprint by mapping the attribute information of the target function and the attribute information of the target binary to each of the one or more strands.

Alternatively, the similarity comparison between the vulnerability signature and the target function fingerprint may be an N-gram similarity comparison, in which the function fingerprint corresponding to the vulnerability signature and the target function fingerprint are each divided into predetermined units and the comparison measures how many of all the divided units are similar.
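As an illustration of an N-gram comparison, the sketch below splits two token sequences into overlapping n-grams and reports the share of n-grams they have in common. Using the Jaccard ratio (shared units over all distinct units) and treating two empty inputs as dissimilar are assumptions made here for concreteness; the disclosure only says that the comparison counts how many of the divided units are similar.

```python
from typing import Sequence, Set, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_similarity(a: Sequence[str], b: Sequence[str], n: int = 3) -> float:
    """Jaccard similarity of the n-gram sets of two token sequences."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0  # assumption: nothing to compare counts as no similarity
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    vulnerable = ["mov", "add", "cmp", "jmp"]
    target = ["mov", "add", "cmp", "ret"]
    print(ngram_similarity(vulnerable, target, n=2))  # 0.5
```

The tokens here are bare mnemonics for brevity; in practice the units would be derived from the strands of each fingerprint.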

Alternatively, the operation of determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint may include: calculating a first similarity between the vulnerable function fingerprint included in the vulnerability signature and the functions included in each target function fingerprint; calculating a second similarity between the patched function fingerprint included in the vulnerability signature and the functions included in each target function fingerprint; and determining whether a vulnerability exists in the target software based on the difference between the first similarity and the second similarity.

Alternatively, the first similarity may be positively correlated with the probability that a vulnerability is determined to exist in the software, and the second similarity may be negatively correlated with that probability.

Alternatively, the operation of determining whether a vulnerability exists in the target software based on the difference between the first similarity and the second similarity may include determining that a vulnerability exists in the target software when the difference between the first similarity and the second similarity exceeds a predetermined criterion.
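The decision rule can be written in a few lines. In this sketch the difference is taken as the first similarity minus the second, which is consistent with the stated positive and negative correlations; the default threshold of 0.1 is an arbitrary placeholder, since the disclosure only refers to a "predetermined criterion".

```python
def is_vulnerable(sim_vulnerable: float, sim_patched: float,
                  threshold: float = 0.1) -> bool:
    """Flag the target when it resembles the vulnerable fingerprint
    more than the patched one by more than `threshold`.

    sim_vulnerable: first similarity (target vs. vulnerable function fingerprint)
    sim_patched:    second similarity (target vs. patched function fingerprint)
    threshold:      placeholder value, not taken from the disclosure
    """
    return (sim_vulnerable - sim_patched) > threshold


if __name__ == "__main__":
    print(is_vulnerable(0.9, 0.6))  # True: much closer to the vulnerable version
    print(is_vulnerable(0.6, 0.9))  # False: closer to the patched version
```

Comparing against both versions is what lets the rule separate an unpatched clone from a patched one, even though the two versions may differ by only a few instructions.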

Alternatively, the operation of determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint may include selecting candidate vulnerability signatures and candidate target function fingerprints by performing primary filtering on the vulnerability signature and the target function fingerprint, where the primary filtering may be filtering for identifying code related to a vulnerability in each of the vulnerability signature and the target function fingerprint.

Alternatively, the operation of determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint may further include identifying similar strands by performing secondary filtering on the candidate vulnerability signatures and the candidate target function fingerprints.

In another embodiment of the present disclosure, a method, performed by a processor of a computing device, for determining a vulnerability based on the target binary code of software is disclosed. The method may include: generating, by a processor included in the computing device, a vulnerability signature by preprocessing vulnerability information of software; generating, by the processor, a target function fingerprint by preprocessing target binary code; and determining, by the processor, whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint.

In yet another embodiment of the present disclosure, a computing device for determining a vulnerability based on the binary code of software is disclosed. The computing device includes a processor including one or more cores, a memory containing program code executable by the processor, and a network unit that transmits and receives data to and from a client. The processor may generate a vulnerability signature by preprocessing vulnerability information of software, generate a target function fingerprint by preprocessing target binary code, and determine whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint.

The present disclosure can provide a computer program that quickly and accurately detects code clones of known vulnerable code in a software binary in order to determine whether a vulnerability is present.

Various aspects are now described with reference to the drawings, in which like reference numbers are used to refer to like elements throughout. In the following embodiments, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more aspects. It will be evident, however, that such aspect(s) may be practiced without these specific details.
FIG. 1 is a block diagram of a computing device according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of a system for determining software vulnerabilities based on binary code clones according to an embodiment of the present disclosure.
FIG. 3 is an exemplary flowchart of a process of preprocessing a vulnerability signature according to an embodiment of the present disclosure.
FIG. 4 is an exemplary flowchart of a process of preprocessing a target function fingerprint according to an embodiment of the present disclosure.
FIG. 5 is an exemplary flowchart of a process of detecting code clones and determining vulnerabilities according to an embodiment of the present disclosure.
FIG. 6 is an exemplary diagram illustrating a method of calculating the similarity of a pair of code flows according to an embodiment of the present disclosure.
FIG. 7 is a schematic diagram illustrating the processing performed by a system for determining software vulnerabilities based on binary code clones according to an embodiment of the present disclosure.
FIG. 8 is a schematic diagram illustrating vulnerability information preprocessing and binary code preprocessing according to an embodiment of the present disclosure.
FIG. 9 is a schematic diagram illustrating a code clone detection and vulnerability determination process according to an embodiment of the present disclosure.
FIG. 10 is an exemplary flowchart of a method for determining software vulnerabilities based on binary code clones according to an embodiment of the present disclosure.
FIG. 11 is an exemplary flowchart of logic for implementing a method for determining software vulnerabilities based on binary code clones.
FIG. 12 is a simplified, general schematic diagram of an exemplary computing environment in which embodiments of the present disclosure may be implemented.

Various embodiments are now described with reference to the drawings. In this specification, various descriptions are presented to provide an understanding of the present disclosure. It is apparent, however, that these embodiments may be practiced without these specific descriptions.

The terms "component," "module," "system," and the like, as used herein, refer to a computer-related entity, hardware, firmware, software, a combination of software and hardware, or software in execution. For example, a component may be, but is not limited to, a procedure running on a processor, a processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a processor and/or a thread of execution. A component may be localized on one computer, or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored thereon. Components may communicate by way of local and/or remote processes, for example in accordance with a signal having one or more data packets (for example, data from one component interacting with another component in a local system or a distributed system, and/or data transmitted by way of the signal to other systems across a network such as the Internet).

In addition, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or." That is, unless otherwise specified or clear from context, "X employs A or B" is intended to mean any of the natural inclusive permutations. That is, "X employs A or B" is satisfied in any of the following cases: X employs A; X employs B; or X employs both A and B. The term "and/or" as used herein should also be understood to refer to and include all possible combinations of one or more of the listed related items.

Also, the terms "comprises" and/or "comprising" should be understood to mean that the stated feature and/or element is present. However, the terms "comprises" and/or "comprising" should be understood not to exclude the presence or addition of one or more other features, elements, and/or groups thereof. Also, unless otherwise specified or clear from context to be directed to a singular form, the singular in this specification and the claims should generally be construed to mean "one or more."

Those skilled in the art will further appreciate that the various illustrative logical blocks, configurations, modules, circuits, means, logic, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logic, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or as software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application. However, such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

Because libraries and open source software are intended to be widely used, distributing them to other users with a vulnerability can turn them into a channel that propagates the vulnerability to a large number of software products. Detecting code clones may be one of the most effective ways to mitigate this vulnerability propagation problem. Accordingly, various conventional techniques exist for determining software vulnerabilities by detecting code clones related to a vulnerability.

In general, methods of detecting vulnerabilities through code clones include detection based on the source code of the software and detection based on the binary code of the software.

First, source-code-based vulnerability detection may be performed by comparing the similarity between source code related to a predefined vulnerability and the source code of the target software. That is, the source code of a piece of software can be used to detect code clones in that software and determine its vulnerabilities. This approach is effective and efficient, but many programs, such as commercial operating systems and proprietary software, are generally distributed in the form of binary code. When a program does not provide its source code, code clone detection and vulnerability determination by this approach can be difficult.

Additionally, a program's binaries may have different functionality and vulnerabilities depending on the distributor, even when they are compiled from the same source code. Also, even if the program's own source code has no vulnerability, a dependent library can introduce a vulnerability regardless of the security level of the program itself, so vulnerability detection that relies on source code may be incomplete.

Accordingly, conventional techniques exist for detecting code clones based on the binary code of software. Code clone detection based on binary code can be widely applicable because it does not require separate source code.

However, conventional code clone detection methods based on binary code may have scalability problems owing to the diversity of compilation environments, for example because a single source can be compiled into multiple binary representations. In addition, although the similarity of two target functions can be computed from binary code, it can be difficult to determine whether a function actually exists in the target binary, or whether the binary is related to a vulnerability.

Additionally, conventional code clone detection or vulnerability determination methods based on binary code have difficulty achieving both high accuracy and fast processing speed. For example, some conventional methods improve processing speed by relying on static properties of binary code, but are sensitive to changes in the compilation environment, which can reduce accuracy. Other methods improve accuracy by focusing on handling the changes caused by the compilation environment, but at the risk of reduced processing speed.

As described above, conventional techniques that detect code clones and determine vulnerabilities based on binary code risk degrading either the speed or the accuracy of vulnerability determination.

Accordingly, the present disclosure can provide a computing device 100 that offers higher accuracy and faster processing in code clone detection and vulnerability determination based on binary code. The specific methods by which the computing device 100 of the present disclosure determines code clones and vulnerabilities based on binary code are described below.

FIG. 1 is a block diagram of a computing device according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the computing device 100 may determine whether software has a vulnerability by detecting code clones based on the binary code of the software. As shown in FIG. 1, the computing device 100 may include a processor 110, a memory 130, and a network unit 150. These components are exemplary, and the scope of the present disclosure is not limited to them. That is, depending on how the embodiments of the present disclosure are implemented, additional components may be included or some of the components described above may be omitted.

According to an embodiment of the present disclosure, the computing device 100 may include a network unit 150 that transmits data to and receives data from a client. The network unit 150 may provide a communication function between the computing device 100 and the client. For example, the network unit 150 may receive from a client a request related to code clone detection for the binary code of specific software, or to vulnerability determination for that software. The network unit 150 according to an embodiment of the present disclosure may use various wired communication systems such as the Public Switched Telephone Network (PSTN), x Digital Subscriber Line (xDSL), Rate Adaptive DSL (RADSL), Multi Rate DSL (MDSL), Very High Speed DSL (VDSL), Universal Asymmetric DSL (UADSL), High Bit Rate DSL (HDSL), and a Local Area Network (LAN).

In addition, the network unit 150 presented herein may use various wireless communication systems such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single Carrier-FDMA (SC-FDMA), and other systems. The techniques described herein are not limited to the networks mentioned above and may also be used in other networks.

According to an embodiment of the present disclosure, the operations of the computing device 100 described below may be performed in response to a query issued by a client. A client may be any type of node in a system that has a mechanism for communicating with the computing device 100. For example, such a client may include a PC, a laptop computer, a workstation, a terminal, and/or any electronic device with network connectivity. A client may also include any server implemented by at least one of an agent, an Application Programming Interface (API), and a plug-in.

According to an embodiment of the present disclosure, the memory 130 may store any type of information generated or determined by the processor 110 and any type of information received by the network unit 150. The memory 130 may store a computer program for performing software vulnerability determination according to an embodiment of the present disclosure, and the stored computer program may be read and executed by the processor 110. For example, the memory 130 may store information about a set of vulnerable code clones used for detecting code clones. As another example, the memory 130 may store information about the preprocessing methods applied to the code related to a vulnerability and to the code of the software subject to vulnerability determination. The specific information described as being stored in the memory is merely an example, and the present disclosure is not limited thereto.

According to an embodiment of the present disclosure, the memory 130 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, and an optical disk. The computing device 100 may also operate in connection with web storage that performs the storage function of the computing device 100 on the Internet. The above description of the memory is merely an example, and the present disclosure is not limited thereto.

According to an embodiment of the present disclosure, the processor 110 may typically handle the overall operation of the computing device 100. By processing signals, data, and information that are input or output through the components described above, or by running an application program stored in the memory 130, the processor 110 may provide or process information appropriate to the client (for example, information on code clone detection for the target software or on the presence or absence of a vulnerability) or functions appropriate to the client.

According to an embodiment of the present disclosure, the processor 110 may detect a vulnerability based on the binary code of software. Specifically, the processor 110 may detect whether the software contains a code clone by comparing the vulnerability signature 650, which includes information on the code flows associated with a patch, against the binary code of the target software (that is, the target binary code 710). The processor 110 may then use the code clone detected from the binary code to determine whether the software contains a vulnerability. That is, as shown in FIG. 2, to determine whether a vulnerability exists in given software, the processor 110 may perform vulnerability information preprocessing 200, target binary preprocessing 300, and code clone detection and vulnerability determination 400. The specific operations of each of these three processes are described below with reference to FIGS. 3 to 6.

According to an embodiment of the present disclosure, the processor 110 may preprocess vulnerability information to generate the vulnerability signature 650. Specifically, the processor 110 may preprocess vulnerability information related to a known vulnerability in software, thereby generating the vulnerability signature 650 that serves as the reference for determining software vulnerabilities. The vulnerability information may include vulnerability patch information in source code form, binary code related to the vulnerability (that is, the vulnerable binary code 610), and binary code in which the vulnerability has been patched (that is, the patched binary code 620). The vulnerability patch information is standardized information for identifying known software vulnerabilities and may be, for example, information related to Common Vulnerabilities and Exposures (CVE). That is, the processor 110 may generate the vulnerability signature 650 by preprocessing the vulnerability information that serves as the reference for identifying code clones containing a vulnerability exploitable in an attack.

The processor 110 may generate the vulnerability signature 650 by preprocessing the vulnerability patch information, the vulnerable binary code 610, and the patched binary code 620. The vulnerability signature 650 may include information about a known software vulnerability and may be preprocessed into a form that can be matched against the target binary of the target software during code clone detection and vulnerability determination. Specifically, the processor 110 may identify the binary code related to a change based on the vulnerability information. Here, binary code related to a change means code that changes because instructions are added or deleted by a security patch to the software. The processor 110 may generate a vulnerable function fingerprint 643 and a patch function fingerprint based on the binary code related to the change, and may generate a vulnerability signature 650 that includes the vulnerable function fingerprint 643 and the patch function fingerprint.

More specifically, referring to FIG. 3, the processor 110 may apply a removal mark 631 to the vulnerable binary code 610 based on the vulnerability patch information (211). The removal mark 631 relates to instructions that the security patch deleted from the vulnerable binary. Specifically, the processor 110 may identify the instructions removed in connection with the vulnerability based on the vulnerability patch information in source code form, and may apply the removal mark 631 to the code in the vulnerable binary code 610 that corresponds to those removed instructions. By marking the portion of the vulnerable binary code 610 that corresponds to the source code removed by the vulnerability patch, the processor 110 can, when generating the vulnerability signature 650, identify only the portion related to the vulnerability and preprocess only those instructions instead of processing the entire binary. The removal mark 631 therefore allows the vulnerability signature 650 (that is, the vulnerable function fingerprint 643) to be generated more efficiently.

The processor 110 may obtain a vulnerable function and vulnerable binary property information based on the vulnerable binary code 610 (213). The vulnerable function is a function related to the instructions corresponding to the vulnerable binary code 610. The vulnerable binary property information is a property corresponding to the binary that contains the vulnerability. The processor 110 may also translate the vulnerable binary into an intermediate representation (IR) and analyze the vulnerable function to obtain attribute information of the vulnerable function (215). The attribute information of the vulnerable function may be stability information indicating whether the function is affected by changes in the compilation environment. That is, it may include information on whether the vulnerable function is stable across compilation environments (not affected by changes in the compilation environment) or unstable (affected by such changes). This attribute information may be based on, for example, at least one of the number of independent code flows or the number of function calls, which do not change with the compilation environment.

Translating the vulnerable binary into an intermediate representation may serve to speed up a code clone detection algorithm that handles binaries of multiple architectures. That is, the intermediate representation allows a single body of code and a single algorithm to be used across architectures. Because the same code and algorithm apply to multiple architectures, the algorithm for splitting a function into strands also becomes easier to write.

In addition, the processor 110 may split the vulnerable function into one or more strands based on the independent code flows within the vulnerable function, and may generate the vulnerable function fingerprint 643 by mapping the attribute information of the vulnerable function and the vulnerable binary property information to each of the resulting strands (217). A strand is the set of instructions required to compute the value of a single variable and may be the unit on which similarity comparison is based. A set of strands may thus represent part or all of the semantics of the software.
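The strand definition above (the instructions needed to compute one variable's value) can be illustrated with a small backward-slicing sketch. This is only an illustration under stated assumptions: the toy three-address instruction type `Instr`, the function `extract_strands`, and the sample block are hypothetical and are not taken from the disclosure, which does not specify the IR or the splitting algorithm in this passage.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical three-address IR instruction (an assumption for illustration)."""
    dest: str     # variable written by the instruction
    srcs: tuple   # variables read by the instruction
    text: str     # printable form


def extract_strands(block):
    """Split a straight-line block into strands by backward slicing.

    Starting from the last instruction not yet covered, walk backwards and
    collect every instruction that defines a variable the strand still needs.
    """
    unused = set(range(len(block)))
    strands = []
    while unused:
        last = max(unused)
        unused.discard(last)
        picked = [last]
        needed = set(block[last].srcs)
        for i in range(last - 1, -1, -1):
            if block[i].dest in needed:
                picked.append(i)
                needed.discard(block[i].dest)
                needed.update(block[i].srcs)
                unused.discard(i)
        strands.append([block[i] for i in reversed(picked)])
    return strands


# Toy block: two independent computations interleaved.
block = [
    Instr("t1", ("a", "b"), "t1 = a + b"),
    Instr("t2", ("t1",), "t2 = t1 * 2"),
    Instr("t3", ("c",), "t3 = c - 1"),
    Instr("r", ("t2",), "r = t2 + 4"),
]
strands = extract_strands(block)
for s in strands:
    print([ins.text for ins in s])
```

Here the block separates into one strand that computes `r` and another that computes `t3`, which is the kind of independent code flow the fingerprint would store per strand.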

That is, the vulnerable function fingerprint 643 generated by the processor 110 may be a set in which the binary code related to a software vulnerability is divided into strand units.

In addition, the processor 110 may apply an addition mark 632 to the patched binary code 620 based on the vulnerability patch information (221). The addition mark 632 relates to instructions that the security patch added to the patched binary. Specifically, the processor 110 may identify the instructions added in connection with patching the vulnerability based on the vulnerability patch information in source code form, and may apply the addition mark 632 to the code in the patched binary code 620 that corresponds to those added instructions. By marking the portion of the patched binary code 620 that corresponds to the source code added by the vulnerability patch, the processor 110 can, when generating the vulnerability signature 650, identify only the portion related to the patched vulnerability and preprocess only those instructions instead of processing the entire binary. The addition mark 632 therefore allows the vulnerability signature 650 (that is, the patch function fingerprint) to be generated more efficiently.
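As a rough illustration of the first half of the marking steps (211, 221), the sketch below extracts the removed and added source line numbers from a patch in unified diff format. The function name `changed_lines` and the sample patch are hypothetical, and the sketch assumes a single-file patch. How those source lines are then mapped to binary instructions is not specified in this passage, so that step is deliberately left out.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,3 +10,4 @@"
HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(patch_text):
    """Return (removed, added) source line numbers from a single-file unified diff.

    Removed numbers refer to the pre-patch file (candidates for a removal mark);
    added numbers refer to the post-patch file (candidates for an addition mark).
    """
    removed, added = [], []
    old_no = new_no = None
    for line in patch_text.splitlines():
        m = HUNK.match(line)
        if m:
            old_no, new_no = int(m.group(1)), int(m.group(2))
            continue
        if old_no is None or line.startswith("\\"):
            continue  # file headers before the first hunk, or "\ No newline" notes
        if line.startswith("-"):
            removed.append(old_no)
            old_no += 1
        elif line.startswith("+"):
            added.append(new_no)
            new_no += 1
        else:  # context line present in both versions
            old_no += 1
            new_no += 1
    return removed, added


# Hypothetical patch used only to exercise the parser.
patch = """--- a/copy.c
+++ b/copy.c
@@ -10,3 +10,4 @@
 void copy(char *dst, const char *src, size_t n) {
-    strcpy(dst, src);
+    strncpy(dst, src, n - 1);
+    dst[n - 1] = 0;
 }
"""
removed, added = changed_lines(patch)
print(removed, added)
```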

The processor 110 may obtain a patch function and patch binary property information based on the patched binary code 620 (223). The patch function is a function related to the instructions corresponding to the patched binary code 620. The patch binary property information is a property corresponding to the binary in which the vulnerability has been patched. The processor 110 may also translate the patched binary into an intermediate representation and analyze the patch function to obtain attribute information of the patch function (225). The attribute information of the patch function may include information on whether the patch function is stable across compilation environments (not affected by changes in the compilation environment) or unstable (affected by such changes). This attribute information may be based on, for example, at least one of the number of independent code flows or the number of function calls, which do not change with the compilation environment.

In addition, the processor 110 may split the patch function into one or more strands based on the independent code flows within the patch function, and may generate the patch function fingerprint by mapping the attribute information of the patch function and the patch binary property information to each of the resulting strands.

That is, the patch function fingerprint generated by the processor 110 may be a set in which the binary code whose vulnerability has been patched is divided into strand units. In other words, the patch function fingerprint may be a set of binary code that does not cause the vulnerability in the software.

Accordingly, through the preprocessing described above, the processor 110 may generate the vulnerability signature 650 including the vulnerable function fingerprint 643 and the patch function fingerprint (230). In other words, the processor 110 may store the code flows affected by the patch in strand units to generate the vulnerability signature 650. Because the vulnerability signature 650 contains only the code flows associated with the vulnerability, in strand units, the similarity comparison against the target binary code 710 does not need to compare every code flow and can instead be performed strand by strand, which can greatly reduce code clone detection time.
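One possible in-memory layout for the signature described above is sketched below: a vulnerability signature holding a vulnerable function fingerprint and a patch function fingerprint, each a list of strands with attached property information. All class and field names (`Strand`, `FunctionFingerprint`, `VulnerabilitySignature`, `stable`, `cve_id`) and the placeholder values are assumptions made for illustration; the disclosure does not prescribe a storage format here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Strand:
    instructions: tuple   # normalized IR instructions, in execution order
    stable: bool = True   # hypothetical flag: unaffected by compile-environment changes


@dataclass
class FunctionFingerprint:
    function_name: str
    binary_properties: dict = field(default_factory=dict)  # e.g. architecture
    strands: list = field(default_factory=list)


@dataclass
class VulnerabilitySignature:
    cve_id: str                      # identifier of the known vulnerability
    vulnerable: FunctionFingerprint  # strands marked as removed by the patch
    patched: FunctionFingerprint     # strands marked as added by the patch


# Placeholder data, not a real vulnerability.
signature = VulnerabilitySignature(
    cve_id="CVE-0000-0000",
    vulnerable=FunctionFingerprint(
        "copy", {"arch": "x86_64"}, [Strand(("t1 = load src", "store dst, t1"))]
    ),
    patched=FunctionFingerprint(
        "copy", {"arch": "x86_64"}, [Strand(("t1 = load src", "t2 = min t1, n", "store dst, t2"))]
    ),
)
print(len(signature.vulnerable.strands), len(signature.patched.strands))
```

Keeping the two fingerprints side by side reflects the later comparison step, where a target function is scored against both.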

According to an embodiment of the present disclosure, the processor 110 may preprocess the target binary code 710 to generate the target function fingerprint 730. The preprocessing of the target binary code 710 may correspond to the preprocessing of the vulnerable binary code 610 and the patched binary code 620 described above.

Specifically, referring to FIG. 4, the processor 110 may obtain a target function and target binary property information based on the target binary code 710 of the target software (that is, the software subject to vulnerability determination) (310). The target function is a function related to the instructions corresponding to the target binary code 710. The target binary property information is a property corresponding to the target binary of the target software.

In a further embodiment, the target binary code 710 on which the target function and target binary property information are based may carry an arbitrary mark so that only the instructions related to the client's request are processed, instead of the entire target binary. That is, using the mark applied to the target binary code 710, the processor 110 can, when generating the target function fingerprint 730, identify only the portion related to the client's request and preprocess only those instructions. The mark therefore allows the target function fingerprint 730 to be generated efficiently. The processor 110 may also translate the target binary into an intermediate representation and analyze the target function to obtain attribute information of the target function (320). The attribute information of the target function may include information on whether the target function is stable across compilation environments (not affected by changes in the compilation environment) or unstable (affected by such changes). This attribute information may be based on, for example, at least one of the number of independent code flows or the number of function calls, which do not change with the compilation environment.

In addition, the processor 110 may split the target function into one or more strands based on the independent code flows within the target function, and may generate the target function fingerprint 730 by mapping the attribute information of the target function and the target binary property information to each of the resulting strands. That is, the target function fingerprint 730 generated by the processor 110 may be a set in which the binary code of the target software is divided into strand units.

Accordingly, as described above, the processor 110 may generate in advance the vulnerability signature 650, including the vulnerable function fingerprint 643 and the patch function fingerprint, by storing the code flows affected by the patch in strand units. The processor 110 may also preprocess the target binary of the target software into a form comparable with the vulnerability signature 650 to generate the target function fingerprint 730. Because the target function fingerprint 730 also stores its code flows in strand units, it can be compared with the vulnerability signature 650 strand by strand, which can greatly reduce the time required for code clone detection and vulnerability determination.

According to an embodiment of the present disclosure, the processor 110 may perform code clone detection and vulnerability determination 400 on the target software based on the binary code of the software. The processor 110 may determine whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature 650 generated through the vulnerability information preprocessing 200 and the target function fingerprint 730 generated through the target binary preprocessing 300. The similarity comparison between the vulnerability signature 650 and the target function fingerprint 730 may be an N-Gram similarity comparison, which divides the function fingerprints of the vulnerability signature 650 (that is, the vulnerable function fingerprint 643 and the patch function fingerprint) and the target function fingerprint 730 into predetermined units and then compares how many of all the resulting units match. The processor 110 may divide two sequences into N-Grams and calculate the similarity between the function fingerprints based on how many of all the generated N-Grams are identical. Unlike other sequence similarity measures, N-Gram similarity preserves the sub-ordering of a sequence. Consequently, a comparison based on N-Gram similarity tends to preserve the sub-ordering of function semantics even when the form of the binary changes.

For example, as shown in FIG. 6, when sequence x is ABCDE and sequence y is ABCZE, the sequences may be divided into N-Grams as Gx = {AB, ABC, BCD, CDE, DE} and Gy = {AB, ABC, BCZ, CZE, ZE}. In this case, the union of the two N-Gram sets has 8 elements (AB, ABC, BCD, CDE, DE, BCZ, CZE, ZE) and their intersection has 2 elements (AB, ABC). The N-Gram similarity of the two sequences is therefore 2/8, that is, 0.25. In other words, N-Gram similarity comparison measures the similarity of two functions by how many of all the N-Grams generated by the division are identical. Through N-Gram similarity comparison, the types and frequencies of functions in the function fingerprints can be compared appropriately, so that the degree to which code flows similar to the vulnerability signature 650 exist in the target function fingerprint 730 is expressed as a similarity value. The specific sequences, division criterion, and similarity value described above are merely examples, and the present disclosure is not limited thereto.
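The worked example above can be reproduced with a short sketch. The padding scheme used here (one boundary marker on each side of the sequence, a window of three, markers dropped from the resulting gram) is an assumption chosen because it yields exactly the sets Gx and Gy given in the example; the function names `ngrams` and `ngram_similarity` are hypothetical.

```python
def ngrams(seq, n=3):
    """Return the set of N-Grams of seq, using one boundary marker per side.

    With n=3, "ABCDE" yields AB, ABC, BCD, CDE, DE: the first and last windows
    each contain one boundary marker, which is dropped from the gram.
    """
    padded = [None] + list(seq) + [None]
    grams = set()
    for i in range(len(padded) - n + 1):
        window = padded[i:i + n]
        grams.add(tuple(item for item in window if item is not None))
    return grams


def ngram_similarity(x, y, n=3):
    """Shared N-Grams divided by all distinct N-Grams of the two sequences."""
    gx, gy = ngrams(x, n), ngrams(y, n)
    union = gx | gy
    if not union:
        return 0.0  # nothing to compare
    return len(gx & gy) / len(union)


gx = ngrams("ABCDE")
gy = ngrams("ABCZE")
print(sorted("".join(g) for g in gx))   # AB, ABC, BCD, CDE, DE
print(len(gx | gy), len(gx & gy))       # 8 distinct grams, 2 shared
print(ngram_similarity("ABCDE", "ABCZE"))
```

A single substituted element (D replaced by Z) changes three of the five grams, which is why the score drops to 0.25 even though four of five elements are unchanged.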

In addition, the processor 110 may calculate the similarity of a function pair based on the similarity between the code flow sets of each function (that is, a function fingerprint included in the vulnerability signature 650 and the target function fingerprint 730).

<Formula 1> [equation not recovered in this text]

Formula 1 expresses how the similarity of a function pair is calculated from the similarity of their code flows. The two function symbols in Formula 1 denote functions, which may have n and m strands, respectively, and the strands of each function are denoted by a corresponding sequence of strand symbols.
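Because Formula 1 itself is not reproduced in this text, the sketch below is not the formula of the disclosure. It only illustrates one plausible way to aggregate strand-level similarity into a function-pair similarity, under the assumption that each strand of the first function is matched with its most similar strand in the second function and the best scores are averaged. The names `strand_similarity` and `function_similarity`, the padded 3-gram scheme, and the averaging rule are all assumptions.

```python
def _ngrams(seq, n=3):
    # Padded N-Grams: one boundary marker per side, dropped from the gram.
    padded = [None] + list(seq) + [None]
    return {
        tuple(x for x in padded[i:i + n] if x is not None)
        for i in range(len(padded) - n + 1)
    }


def strand_similarity(s, t, n=3):
    """N-Gram similarity between two strands (sequences of instructions)."""
    gs, gt = _ngrams(s, n), _ngrams(t, n)
    union = gs | gt
    return len(gs & gt) / len(union) if union else 0.0


def function_similarity(f_strands, g_strands):
    """ASSUMED aggregation: mean over f's strands of the best match in g.

    f has n strands and g has m strands; the result lies in [0, 1].
    """
    if not f_strands or not g_strands:
        return 0.0
    best = [max(strand_similarity(s, t) for t in g_strands) for s in f_strands]
    return sum(best) / len(best)


# Strands are written as strings of single-letter "instructions" for brevity.
f = ["ABCDE", "XYZ"]
g = ["ABCZE", "XYZ"]
score = function_similarity(f, g)
print(score)
```

Note that this particular aggregation is asymmetric in its two arguments; a symmetric variant would need a different rule, and the disclosure's own formula may differ in either respect.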

The processor 110 may calculate a first similarity between the functions included in the vulnerable function fingerprint 643 of the vulnerability signature 650 and the functions included in the target function fingerprint 730. Because the first similarity measures how similar the functions in the target function fingerprint 730 are to the functions in the vulnerable function fingerprint 643, which is a set of binary code related to the vulnerability, it may be positively correlated with (that is, proportional to) the probability that a vulnerability is found in the target software.

In addition, the processor 110 may calculate a second similarity between the functions included in the patch function fingerprint of the vulnerability signature 650 and the functions included in the target function fingerprint 730. Because the patch function fingerprint is a set of binary code that does not cause the vulnerability (that is, code for which the patch has been completed), the second similarity may have a negative correlation (that is, an inverse relationship) with the probability that the target software is determined to contain the vulnerability.

In addition, the processor 110 may determine whether a vulnerability exists in the target software based on the difference between the first similarity and the second similarity. When the difference between the first similarity and the second similarity exceeds a predetermined criterion, the processor 110 may determine that a vulnerability exists in the target software.

Specifically, the processor 110 may determine that the probability of being vulnerable is high when a function of the target function fingerprint 730 is similar to a function of the vulnerable function fingerprint 643, and that the probability of not being vulnerable is high when a function of the target function fingerprint 730 is similar to a function of the patch function fingerprint. In this case, to minimize the possibility of a false detection in the similarity calculation against the patch function fingerprint when the vulnerability signature 650 has not been generated accurately, the computing device 100 may multiply that probability by a coefficient smaller than 1 so that it is reflected with a lower weight. When the value obtained by subtracting the probability of not being vulnerable (that is, the second similarity) from the probability of being vulnerable (that is, the first similarity) is greater than a predetermined reference value, the processor 110 may determine that the vulnerability exists.
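The decision rule described above can be sketched as a single comparison. The coefficient value 0.5 and the reference value 0.3 are placeholders chosen only for illustration; the disclosure states only that the coefficient is smaller than 1 and that the reference value is predetermined. The function name `is_vulnerable` is likewise hypothetical.

```python
def is_vulnerable(first_similarity, second_similarity,
                  patch_coefficient=0.5, reference_value=0.3):
    """Return True when the weighted difference exceeds the reference value.

    first_similarity:  similarity to the vulnerable function fingerprint
    second_similarity: similarity to the patch function fingerprint
    patch_coefficient: illustrative weight (< 1) applied to the second similarity
    reference_value:   illustrative predetermined threshold
    """
    if not 0.0 <= patch_coefficient < 1.0:
        raise ValueError("patch_coefficient must be in [0, 1)")
    score = first_similarity - patch_coefficient * second_similarity
    return score > reference_value


print(is_vulnerable(0.9, 0.2))  # close to the vulnerable code, far from the patch
print(is_vulnerable(0.5, 0.9))  # closer to the patched code
```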

According to an embodiment of the present disclosure, the processor 110 may perform the similarity determination more efficiently by filtering the vulnerability signature 650 and the target function fingerprint 730. The filtering process used to make the similarity determination efficient is described below with reference to FIG. 5, and redundant description is omitted.

The processor 110 may perform primary filtering on the vulnerability signature 650 and the target function fingerprint 730 based on attribute information of functions to select a candidate vulnerability signature 651 and a candidate target function fingerprint 731 (410). The primary filtering may be filtering for identifying code related to a vulnerability in each of the vulnerability signature 650 and the target function fingerprint 730.

The primary filtering may be performed, for example, based on the attribute information of the functions corresponding to each function fingerprint included in the vulnerability signature 650 and to the target function fingerprint 730. Specifically, the vulnerability signature 650 may consist of a vulnerable function fingerprint 643 and a patch function fingerprint 644. In addition, attribute information of functions may be mapped to each of the vulnerable function fingerprint 643, the patch function fingerprint 644, and the target function fingerprint 730 through the preprocessing performed by the processor 110. The attribute information of a function may be information on whether the function is stable across compilation environments. That is, based on the attribute information mapped to each of the vulnerable function fingerprint 643, the patch function fingerprint 644, and the target function fingerprint 730, the processor 110 may perform primary filtering that excludes, from the functions included in these fingerprints, functions that are stable across various compilation environments (that is, it filters out function fingerprints unrelated to the vulnerability). The foregoing description of primary filtering based on function attributes is merely an example, and the primary filtering of the present disclosure may include various filter conditions for identifying binary code related to vulnerabilities in each function fingerprint.

In other words, by loading only the function fingerprints that correspond to binary code related to the vulnerability, based on the function attribute information mapped to each fingerprint, the processor 110 may increase the speed of the entire comparison process and reduce resource usage. That is, the processor 110 may improve performance by reducing the number of function fingerprints subject to similarity comparison.
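The primary filtering described above amounts to dropping fingerprints by a mapped attribute before any similarity is computed. The sketch below is a minimal illustration under stated assumptions: the `FunctionFingerprint` record and its boolean field `stable_across_compilers` are hypothetical stand-ins for the attribute information of a function, whose actual representation is not specified in the text.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionFingerprint:
    """Hypothetical record: a function's strands plus mapped attribute info."""
    name: str
    strands: list = field(default_factory=list)
    # Hypothetical attribute: True if the function is stable across
    # compilation environments, as described in the text above.
    stable_across_compilers: bool = False


def primary_filter(fingerprints):
    """Keep only fingerprints not flagged by the attribute, per the text above."""
    return [fp for fp in fingerprints if not fp.stable_across_compilers]


candidates = primary_filter([
    FunctionFingerprint("parse_header"),
    FunctionFingerprint("memset_wrapper", stable_across_compilers=True),
])
print([fp.name for fp in candidates])
```

Other filter conditions could be added as further predicates on the same record, since the text allows various filter conditions.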

In addition, the processor 110 may perform secondary filtering on the candidate vulnerability signature 651 and the candidate target function fingerprint 731 to identify similar strands in each function fingerprint (420). The candidate vulnerable function fingerprint and the candidate patch function fingerprint included in the candidate vulnerability signature 651, as well as the candidate target function fingerprint 731, may have been divided into strands through the preprocessing performed by the processor 110. In this case, each of the one or more strands included in a function fingerprint is the set of instructions required to calculate the value of one variable and may represent one independent data flow. Specifically, the processor 110 may identify similar strands among the one or more strands included in the candidate vulnerable function fingerprint and the one or more strands included in the candidate target function fingerprint 731. Likewise, the processor 110 may identify similar strands among the one or more strands included in the candidate patch function fingerprint and the one or more strands included in the candidate target function fingerprint 731. That is, among the strands included in each candidate function fingerprint, only similar strands that are related to one another are identified as subjects of the similarity comparison. In other words, because comparison at the strand level, where strands are representative of their functions, identifies the functions to be compared in each function fingerprint more efficiently, the processing speed may be improved.
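The text does not state how similar strands are identified, so the following sketch uses an assumed cheap test: two strands are treated as candidates for comparison when they share at least a minimum number of instruction tokens. The function name `similar_strand_pairs` and the parameter `min_shared` are hypothetical.

```python
def similar_strand_pairs(signature_strands, target_strands, min_shared=2):
    """Return index pairs (i, j) of strands that share enough tokens.

    Assumption: token overlap is used as an inexpensive prefilter so that the
    costlier similarity comparison only runs on the surviving pairs.
    """
    target_sets = [set(t) for t in target_strands]
    pairs = []
    for i, strand in enumerate(signature_strands):
        tokens = set(strand)
        for j, target_tokens in enumerate(target_sets):
            if len(tokens & target_tokens) >= min_shared:
                pairs.append((i, j))
    return pairs


signature = [("load", "add", "store"), ("cmp", "jmp")]
target = [("load", "add", "ret"), ("mul", "div")]
print(similar_strand_pairs(signature, target))
```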

In addition, the processor 110 may calculate the similarity between the similar strands identified in each of the candidate vulnerability signature 651 and the candidate target function fingerprint 731 (430). In this case, the similarity comparison between strands may be an N-gram similarity comparison. The processor 110 may also calculate the similarity of a function pair based on the similarity between the sets of code flows of each function (that is, a function fingerprint included in the candidate vulnerability signature 651 and the candidate target function fingerprint 731). The similarity comparison method described above is merely an example, and the present disclosure is not limited thereto.

In addition, the processor 110 may determine the presence of a code clone and of a vulnerability based on the similarity comparison result and provide a notification to the client (440). Specifically, when the difference between the first similarity and the second similarity exceeds a predetermined criterion, the processor 110 may determine that a vulnerability exists in the target software. When it determines that a vulnerability exists in the target software, the processor 110 may provide the client with a notification related to the vulnerability. Because the notification is provided to the client, the client can recognize whether the target software subject to the vulnerability determination contains the vulnerability.

Accordingly, through preprocessing of the vulnerability information, the processor 110 may generate in advance the vulnerability signature 650 that serves as the basis for binary-based code clone detection and vulnerability determination. The processor 110 may also generate the target function fingerprint 730 by preprocessing the target binary of arbitrary software into a form comparable with the vulnerability signature 650. The processor 110 may then determine code clones and vulnerabilities in the target software based on the similarity comparison between the vulnerability signature 650 and the target function fingerprint 730. In this case, each function fingerprint preprocessed by the processor 110 (that is, the vulnerable function fingerprint 643, the patch function fingerprint 644, and the target function fingerprint 730) is stored in units of strands with the attribute information of each function mapped to it, so that the primary filtering and the secondary filtering can be performed during the similarity comparison of the function fingerprints. As a result, the efficiency and processing speed of code clone detection and vulnerability determination may be improved.

FIG. 7 is a schematic diagram illustrating the processing flow of a system for determining software vulnerabilities based on binary code clones according to an embodiment of the present disclosure. For features of FIG. 7 that overlap with features described above with reference to FIGS. 2 to 5, refer to the description of FIGS. 2 to 5; their description is omitted here.

As shown in FIG. 7, the processor 110 may determine the presence of code clones and vulnerabilities in arbitrary software through vulnerability information preprocessing 200, target binary preprocessing 300, and code clone detection and vulnerability determination 400.

In more detail, the processor 110 may perform the vulnerability information preprocessing 200. The vulnerability information may include vulnerability patch information in source code form, a vulnerable binary code 610, and a patch binary code 620. The vulnerability patch information is standardized information for identifying a known vulnerability of software and may include information on code changes before or after the patch. Based on the vulnerability patch information, the processor 110 may identify code related to removal in the vulnerable binary code 610 and code related to addition in the patch binary code 620. That is, the processor 110 may identify the code related to the change (that is, removed or added code) in each of the vulnerable binary code 610 and the patch binary code 620. Accordingly, the processor 110 may apply a change mark 630 to the changed code in each of the vulnerable binary code 610 and the patch binary code 620. The change mark 630 allows the preprocessing of the vulnerability information to identify only the part related to the vulnerability and preprocess only the corresponding instructions, rather than processing the entire binary code. The processor 110 may generate the vulnerability signature 650 by performing preprocessing 640 on the vulnerable binary code 610 and the patch binary code 620. The processor 110 may then store the vulnerability signature 650 generated through this preprocessing in a database 660 for matching against the target function fingerprint 730.
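The idea of marking only changed instructions can be sketched with Python's standard `difflib`. This is a simplification under stated assumptions: the disclosure derives the marks from source-level patch information, whereas the sketch diffs two instruction listings directly, and the function name `mark_changes` and the sample instructions are hypothetical.

```python
import difflib


def mark_changes(vulnerable, patched):
    """Return (removed, added) index lists for two instruction listings.

    removed: indices into `vulnerable` that the patch removed
    added:   indices into `patched` that the patch added
    """
    matcher = difflib.SequenceMatcher(a=vulnerable, b=patched, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(range(i1, i2))
        if tag in ("insert", "replace"):
            added.extend(range(j1, j2))
    return removed, added


vulnerable = ["mov eax, [rdi]", "call memcpy", "ret"]
patched = ["mov eax, [rdi]", "cmp eax, 64", "ja fail", "call memcpy", "ret"]
print(mark_changes(vulnerable, patched))
```

Only the instructions at the returned indices would then need to be carried into the later preprocessing steps.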

In addition, the processor 110 may perform the preprocessing 300 on the target binary. The target binary code 710 may be the binary code of the target software, and its preprocessing converts the target binary code into a form that can be matched against the vulnerability signature. Specifically, the processor 110 may generate the target function fingerprint 730 through preprocessing 720 of the target binary code 710 and may store the generated target function fingerprint 730 in a cache memory 740. That is, the processor 110 may generate the target function fingerprint 730 by preprocessing 720 the binary code of the arbitrary software subject to code clone and vulnerability determination (that is, the target software) and store it in the cache memory 740. This preprocessing of the target binary code may be performed in response to a query request, received from the client, for code clone and vulnerability determination of the target software. That is, the processor 110 may generate the target function fingerprint by preprocessing the target binary code the first time a corresponding query request from the client is handled.

In addition, the processor 110 may perform code clone detection and vulnerability determination 400 for arbitrary software. Specifically, the processor 110 may detect a code clone by comparing the similarity between the vulnerability signature 650 generated by preprocessing the vulnerability information and the target function fingerprint 730 generated by preprocessing the target binary code 710. Because the vulnerability signature 650 is a set of code related to a vulnerability in software, a code clone detected in the target function fingerprint corresponding to the target binary code may indicate that the target software contains the vulnerability. That is, the processor 110 may determine code clones and vulnerabilities from the similarity of code flows between the vulnerability signature 650 and the target function fingerprint 730. To speed up the matching 810 for code clone detection between the vulnerability signature 650 and the target function fingerprint 730, the processor 110 may filter each of them. In more detail, the processor 110 may perform fingerprint filtering 820 on each of the vulnerability signature 650 and the target function fingerprint 730, and may then perform strand filtering 830 on each of the fingerprint-filtered vulnerability signature 650 and target function fingerprint 730. The fingerprint filtering 820 and the strand filtering 830 may correspond to the primary filtering and the secondary filtering described with reference to FIG. 5, respectively. That is, through the fingerprint filtering 820 and the strand filtering 830, the processor 110 may minimize the amount of fingerprint matching 840 needed to detect code clones. As a result, code clone detection and vulnerability determination may be sped up.

FIG. 8 is a schematic diagram illustrating the vulnerability information preprocessing and binary code preprocessing according to an embodiment of the present disclosure. For features of FIG. 8 that overlap with features described above with reference to FIGS. 2 to 5, refer to the description of FIGS. 2 to 5; their description is omitted here.

According to an embodiment of the present disclosure, the processor 110 may generate the vulnerability signature 650 based on the vulnerability information. Specifically, the processor 110 may apply a change mark 630 to each of the vulnerable binary code 610 and the patch binary code 620. The change mark 630 allows the vulnerability signature generation to identify only the part related to the vulnerability and preprocess only the corresponding instructions, rather than processing the entire binary code. The processor 110 may apply a removal mark 631 to the vulnerable binary code; the removal mark 631 may relate to instructions removed from the vulnerable binary code by the security patch. The processor 110 may apply an addition mark 632 to the patch binary code; the addition mark 632 may relate to instructions added by the security patch.

In addition, the processor 110 may perform vulnerable binary code preprocessing 641 based on the vulnerable binary code 610 and the removal mark 631, and may perform patch binary code preprocessing 642 based on the patch binary code 620 and the addition mark 632. In the present disclosure, the binary codes (that is, the vulnerable binary code, the patch binary code, and the target binary code) must be processed into the same form for the similarity comparison used in code clone detection, so their preprocessing follows corresponding steps. The processor 110 may likewise perform target binary code preprocessing based on the target binary code and an arbitrary mark 920. The arbitrary mark 920 allows preprocessing to be performed only on the code flows in the target binary code that relate to the client's request. The arbitrary mark is based on the client's request and may be optional. When an arbitrary mark has been applied to the target binary code, the processor 110 may generate the target function fingerprint by preprocessing only the code flows of the marked part, which may increase the efficiency of the preprocessing.

As described above, the preprocessing that the processor 110 performs on the binary codes (that is, the vulnerable binary code, the patch binary code, and the target binary code) follows corresponding steps, so it is described below collectively as binary code preprocessing 640.

The processor 110 may analyze the binary code 910 to obtain the functions corresponding to the binary and the binary attribute information (930, 940). The processor 110 may also translate the functions corresponding to the binary code into an intermediate language (950). Translating the functions into an intermediate language can speed up a code clone detection algorithm that handles binaries of several architectures, because a single body of code and a single algorithm can then be used across those architectures. For the same reason, the algorithm for dividing a function into strands becomes easier to write. The processor 110 may then divide each function into strands (960) and extract the attribute information of the function (970). That is, the processor 110 may divide a binary function into strands 961 and generate a function fingerprint 980 by mapping the attribute information 971 of the function and the binary attribute information 941 to each of the divided strands.
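Dividing a function into strands, where a strand is the set of instructions needed to compute one variable value, can be sketched as a backward walk over a straight-line block. The intermediate representation used here, a list of `(destination, sources)` pairs, and the function name `extract_strands` are simplifying assumptions; the disclosure does not specify the intermediate language or the splitting algorithm.

```python
def extract_strands(block):
    """Split a straight-line block into strands by backward data dependence.

    block: list of (dest, sources) pairs in program order, e.g. ("c", ["a"]).
    Starting from the last instruction not yet covered, walk backwards and
    collect every instruction that defines a value the strand still needs.
    """
    uncovered = set(range(len(block)))
    strands = []
    while uncovered:
        last = max(uncovered)
        uncovered.discard(last)
        members = [last]
        needed = set(block[last][1])
        for i in range(last - 1, -1, -1):
            dest, sources = block[i]
            if dest in needed:
                members.append(i)
                needed.discard(dest)
                needed.update(sources)
                uncovered.discard(i)
        strands.append([block[i] for i in sorted(members)])
    return strands


block = [("a", ["p"]), ("b", ["q"]), ("c", ["a"]), ("d", ["b"])]
for strand in extract_strands(block):
    print(strand)
```

In this example the block splits into two independent data flows, one computing `d` from `q` and one computing `c` from `p`.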

That is, in the process described above, when the binary code 910 is the vulnerable binary code 610, the processor 110 may generate the vulnerable function fingerprint 643, and when it is the patch binary code 620, the processor 110 may generate the patch function fingerprint 644. When the binary code 910 is the target binary code 710, the processor 110 may generate the target function fingerprint.

The processor 110 may generate a vulnerability signature including the vulnerable function fingerprint 643 and the patch function fingerprint 644, and may compare each of the vulnerable function fingerprint 643 and the patch function fingerprint 644 with the target function fingerprint 730 to perform code clone detection and vulnerability determination for the target software.

FIG. 9 is a schematic diagram illustrating the code clone detection and vulnerability determination according to an embodiment of the present disclosure. For features of FIG. 9 that overlap with features described above with reference to FIGS. 2 to 5, refer to the description of FIGS. 2 to 5; their description is omitted here.

According to an embodiment of the present disclosure, the processor 110 may determine whether a vulnerability exists in the target software by detecting code clones through a similarity comparison between the vulnerability signature 650 and the target function fingerprint 730. The processor 110 may filter each of the vulnerability signature 650 and the target function fingerprint 730 to maximize the speed of code clone detection between the function fingerprints. Specifically, the processor 110 may perform fingerprint filtering 820 on each of the vulnerability signature 650 and the target function fingerprint 730 to select the candidate vulnerability signature 651 and the candidate target function fingerprint 731. The fingerprint filtering 820 may be filtering for identifying code related to a vulnerability in each of the vulnerability signature 650 and the target function fingerprint 730.

The fingerprint filtering 820 may be performed, for example, based on the attribute information of the functions corresponding to each of the vulnerable function fingerprint 643 and the patch function fingerprint 644 included in the vulnerability signature and to the target function fingerprint 730. The attribute information of a function may be information on whether the function is stable across compilation environments. That is, by performing fingerprint filtering 820 that excludes, from the functions included in each of the vulnerability signature 650 and the target function fingerprint 730, functions that are stable across various compilation environments, the processor 110 may load only the function fingerprints that correspond to binary code related to the vulnerability. The foregoing description of fingerprint filtering based on function attributes is merely an example, and the fingerprint filtering of the present disclosure may include various filter conditions for identifying binary code related to vulnerabilities in each function fingerprint.

Accordingly, comparing the vulnerability signature 650 with the target function fingerprint 730 becomes faster overall and consumes fewer resources. That is, the processor 110 may improve performance by reducing the number of function fingerprints subject to similarity comparison.

In addition, the processor 110 may detect code clones in the target software, and determine whether a vulnerability is present, through fingerprint matching 840 on the candidate vulnerability signature 651 and the candidate target function fingerprint 731. The fingerprint matching 840 may compare the similarity of the function fingerprints. The processor 110 may perform strand filtering 830 on the candidate vulnerability signature 651 and the candidate target function fingerprint 731 to identify the functions subject to the fingerprint matching 840. Specifically, the candidate vulnerable function fingerprint and the candidate patch function fingerprint included in the candidate vulnerability signature 651, and the candidate target function fingerprint 731, may already be divided into strands by the preprocessing performed by the processor 110. In this case, each of the one or more strands of a function fingerprint is the set of instructions required to compute the value of one variable, and may represent one independent data flow. That is, through strand filtering 830 the processor 110 may identify, among the strands of each candidate function fingerprint, similar strands that are related to one another, and select the corresponding functions for similarity comparison. In other words, because strand-level comparison captures what is representative of a function, the functions to be compared can be identified more efficiently, and processing speed improves. The processor 110 may then detect code clones and determine whether a vulnerability exists by comparing the similarity of the functions identified in this way. The processor 110 may also transmit (850) a notification about the code clones and vulnerabilities found in the target software to the client.
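One way to picture strand filtering is as an inverted index from strands to the target functions that contain them, so that only functions sharing strands with a signature function reach the more expensive matching step. The sketch below assumes strands have already been reduced to comparable identifiers (for example hashes). The function name `strand_filter`, the `min_shared` parameter, and the sample data are illustrative assumptions, not details given in the disclosure.

```python
from collections import defaultdict
from typing import Dict, List, Set


def strand_filter(
    signature_funcs: Dict[str, Set[int]],
    target_funcs: Dict[str, Set[int]],
    min_shared: int = 1,
) -> Dict[str, List[str]]:
    """Map each signature function to the target functions worth matching.

    A target function is kept when it shares at least `min_shared`
    strand identifiers with the signature function.
    """
    # Inverted index: strand identifier -> names of target functions containing it.
    index: Dict[int, Set[str]] = defaultdict(set)
    for target_name, strands in target_funcs.items():
        for strand in strands:
            index[strand].add(target_name)

    result: Dict[str, List[str]] = {}
    for sig_name, strands in signature_funcs.items():
        shared: Dict[str, int] = defaultdict(int)
        for strand in strands:
            for target_name in index.get(strand, ()):
                shared[target_name] += 1
        result[sig_name] = sorted(t for t, n in shared.items() if n >= min_shared)
    return result


# Made-up strand identifiers for illustration.
signature = {"vuln_f": {1, 2, 3}}
target = {"a": {1, 2, 9}, "b": {7, 8}, "c": {3}}
print(strand_filter(signature, target))
print(strand_filter(signature, target, min_shared=2))
```

With this layout the cost of filtering grows with the number of strands rather than with the number of function pairs, which is the kind of reduction in comparison targets the paragraph above describes.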

Using the binary code clone-based software vulnerability detection method of the present disclosure, 805 MB of Debian 9.0 C/C++ target binaries were checked for code clones of six vulnerability signatures. There are 5,420 target binaries in total, comprising 3,266,604 functions. Here, the vulnerability signatures represent six known vulnerabilities.

Six of the seven code clones of known vulnerabilities present in the target binaries were detected, an accuracy of 85.7%. In addition, two of the vulnerability signatures were detected in multiple target binaries, confirming that a vulnerability can be detected not only in binaries compiled from the same source code but also in binaries that contain the vulnerability yet were compiled from different source code.

Preprocessing one target binary took 3,450 ms on average, and comparing it with each vulnerability signature took 2 ms on average. With parallel programming, all target binaries were preprocessed and matched against the vulnerability signatures in 62 minutes; with the preprocessed target binaries cached, code clones were matched in 3 minutes.
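As a rough consistency check on the reported figures, the averages can be multiplied out. The serial estimates below are back-of-the-envelope values derived only from the reported averages. They are not measurements from the disclosure, and the variable names are illustrative.

```python
# Figures reported in the text.
binaries = 5_420          # target binaries
signatures = 6            # vulnerability signatures
preprocess_ms = 3_450     # average preprocessing time per binary
match_ms = 2              # average time per (binary, signature) comparison
parallel_total_min = 62   # reported wall-clock time with parallel programming

# Derived estimates (assumption: averages apply uniformly to every binary).
serial_preprocess_min = binaries * preprocess_ms / 1000 / 60   # about 311.65 minutes
serial_match_s = binaries * signatures * match_ms / 1000       # about 65 seconds
implied_speedup = serial_preprocess_min / parallel_total_min   # roughly 5x
detection_rate_pct = 6 / 7 * 100                               # 6 of 7 clones found

print(f"serial preprocessing ~ {serial_preprocess_min:.2f} min")
print(f"serial matching      ~ {serial_match_s:.2f} s")
print(f"implied speedup      ~ {implied_speedup:.2f}x")
print(f"detection rate       ~ {detection_rate_pct:.1f}%")
```

Under these assumptions preprocessing dominates the total cost, which is consistent with the much shorter run reported once the preprocessed binaries are cached.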

That is, unlike conventional techniques that only compute the similarity of a pair of binary functions, the present disclosure generates a vulnerability signature from the changes between the pre-patch function and the post-patch function, and can thereby determine whether a vulnerability exists in the target binary.

The present disclosure also detected 85.7% of the known vulnerabilities present in the target binaries, and demonstrated that when one vulnerability exists in several binaries, all occurrences can be detected.

Additionally, in the present disclosure, preprocessing one target binary took 3,450 ms on average, and comparing it with each vulnerability signature took 2 ms on average. In the same environment, the similarity of a pair of code flows can be computed about 15,000 times faster than with Esh, an existing technique. Esh took an average of […] ㎲ to compute one pair of code flows, whereas the present invention took an average of 134 ㎲. Accordingly, given an arbitrary software binary, the computing device 100 of the present disclosure can detect known vulnerabilities quickly and accurately compared with the prior art.

FIG. 10 is an exemplary flowchart of a method for determining software vulnerabilities based on binary code clones according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the computing device 100 may generate a vulnerability signature by preprocessing vulnerability information of software (1010).

According to an embodiment of the present disclosure, the computing device 100 may generate a target function fingerprint by preprocessing target binary code (1020).

According to an embodiment of the present disclosure, the computing device 100 may determine whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint (1030).

The order of the steps shown in FIG. 10 may be changed as needed, and one or more steps may be omitted or added. That is, the steps described above are merely an embodiment of the present disclosure, and the scope of the present disclosure is not limited thereto.

FIG. 11 illustrates logic for implementing the method for determining software vulnerabilities based on binary code clones.

According to an embodiment of the present disclosure, the computing device 100 may be implemented by the following logic.

According to an embodiment of the present disclosure, the computing device 100 may include logic 1110 for generating a vulnerability signature by preprocessing vulnerability information of software, logic 1120 for generating a target function fingerprint by preprocessing target binary code, and logic 1130 for determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint.

Alternatively, the logic for generating a vulnerability signature by preprocessing the vulnerability information of the software may include: logic for identifying change-related binary code based on the vulnerability information; logic for generating a vulnerable function fingerprint (Vulnerability Function Fingerprint) and a patch function fingerprint (Patched Function Fingerprint) based on the identified change-related binary code; and logic for generating a vulnerability signature that includes the vulnerable function fingerprint and the patch function fingerprint.

Alternatively, the logic for generating the vulnerable function fingerprint and the patch function fingerprint based on the identified change-related binary code may include: logic for obtaining an executable function and attribute information of the binary code based on the identified change-related binary code; logic for obtaining attribute information of the function based on the executable function; logic for dividing the function into one or more strands (Strand) based on the independent code flows within the function; and logic for generating the vulnerable function fingerprint and the patch function fingerprint by mapping the attribute information of the function and the attribute information of the binary to each of the one or more strands.
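The disclosure does not spell out how a function is divided into strands. One common approach, assumed here purely for illustration, is backward slicing within a basic block: start from the last instruction not yet covered, walk backwards, and collect every instruction that defines a value the slice still needs. In the sketch below each instruction is modelled as a hypothetical `(defs, uses)` pair of variable-name sets. `extract_strands`, `build_fingerprint`, and the attribute dictionaries are illustrative names only.

```python
from typing import Dict, List, Set, Tuple

Instruction = Tuple[Set[str], Set[str]]  # (variables defined, variables used)


def extract_strands(block: List[Instruction]) -> List[List[int]]:
    """Split one basic block into strands (lists of instruction indices)."""
    uncovered = set(range(len(block)))
    strands: List[List[int]] = []
    while uncovered:
        last = max(uncovered)            # latest instruction not in any strand yet
        uncovered.discard(last)
        strand = [last]
        needed = set(block[last][1])     # values the slice still has to compute
        for i in range(last - 1, -1, -1):
            defs, uses = block[i]
            if defs & needed:
                strand.append(i)
                needed = (needed - defs) | uses  # definition found; now need its inputs
                uncovered.discard(i)
        strands.append(sorted(strand))
    return strands


def build_fingerprint(strands, function_attrs: Dict, binary_attrs: Dict) -> List[Dict]:
    """Attach function and binary attributes to every strand."""
    return [
        {"strand": tuple(s), "function": function_attrs, "binary": binary_attrs}
        for s in strands
    ]


# Made-up basic block: two independent data flows interleaved.
block = [
    ({"a"}, {"p"}),        # 0: a = load p
    ({"b"}, {"a"}),        # 1: b = a + 1
    ({"c"}, {"q"}),        # 2: c = load q
    (set(), {"r", "b"}),   # 3: store r, b
    ({"d"}, {"c"}),        # 4: d = c * 2
]
strands = extract_strands(block)
fingerprint = build_fingerprint(strands, {"name": "f"}, {"arch": "x86_64"})
print(strands)
```

In this example the interleaved instructions separate into the two data flows that compute `d` and perform the store, matching the description of a strand as one independent data flow.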

Alternatively, the logic for generating a target function fingerprint by preprocessing the target binary code may include: logic for obtaining an executable target function and attribute information of the target binary code based on the target binary code; logic for obtaining attribute information of the target function based on the target function; logic for dividing the target function into one or more strands based on the independent code flows within the target function; and logic for generating the target function fingerprint by mapping the attribute information of the target function and the attribute information of the target binary to each of the one or more strands.

Alternatively, the logic for determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint may include: logic for calculating a first similarity between the functions included in the vulnerable function fingerprint of the vulnerability signature and in the target function fingerprint; logic for calculating a second similarity between the functions included in the patch function fingerprint of the vulnerability signature and in the target function fingerprint; and logic for determining whether a vulnerability exists in the target software based on the difference between the first similarity and the second similarity.

Alternatively, the logic for determining whether a vulnerability exists in the target software based on the difference between the first similarity and the second similarity may include logic for determining that a vulnerability exists in the target software when the difference between the first similarity and the second similarity exceeds a predetermined criterion.
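The decision rule above can be sketched as follows. The disclosure does not state which similarity measure is used, so Jaccard similarity over strand identifiers is assumed here. Treating the difference as signed (first minus second) and using a threshold of 0.1 are likewise assumptions made only for the example.

```python
from typing import Iterable, Tuple


def jaccard(a: Iterable[int], b: Iterable[int]) -> float:
    """Jaccard similarity of two strand sets (assumed measure, not from the source)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_vulnerable(
    vulnerable_fp: Iterable[int],
    patch_fp: Iterable[int],
    target_fp: Iterable[int],
    threshold: float = 0.1,   # stands in for the "predetermined criterion"
) -> Tuple[bool, float, float]:
    """Flag the target when it is closer to the vulnerable than to the patched function."""
    first = jaccard(vulnerable_fp, target_fp)    # first similarity
    second = jaccard(patch_fp, target_fp)        # second similarity
    return (first - second) > threshold, first, second


# Made-up strand identifiers: the patch replaces strand 4 with strand 5.
vulnerable = {1, 2, 3, 4}
patched = {1, 2, 3, 5}

unpatched_target = {1, 2, 3, 4}
patched_target = {1, 2, 3, 5}
print(is_vulnerable(vulnerable, patched, unpatched_target))
print(is_vulnerable(vulnerable, patched, patched_target))
```

Comparing against both fingerprints is what separates an unpatched clone from a patched one: a target that merely resembles the function scores high on both similarities, and only the difference shows which version it matches.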

Alternatively, the logic for determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint may include logic for selecting a candidate vulnerability signature and a candidate target function fingerprint by performing primary filtering on the vulnerability signature and the target function fingerprint, and the primary filtering may be filtering that identifies vulnerability-related code in each of the vulnerability signature and the target function fingerprint.

Alternatively, the logic for determining whether a vulnerability exists in the target software by comparing the similarity between the vulnerability signature and the target function fingerprint may further include logic for identifying similar strands by performing secondary filtering on the candidate vulnerability signature and the candidate target function fingerprint.

According to an embodiment of the present disclosure, the logic for implementing the computing device 100 may also be implemented by means, circuits, or modules for implementing a computer program.

Those skilled in the art will further appreciate that the various illustrative logical blocks, configurations, modules, circuits, means, logic, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logic, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

FIG. 12 is a brief, general schematic diagram of an exemplary computing environment in which embodiments of the present disclosure may be implemented.

Although the present disclosure has been described above generally in the context of computer-executable instructions that may run on one or more computers, those skilled in the art will recognize that the present disclosure may also be implemented in combination with other program modules and/or as a combination of hardware and software.

Generally, program modules include routines, procedures, programs, components, data structures, and the like that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the methods of the present disclosure may be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, and mainframe computers, as well as personal computers, handheld computing devices, microprocessor-based or programmable consumer electronics, and the like (each of which may operate in connection with one or more associated devices).

The described embodiments of the present disclosure may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Computers typically include a variety of computer-readable media. Any medium accessible by a computer may be a computer-readable medium, and computer-readable media may include computer-readable storage media and computer-readable transmission media. Computer-readable storage media include volatile and nonvolatile media, and removable and non-removable media, implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, DVD (digital video disk) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be accessed by a computer and used to store the desired information.

Computer-readable transmission media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term modulated data signal means a signal that has one or more of its characteristics set or changed so as to encode information in the signal. By way of example, and not limitation, computer-readable transmission media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer-readable transmission media.

An exemplary environment 1500 for implementing various aspects of the present disclosure is shown, including a computer 1502. The computer 1502 includes a processing unit 1504, a system memory 1506, and a system bus 1508. The system bus 1508 couples system components, including but not limited to the system memory 1506, to the processing unit 1504. The processing unit 1504 may be any of a variety of commercially available processors. Dual-processor and other multiprocessor architectures may also be used as the processing unit 1504.

The system bus 1508 may be any of several types of bus structure that may further interconnect to a memory bus, a peripheral bus, and a local bus using any of a variety of commercial bus architectures. The system memory 1506 includes read-only memory (ROM) 1510 and random access memory (RAM) 1512. A basic input/output system (BIOS) is stored in non-volatile memory 1510 such as ROM, EPROM, or EEPROM; the BIOS contains the basic routines that help transfer information between components within the computer 1502, such as during startup. The RAM 1512 may also include high-speed RAM, such as static RAM, for caching data.

The computer 1502 also includes an internal hard disk drive (HDD) 1514 (e.g., EIDE, SATA), which may also be configured for external use in a suitable chassis (not shown); a magnetic floppy disk drive (FDD) 1516 (e.g., for reading from or writing to a removable diskette 1518); and an optical disk drive 1520 (e.g., for reading a CD-ROM disk 1522, or for reading from or writing to other high-capacity optical media such as a DVD). The hard disk drive 1514, the magnetic disk drive 1516, and the optical disk drive 1520 may be connected to the system bus 1508 by a hard disk drive interface 1524, a magnetic disk drive interface 1526, and an optical drive interface 1528, respectively. The interface 1524 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.

These drives and their associated computer-readable media provide non-volatile storage of data, data structures, computer-executable instructions, and so on. For the computer 1502, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to an HDD, a removable magnetic disk, and removable optical media such as a CD or DVD, those skilled in the art will appreciate that other types of computer-readable media, such as zip drives, magnetic cassettes, flash memory cards, and cartridges, may also be used in the exemplary operating environment, and that any such media may contain computer-executable instructions for performing the methods of the present disclosure.

A number of program modules may be stored in the drives and RAM 1512, including an operating system 1530, one or more application programs 1532, other program modules 1534, and program data 1536. All or portions of the operating system, applications, modules, and/or data may also be cached in the RAM 1512. It will be appreciated that the present disclosure may be implemented with various commercially available operating systems or combinations of operating systems.

A user may enter commands and information into the computer 1502 through one or more wired/wireless input devices, for example a keyboard 1538 and a pointing device such as a mouse 1540. Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, a touch screen, and the like. These and other input devices are often connected to the processing unit 1504 through an input device interface 1542 that is coupled to the system bus 1508, but may be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, or an IR interface.

A monitor 1544 or other type of display device is also connected to the system bus 1508 via an interface such as a video adapter 1546. In addition to the monitor 1544, a computer typically includes other peripheral output devices (not shown), such as speakers and printers.

The computer 1502 may operate in a networked environment using logical connections, via wired and/or wireless communications, to one or more remote computers such as remote computer(s) 1548. The remote computer(s) 1548 may be a workstation, a server computer, a router, a personal computer, a portable computer, a microprocessor-based entertainment appliance, a peer device, or another common network node, and typically include many or all of the components described for the computer 1502, although for brevity only a memory storage device 1550 is shown. The logical connections shown include wired/wireless connectivity to a local area network (LAN) 1552 and/or larger networks, for example a wide area network (WAN) 1554. Such LAN and WAN networking environments are commonplace in offices and companies and facilitate enterprise-wide computer networks such as intranets, all of which may connect to a worldwide computer network, for example the Internet.

When used in a LAN networking environment, the computer 1502 is connected to the local network 1552 through a wired and/or wireless communication network interface or adapter 1556. The adapter 1556 may facilitate wired or wireless communication to the LAN 1552, which may also include a wireless access point installed on it for communicating with the wireless adapter 1556. When used in a WAN networking environment, the computer 1502 may include a modem 1558, may be connected to a communications server on the WAN 1554, or may have other means for establishing communications over the WAN 1554, such as by way of the Internet. The modem 1558, which may be internal or external and a wired or wireless device, is connected to the system bus 1508 via the serial port interface 1542. In a networked environment, the program modules described for the computer 1502, or portions thereof, may be stored in the remote memory/storage device 1550. It will be appreciated that the network connections shown are exemplary and that other means of establishing a communications link between the computers may be used.

The computer 1502 is operable to communicate with any wireless device or entity operatively disposed in wireless communication, for example a printer, a scanner, a desktop and/or portable computer, a portable data assistant (PDA), a communications satellite, any piece of equipment or location associated with a wirelessly detectable tag, and a telephone. This includes at least Wi-Fi and Bluetooth wireless technologies. Thus, the communication may be a predefined structure as with a conventional network, or simply an ad hoc communication between at least two devices.

Wi-Fi (Wireless Fidelity) enables connection to the Internet and the like without wires. Wi-Fi is a wireless technology, similar to that used in a cell phone, that enables a device such as a computer to transmit and receive data indoors and outdoors, that is, anywhere within range of a base station. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, and high-speed wireless connectivity. Wi-Fi may be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks may operate in the unlicensed 2.4 GHz and 5 GHz radio bands, for example, at data rates of 11 Mbps (802.11b) or 54 Mbps (802.11a), or in products that include both bands (dual band).

Those of ordinary skill in the art of the present disclosure will understand that the various illustrative logical blocks, modules, processors, means, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, as various forms of program or design code (referred to herein, for convenience, as "software"), or as a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Those of ordinary skill in the art of the present disclosure may implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present disclosure.

The various embodiments presented herein may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" includes a computer program, a carrier, or media accessible from any computer-readable device. For example, computer-readable media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., CDs, DVDs), smart cards, and flash memory devices (e.g., EEPROMs, cards, sticks, key drives). In addition, the various storage media presented herein include one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" includes, but is not limited to, wireless channels and various other media capable of storing, holding, and/or carrying instruction(s) and/or data.

It is to be understood that the specific order or hierarchy of steps in the presented processes is an example of exemplary approaches. Based on design priorities, the specific order or hierarchy of steps in the processes may be rearranged within the scope of the present disclosure. The appended method claims present elements of the various steps in a sample order and are not meant to be limited to the specific order or hierarchy presented.

The description of the presented embodiments is provided to enable any person of ordinary skill in the art of the present disclosure to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments presented herein but is to be accorded the widest scope consistent with the principles and novel features presented herein.

Claims

A computer program stored in a computer-readable storage medium, wherein the computer program, when executed on one or more processors, causes the one or more processors to perform the following operations for determining a vulnerability based on a binary code of software, the operations comprising:
generating a vulnerability signature by performing preprocessing on vulnerability information of software;
generating a target function fingerprint by performing preprocessing on a target binary code; and
determining whether a vulnerability exists in target software through a similarity comparison between the vulnerability signature and the target function fingerprint,
wherein the operation of determining whether a vulnerability exists in the target software through the similarity comparison between the vulnerability signature and the target function fingerprint comprises:
selecting a candidate vulnerability signature and a candidate target function fingerprint by performing primary filtering on the vulnerability signature and the target function fingerprint, and
wherein the primary filtering is filtering for identifying code related to a vulnerability in each of the vulnerability signature and the target function fingerprint.

The computer program of claim 1,
wherein the vulnerability information is vulnerability patch information in the form of source code that includes binary code related to the vulnerability and binary code in which the vulnerability is patched.

The computer program of claim 1,
wherein the operation of generating a vulnerability signature by performing preprocessing on the vulnerability information of the software comprises:
identifying binary code related to a change based on the vulnerability information;
generating a vulnerable function fingerprint and a patched function fingerprint based on the binary code related to the identified change; and
generating a vulnerability signature including the vulnerable function fingerprint and the patched function fingerprint.
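The signature-generation steps of this claim can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each binary has already been disassembled into a mapping from function name to a list of instruction strings; the names `VulnerabilitySignature` and `build_signatures` are hypothetical and do not come from the disclosure. A function counts as "related to a change" here if it exists in both binaries and its instruction list differs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VulnerabilitySignature:
    """Pairs the fingerprint of a vulnerable function with its patched version."""
    function_name: str
    vulnerable_fingerprint: List[str]
    patched_fingerprint: List[str]


def build_signatures(
    vulnerable: Dict[str, List[str]],
    patched: Dict[str, List[str]],
) -> List[VulnerabilitySignature]:
    """Keep only functions whose code changed between the two binaries.

    Assumption: the raw instruction list stands in for a real fingerprint.
    """
    signatures = []
    for name, vuln_instrs in vulnerable.items():
        patched_instrs = patched.get(name)
        # Skip functions missing from the patched binary or left unchanged.
        if patched_instrs is None or patched_instrs == vuln_instrs:
            continue
        signatures.append(
            VulnerabilitySignature(name, list(vuln_instrs), list(patched_instrs))
        )
    return signatures


vulnerable = {"parse": ["mov", "call strcpy"], "init": ["ret"]}
patched = {"parse": ["mov", "call strncpy"], "init": ["ret"]}
signatures = build_signatures(vulnerable, patched)
```

In this toy input only `parse` differs between the two binaries, so it is the only function that yields a signature.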

The computer program of claim 3,
wherein the operation of generating a vulnerable function fingerprint and a patched function fingerprint based on the binary code related to the identified change comprises:
obtaining an executable function and attribute information of the binary code based on the binary code related to the identified change;
obtaining attribute information of the function based on the executable function;
dividing the function into one or more strands based on independent code flows within the function; and
generating the vulnerable function fingerprint and the patched function fingerprint by mapping the attribute information of the function and the attribute information of the binary code to each of the one or more strands.

The computer program of claim 4,
wherein the strand is a set of instructions required to compute the value of one variable and serves as a basic unit of the similarity comparison.
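To make the notion of a strand concrete, the following minimal Python sketch splits a straight-line instruction sequence into strands by backward slicing: starting from each definition not yet assigned to a strand, it collects the earlier instructions needed to compute that value. The `Instr` representation (destination, opcode, source operands) and the function name `extract_strands` are assumptions made for this example; the disclosure does not specify an instruction format or a slicing algorithm.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # variable (or register) written
    op: str                 # opcode mnemonic
    srcs: Tuple[str, ...]   # variables (or registers) read


def extract_strands(block: List[Instr]) -> List[List[Instr]]:
    """Split a straight-line block into strands via backward slicing."""
    strands = []
    covered = set()  # indices already placed in some strand
    # Walk backwards so each strand is rooted at the last uncovered definition.
    for i in range(len(block) - 1, -1, -1):
        if i in covered:
            continue
        needed = set(block[i].srcs)
        strand = [i]
        for j in range(i - 1, -1, -1):
            if block[j].dest in needed:
                strand.append(j)
                # The definition is satisfied; its own inputs are now needed.
                needed.discard(block[j].dest)
                needed.update(block[j].srcs)
        covered.update(strand)
        strands.append([block[k] for k in sorted(strand)])
    strands.reverse()  # restore program order of the strand roots
    return strands


block = [
    Instr("a", "load", ("x",)),
    Instr("b", "load", ("y",)),
    Instr("c", "add", ("a", "one")),
    Instr("d", "mul", ("b", "two")),
]
strands = extract_strands(block)
```

Here the block contains two independent computations, one producing `c` from `a` and one producing `d` from `b`, so it splits into two strands.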

The computer program of claim 1,
wherein the operation of generating a target function fingerprint by performing preprocessing on the target binary code comprises:
obtaining an executable target function and attribute information of the target binary code based on the target binary code;
obtaining attribute information of the target function based on the target function;
dividing the target function into one or more strands based on independent code flows within the target function; and
generating the target function fingerprint by mapping the attribute information of the target function and the attribute information of the target binary code to each of the one or more strands.
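The mapping step of this claim can be pictured as attaching the same function-level and binary-level attributes to every strand. The sketch below is one possible layout under assumed names (`StrandRecord`, `make_fingerprint`) and assumed attribute keys such as `name` and `arch`; none of these are specified in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StrandRecord:
    """One strand together with the attributes mapped onto it."""
    strand: List[str]
    function_attrs: Dict[str, str]
    binary_attrs: Dict[str, str]


def make_fingerprint(
    strands: List[List[str]],
    function_attrs: Dict[str, str],
    binary_attrs: Dict[str, str],
) -> List[StrandRecord]:
    """Build a function fingerprint as a list of attribute-tagged strands."""
    return [
        StrandRecord(list(s), dict(function_attrs), dict(binary_attrs))
        for s in strands
    ]


fingerprint = make_fingerprint(
    [["load", "add"], ["load", "mul"]],
    {"name": "parse"},      # hypothetical function attribute
    {"arch": "x86_64"},     # hypothetical binary attribute
)
```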

The computer program of claim 1,
wherein the similarity comparison between the vulnerability signature and the target function fingerprint is an N-gram similarity comparison that divides each of a function fingerprint corresponding to the vulnerability signature and the target function fingerprint into predetermined units and compares how many of the divided units are similar.
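One way to realize an N-gram comparison of this kind is sketched below. It treats each fingerprint as a token sequence, splits it into overlapping N-grams, and scores the overlap with the Jaccard index. The choice of Jaccard, the default `n = 3`, and the names `ngrams` and `ngram_similarity` are assumptions for illustration; the claim only requires counting how many of the divided units are similar.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping n-grams of a token sequence."""
    if len(tokens) < n:
        # Short sequences become a single unit; empty input yields no units.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_similarity(a: List[str], b: List[str], n: int = 3) -> float:
    """Jaccard index of the two n-gram sets (0.0 if either is empty)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


seq_a = ["push", "mov", "sub", "call", "ret"]
seq_b = ["push", "mov", "sub", "jmp", "ret"]
score = ngram_similarity(seq_a, seq_b)  # 1 shared trigram out of 5 distinct
```

The two example sequences share one trigram out of five distinct trigrams, which gives a score of 0.2.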

The computer program of claim 1,
wherein the operation of determining whether a vulnerability exists in the target software through the similarity comparison between the vulnerability signature and the target function fingerprint comprises:
calculating a first similarity between the vulnerable function fingerprint included in the vulnerability signature and a function included in each of the target function fingerprints;
calculating a second similarity between the patched function fingerprint included in the vulnerability signature and a function included in each of the target function fingerprints; and
determining whether a vulnerability exists in the target software based on a difference between the first similarity and the second similarity.

The computer program of claim 8,
wherein the first similarity is positively correlated with the probability that the software is determined to contain a vulnerability, and the second similarity is negatively correlated with that probability.

The computer program of claim 8,
wherein the operation of determining whether a vulnerability exists in the target software based on the difference between the first similarity and the second similarity comprises:
determining that a vulnerability exists in the target software when the difference between the first similarity and the second similarity exceeds a predetermined criterion.
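The decision rule in claims 8 to 10 can be summarized in a few lines of Python. The sketch assumes that the first similarity (to the vulnerable function fingerprint) and the second similarity (to the patched function fingerprint) have already been computed for each target function. The threshold value `0.1` and the names `is_vulnerable` and `flag_functions` are hypothetical, since the claims refer only to "a predetermined criterion".

```python
from typing import Dict, List, Tuple


def is_vulnerable(
    first_similarity: float,
    second_similarity: float,
    threshold: float = 0.1,  # assumed value for the predetermined criterion
) -> bool:
    """Flag a function that resembles the vulnerable code more than the patch."""
    return (first_similarity - second_similarity) > threshold


def flag_functions(
    scores: Dict[str, Tuple[float, float]],
    threshold: float = 0.1,
) -> List[str]:
    """Return the target functions whose similarity difference exceeds the threshold."""
    return [
        name
        for name, (first, second) in scores.items()
        if is_vulnerable(first, second, threshold)
    ]


# (first similarity, second similarity) per target function; values are made up.
scores = {"parse": (0.9, 0.4), "init": (0.5, 0.9), "copy": (0.8, 0.8)}
flagged = flag_functions(scores)
```

Using the difference rather than the first similarity alone means that a target function that already matches the patched code closely is not reported, even when it also resembles the vulnerable code.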

(Deleted)

The computer program of claim 1,
wherein the operation of determining whether a vulnerability exists in the target software through the similarity comparison between the vulnerability signature and the target function fingerprint comprises:
identifying similar strands by performing secondary filtering on the candidate vulnerability signature and the candidate target function fingerprint.

A method, performed by a processor of a computing device, for determining a vulnerability based on a binary code of software, the method comprising:
generating, by the processor included in the computing device, a vulnerability signature by performing preprocessing on vulnerability information of software;
generating, by the processor, a target function fingerprint by performing preprocessing on a target binary code; and
determining, by the processor, whether a vulnerability exists in target software through a similarity comparison between the vulnerability signature and the target function fingerprint,
wherein the step of determining, by the processor, whether a vulnerability exists in the target software through the similarity comparison between the vulnerability signature and the target function fingerprint comprises:
selecting a candidate vulnerability signature and a candidate target function fingerprint by performing primary filtering on the vulnerability signature and the target function fingerprint, and
wherein the primary filtering is filtering for identifying code related to a vulnerability in each of the vulnerability signature and the target function fingerprint.

A computing device for determining a vulnerability based on a binary code of software, the computing device comprising:
a processor including one or more cores;
a memory including program code executable by the processor; and
a network unit for transmitting and receiving data to and from a client,
wherein the processor is configured to:
generate a vulnerability signature by performing preprocessing on vulnerability information of software;
generate a target function fingerprint by performing preprocessing on a target binary code; and
determine whether a vulnerability exists in target software through a similarity comparison between the vulnerability signature and the target function fingerprint,
wherein, when determining whether a vulnerability exists in the target software through the similarity comparison between the vulnerability signature and the target function fingerprint, the processor selects a candidate vulnerability signature and a candidate target function fingerprint by performing primary filtering on the vulnerability signature and the target function fingerprint, and
wherein the primary filtering is filtering for identifying code related to a vulnerability in each of the vulnerability signature and the target function fingerprint.