KR102546946B1

KR102546946B1 - Method and System for Automatically Analyzing Bugs in Cellular Baseband Software using Comparative Analysis based on Cellular Specifications

Info

Publication number: KR102546946B1
Application number: KR1020210168382A
Authority: KR
Inventors: 김용대; 김은수; 김동관; 박철준; 윤인수
Original assignee: 한국과학기술원
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2023-06-26
Also published as: KR20230080860A

Abstract

A method and system for automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents are presented. According to an embodiment of the present invention, a method performed by a computer device for automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document includes comparing and analyzing baseband software against a cellular specification by exploiting the baseband's role as a modem for network communication, wherein the comparing against the message structure of the cellular specification may automatically inspect the message structure implemented in the baseband software by leveraging the standardized message structure of the cellular specification.

Description

Method and System for Automatically Analyzing Bugs in Cellular Baseband Software using Comparative Analysis based on Cellular Specifications

Embodiments of the present invention relate to a method and system for automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents and, more particularly, to a method and system that perform a comparative analysis between baseband firmware and cellular specifications.

The baseband processor (BP) of a cellular device such as a smartphone plays an important role in a cellular network. Although users mainly interact with the interface of user applications running on the application processor, all application data is transmitted by the BP, which uses its radio interface. The BP runs software, typically a real-time operating system, for managing radio communication. This software therefore includes low-level digital signal processing and a complex cellular protocol stack. To provide users with seamless network service, the baseband software continuously communicates with the core network using numerous cellular control-plane messages at layer 3 (L3).

Baseband software is an attractive attack target because, if exploited, it can be used to monitor and modify transmitted data. Prior work has therefore proposed several approaches for analyzing its security, particularly that of the L3 protocols. To find security bugs in implementations, prior work either used fuzzers to dynamically analyze specific protocols, such as SMS or cell broadcast messages, or manually inspected small portions of the baseband software in an ad-hoc manner.

These approaches suffer mainly from three technical problems: the obscurity of baseband firmware, the limited applicability of manual analysis, and the difficulty of automation. First, the structure of baseband firmware is obscure because vendors are reluctant to disclose details. Second, manual analysis is unavoidable to uncover this obscurity, and it requires considerable repetitive effort to examine numerous functions (i.e., more than 90K) across different baseband models or versions. Third, automation is necessary but also non-trivial. Baseband software is very large (i.e., tens of MB), and cellular protocols involve numerous complex states that cannot be analyzed statically or triggered dynamically by a fuzzer. In addition, identifying bugs requires an explicit oracle, such as a program crash or visibly abnormal behavior, which restricts the findings to a few bug types. As a result, existing approaches can analyze only two device models or versions within a single vendor.

N. Golde and D. Komaromy, "Breaking Band: reverse engineering and exploiting the shannon baseband," REcon, 2016.

Embodiments of the present invention describe a method and system for automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents and, more specifically, provide a technique that automatically performs a comparative analysis between baseband firmware and cellular specifications in order to find bugs in cellular baseband software.

Embodiments of the present invention aim to provide a method and system for automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents that systematically inspect the message structures implemented in baseband software by leveraging the standardized message structures of the specification, compare the extracted message structures with the specification syntactically and semantically, and then report mismatches.

According to an embodiment of the present invention, a method performed by a computer device for automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document includes comparing and analyzing baseband software against a cellular specification by exploiting the baseband's role as a modem for network communication, wherein the comparing against the message structure of the cellular specification may automatically inspect the message structure implemented in the baseband software by leveraging the standardized message structure of the cellular specification.

The method may further include identifying, through the comparative analysis, mismatches in the message structure of the baseband software that do not comply with the cellular specification.

The comparing and analyzing of the baseband software against the cellular specification may include: analyzing the baseband software to identify a message decoder and extracting the message structure embedded in the baseband software; and comparing the extracted message structure embedded in the baseband software with the message structure of the cellular specification.

In the comparing against the message structure of the cellular specification, whether the embedded protocol structure is syntactically identical to the cellular specification may be analyzed.

In the comparing against the message structure of the cellular specification, symbolic execution may be used to analyze whether the underlying logic of the message decoder function semantically complies with the cellular specification.

The method may further include analyzing the mismatches to determine whether they can lead to functional or security bugs.

In the identifying of mismatches, the remaining information elements (IEs) in the message structure of the baseband software that are not mapped to the message structure of the cellular specification may be reported as missing mismatches or unknown mismatches.

In the identifying of mismatches, mismatches may be identified automatically in the layer 3 (L3) messages of a plurality of baseband software images, and potentially buggy points may be flagged for analysis.

According to another embodiment of the present invention, a system for automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document includes a comparative analysis unit that compares and analyzes baseband software against a cellular specification by exploiting the baseband's role as a modem for network communication, wherein the comparative analysis unit may automatically inspect the message structure implemented in the baseband software by leveraging the standardized message structure of the cellular specification.

The system may further include a mismatch identification unit that identifies, through the comparative analysis, mismatches in the message structure of the baseband software that do not comply with the cellular specification.

The comparative analysis unit may analyze the baseband software to identify a message decoder, extract the message structure embedded in the baseband software, and compare the extracted message structure with the message structure of the cellular specification.

The comparative analysis unit may analyze whether the embedded protocol structure is syntactically identical to the cellular specification.

The comparative analysis unit may use symbolic execution to analyze whether the underlying logic of the message decoder function semantically complies with the cellular specification.

The system may further include a bug confirmation unit that analyzes the mismatches to determine whether they can lead to functional or security bugs.

The mismatch identification unit may report the remaining information elements (IEs) in the message structure of the baseband software that are not mapped to the message structure of the cellular specification as missing mismatches or unknown mismatches.

The mismatch identification unit may automatically identify mismatches in the layer 3 (L3) messages of a plurality of baseband software images and flag potentially buggy points for analysis.

According to yet another embodiment of the present invention, a method performed by a computer device for automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document may include: extracting message structures from the specification and from baseband firmware that has binary-specific metadata; and syntactically comparing the message structure of the specification with the message structure embedded in the baseband firmware binary, and semantically analyzing the implementation logic using symbolic execution.

The method may further include identifying, through the comparison and analysis, mismatches between the message structure of the specification and the message structure of the baseband firmware.

According to embodiments of the present invention, it is possible to provide a method and system for automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents that systematically inspect the message structures implemented in baseband software by leveraging the standardized message structures of the specification, compare the extracted message structures with the specification syntactically and semantically, and then report mismatches.

FIG. 1 is a diagram schematically illustrating a cellular network structure according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the structure of an ATTACH REJECT message in an EMM procedure according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating manual firmware analysis and automated BASESPEC according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating a system for automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a message structure according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of syntactic comparison according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating semantic analysis according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating the relationship between decoder mismatches and their impact on handler functions according to an embodiment of the present invention.

Hereinafter, embodiments are described with reference to the accompanying drawings. The described embodiments may, however, be modified into various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

The cellular baseband plays an important role in mobile communication. Assessing its security is nevertheless quite difficult for several reasons. Manual analysis is unavoidable because baseband firmware is obscure and complex, yet such analysis demands repetitive effort to cover different models or versions. Automating the analysis is also non-trivial because the firmware is very large and contains numerous functions related to complex cellular protocols. Existing approaches to baseband analysis are therefore limited to two models or versions within a single vendor.

An embodiment of the present invention proposes a new approach, called BASESPEC, that performs a comparative analysis of baseband software and cellular specifications. By leveraging the standardized message structures of the specification, BASESPEC systematically inspects the message structures implemented in baseband software. Determining how the message structures are embedded in the target firmware requires a manual but one-time analysis effort. BASESPEC then compares the extracted message structures with the specification syntactically and semantically, and finally reports mismatches. These mismatches indicate developer mistakes that either break the baseband's compliance with the specification or imply potential vulnerabilities.

According to embodiments, BASESPEC was evaluated on 18 baseband firmware images of 9 models from one of the top three vendors and found hundreds of mismatches. Analyzing these mismatches revealed 9 erroneous cases: 5 functional errors and 4 memory-related vulnerabilities. Notably, two of the vulnerabilities are critical remote code execution (RCE) 0-days. In addition, when BASESPEC was applied to 3 models from a different vendor, it found several mismatches, two of which led to the discovery of buffer overflow bugs.

The embodiments below propose a new system called BASESPEC, which performs a comparative analysis of baseband implementations and cellular specifications by exploiting the baseband's role as a modem for network communication.

The key intuition behind BASESPEC is that the message decoder in baseband software embeds the protocol specification in a machine-friendly structure in order to parse incoming messages. The embedded protocol structure can therefore be extracted easily and compared against the specification as a reference. This allows BASESPEC to automate the entire comparison process and to explicitly discover mismatches in protocol implementations that do not comply with the specification. Such mismatches may directly point to developer mistakes made when embedding the protocol structure, or may hint at potential vulnerabilities.

For the comparative analysis, BASESPEC first analyzes the baseband firmware to identify the message decoder and extract the protocol structure embedded in the firmware. It then compares the extracted structure with the specification in two respects: 1) whether the embedded structure is syntactically identical to the specification, and 2) whether the decoder function semantically follows the specification. Comparing the actual implementation with the specification in this way makes mismatches explicit. The mismatches are then analyzed manually to determine whether they can lead to functional or security bugs. BASESPEC requires an initial analysis of the target firmware to locate the decoder function and to determine how the message structures are embedded in the firmware. This step may require considerable manual work, but it is a one-time task. The knowledge gained from it can be reused for other baseband models or versions, because the main decoding logic rarely changes across models or versions within a vendor. Meanwhile, the automated comparison procedure can be reused for other vendors once their firmware has been analyzed.
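The syntactic side of this comparison can be illustrated with a minimal Python sketch. It is not the patented implementation: the `IEEntry` record, the `compare_message` function, the report strings, and the sample lengths are all hypothetical, and the sketch assumes that "missing" means an IE listed in the specification but absent from the firmware, while "unknown" means an IE present in the firmware but absent from the specification. IEs without an IEI are matched by position and IEs with an IEI are matched by that identifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IEEntry:
    """One information element as listed in a message structure (hypothetical record)."""
    name: str            # human-readable label; may be empty for firmware entries
    iei: Optional[int]   # IE identifier, or None when the IE carries no IEI
    fmt: str             # IE format, e.g. "V", "TLV", "TLV-E"
    min_len: int         # minimum total IE length in bytes
    max_len: int         # maximum total IE length in bytes


def _diff(spec_ie: IEEntry, fw_ie: IEEntry) -> List[str]:
    """Compare the format and length range of two IEs that were mapped to each other."""
    found = []
    if spec_ie.fmt != fw_ie.fmt:
        found.append(f"format mismatch: {spec_ie.name} spec={spec_ie.fmt} firmware={fw_ie.fmt}")
    if (spec_ie.min_len, spec_ie.max_len) != (fw_ie.min_len, fw_ie.max_len):
        found.append(
            f"length mismatch: {spec_ie.name} "
            f"spec={spec_ie.min_len}-{spec_ie.max_len} "
            f"firmware={fw_ie.min_len}-{fw_ie.max_len}"
        )
    return found


def compare_message(spec: List[IEEntry], firmware: List[IEEntry]) -> List[str]:
    """Return a list of mismatch descriptions for one message."""
    report: List[str] = []

    # IEs without an IEI appear in a fixed order, so they are matched by position.
    spec_fixed = [ie for ie in spec if ie.iei is None]
    fw_fixed = [ie for ie in firmware if ie.iei is None]
    for index, spec_ie in enumerate(spec_fixed):
        if index < len(fw_fixed):
            report.extend(_diff(spec_ie, fw_fixed[index]))
        else:
            report.append(f"missing: {spec_ie.name}")
    for index in range(len(spec_fixed), len(fw_fixed)):
        report.append(f"unknown: IE without IEI at position {index}")

    # IEs with an IEI are matched by that identifier.
    fw_tagged = {ie.iei: ie for ie in firmware if ie.iei is not None}
    for spec_ie in spec:
        if spec_ie.iei is None:
            continue
        fw_ie = fw_tagged.pop(spec_ie.iei, None)
        if fw_ie is None:
            report.append(f"missing: {spec_ie.name}")
        else:
            report.extend(_diff(spec_ie, fw_ie))
    for iei in sorted(fw_tagged):
        report.append(f"unknown: IEI 0x{iei:02x}")
    return report


# Illustrative data only; the length ranges are placeholders, not values from a 3GPP document.
spec = [
    IEEntry("EMM cause", None, "V", 1, 1),
    IEEntry("ESM message container", 0x78, "TLV-E", 3, 65538),
    IEEntry("T3346 value", 0x5F, "TLV", 3, 3),
]
firmware = [
    IEEntry("", None, "V", 1, 1),
    IEEntry("", 0x5F, "TLV", 3, 4),
    IEEntry("", 0x99, "TLV", 3, 3),
]
for line in compare_message(spec, firmware):
    print(line)
```

Each reported line is only a candidate: as the text notes, a mismatch still has to be analyzed manually to decide whether it amounts to a functional or security bug.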

Using a prototype of BASESPEC, the standard L3 message implementations in 18 baseband firmware images of 9 device models from one of the top three vendors were analyzed. BASESPEC identified hundreds of mismatches that indicate both functional errors and potentially vulnerable points in the baseband implementation. According to embodiments, investigating their functional and security implications revealed 9 erroneous cases affecting 33 messages. Five of these cases are functional errors and four are memory-related vulnerabilities. Notably, two of the vulnerabilities are critical remote code execution (RCE) 0-days. To evaluate the applicability of BASESPEC, it was also applied to 3 models from another vendor among the top three. In this analysis BASESPEC identified several mismatches, two of which led to the discovery of buffer overflows.

In summary, the contributions of the present invention are as follows.

Embodiments of the present invention propose a new approach, called BASESPEC, for finding bugs in cellular baseband software. BASESPEC performs a comparative analysis between the specification embedded in the baseband software and the documented specification.

The embodiments also demonstrate the practicality of BASESPEC. Running an automated prototype of BASESPEC on 18 baseband firmware images of 9 models from one of the top three vendors identified hundreds of mismatches that do not comply with the specification.

According to embodiments, further analysis of the mismatches revealed 9 erroneous cases, of which 5 are functional errors and 4 are vulnerabilities, including 2 RCE 0-days.

According to embodiments, applying BASESPEC to 3 firmware images from a different vendor also identified several mismatches, two of which led to buffer overflow bugs.

FIG. 1 is a diagram schematically illustrating a cellular network structure according to an embodiment of the present invention.

Referring to FIG. 1, a cellular network has three main components: a cellular device (120), a base station (110), and a core network (130). These components are named differently depending on the cellular generation; generic terms are used here for simplicity. For example, NodeB, eNodeB, and gNodeB denote the base station (110) in 3G, 4G, and 5G, respectively.

The cellular device (120) refers to any device located at the edge of the cellular network that allows a user to access cellular services. The most common such device is a smartphone. For performance reasons, the cellular device (120) typically has two separate processors: an application processor (AP), on which the mobile operating system and user applications run, and a cellular baseband processor (BP), on which radio and digital signal processing is performed.

The base station (110) provides a radio connection to the cellular device (120). It relays messages over the radio interface from the core network (130) to the cellular device (120) and vice versa. It is therefore responsible for managing radio resources to provide users with better quality of service. The core network (130) provides core procedures such as mobility management and session management, which include important security services such as user identification, encryption, and integrity checking.

The cellular protocol stack (121) has several layers in terms of the OSI model. The radio interface of the cellular network resides at layers 1 and 2 of the OSI model, and the messages of the various core procedures are carried at layer 3. To handle these layers properly, the baseband of the cellular device (120) also implements the cellular protocol stack (121). In addition, modern 4G/5G cellular devices also support the older 2G/3G cellular technologies to provide backward compatibility for cell coverage and roaming.

The cellular specifications and standard layer 3 messages are described below.

The cellular specifications are defined by an international working group called the 3rd Generation Partnership Project (3GPP), which unites seven telecommunications standards development organizations such as ETSI. There are more than 100 specification documents, most of which run to hundreds of pages. Because of this sheer volume and complexity, many mistakes have been observed in prior work.

Among the various protocols and messages in the specifications, standard layer 3 (L3) messages are used in complex core procedures such as mobility management, session management, and even the cryptographic operations that protect user privacy. Accordingly, several vulnerabilities have been discovered in their handling routines. Standard L3 messages are not limited to a specific cellular generation such as GSM or LTE; they comprise several different protocols across generations. [Table 1] lists the L3 protocols with their protocol discriminators (PDs) and specification document numbers. Each specification document defines the detailed message formats and directions for its protocol. An L3 message may be sent to the cellular device (i.e., downlink) or to the core network (i.e., uplink), and its format may differ depending on the direction in which it is sent. Hereinafter, each L3 protocol is denoted by the abbreviation listed in [Table 1].

[Table 1]

Each standard L3 message has a specific format defined in its corresponding specification document ([Table 1]). A standard L3 message begins with a 2-byte header that contains the PD and the message ID. The tuple of PD and message ID lets the receiving baseband determine the format of the message so that it can decode it. Each subsequent field of the message is called a standard information element (IE).

A standard IE may have three parts: an IE identifier (IEI), a length indicator (LI), and a value, corresponding to type (T), length (L), and value (V), respectively. An IE is either imperative or non-imperative depending on whether an IEI is present. An imperative IE has no IEI, whereas a non-imperative IE must have one. Within a message, imperative IEs must appear in a fixed order before the non-imperative IEs, so they can be distinguished without an IEI. The LI gives the number of bytes in the value part, whereas the IE length stated in the specification gives the number of bytes in all parts. There are seven IE formats: T, V, TV, LV, TLV, LV-E, and TLV-E, where T, L, and V indicate the presence of the IEI, the LI, and the value, respectively. The -E suffix extends the 1-byte LI to a 2-byte LI, which allows lengths from 0 to 65535.
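As a rough illustration of the seven IE formats, the following Python sketch decodes a single IE from a byte buffer. The names `FORMATS`, `DecodedIE`, and `decode_ie` are hypothetical. The sketch makes three simplifying assumptions that are not stated in the text: every part occupies whole bytes (half-byte IEs are ignored), the 2-byte LI of the -E formats is big-endian, and the value length of formats without an LI is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Optional

# For each format: (has IEI, LI width in bytes, has value part).
FORMATS = {
    "T":     (True,  0, False),
    "V":     (False, 0, True),
    "TV":    (True,  0, True),
    "LV":    (False, 1, True),
    "TLV":   (True,  1, True),
    "LV-E":  (False, 2, True),
    "TLV-E": (True,  2, True),
}


@dataclass
class DecodedIE:
    iei: Optional[int]  # None when the format carries no IEI
    value: bytes        # the value part only
    consumed: int       # bytes used by all parts (IEI + LI + value)


def decode_ie(buf: bytes, fmt: str, fixed_value_len: int = 0) -> DecodedIE:
    """Decode one IE of the given format from the start of buf."""
    has_iei, li_width, has_value = FORMATS[fmt]
    pos = 0
    iei = None
    if has_iei:
        if len(buf) < 1:
            raise ValueError("truncated IEI")
        iei = buf[0]
        pos = 1
    if li_width:
        if len(buf) < pos + li_width:
            raise ValueError("truncated LI")
        # Assumption: a multi-byte LI is encoded most significant byte first.
        value_len = int.from_bytes(buf[pos:pos + li_width], "big")
        pos += li_width
    else:
        # Formats without an LI have a length fixed by the message definition.
        value_len = fixed_value_len if has_value else 0
    # Bounds check: the LI counts value bytes, which must fit in the buffer.
    if len(buf) < pos + value_len:
        raise ValueError("LI exceeds remaining buffer")
    return DecodedIE(iei, buf[pos:pos + value_len], pos + value_len)
```

For example, `decode_ie(bytes([0x5F, 0x01, 0xAA]), "TLV")` yields an IEI of 0x5F, a one-byte value, and a total of three consumed bytes, which reflects the distinction drawn above between the LI (value bytes only) and the IE length (all parts).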

FIG. 2 is a diagram illustrating the structure of an ATTACH REJECT message in an EMM procedure according to an embodiment of the present invention.

FIG. 2 shows an example message, ATTACH REJECT in an EMM procedure, together with its raw packet data. Each row of the message structure represents an IE. The header consists of a 4-bit PD (0x7), a 4-bit security header type (0x0), and an 8-bit message ID (0x44). Besides the header, the imperative part contains one more IE, the EMM cause. The packet then continues with the ESM message container and the T3346 value, whose IEIs are 0x78 and 0x5f, respectively. Because the ESM message container IE has the TLV-E format, its LI can express a length from 0 to 65535. The length of the T3346 value IE is fixed at 3 even though it carries an LI.
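The ATTACH REJECT layout described above can be sketched as a small Python parser. This is an illustration under stated assumptions, not the decoder of any real baseband: the function name `parse_attach_reject` is hypothetical, the PD is assumed to occupy the low nibble of the first byte, the 2-byte LI is assumed to be big-endian, and only the two optional IEs named in the text are handled.

```python
def parse_attach_reject(msg: bytes) -> dict:
    """Parse the ATTACH REJECT fields named in the description of FIG. 2."""
    if len(msg) < 3:
        raise ValueError("message shorter than header plus EMM cause")
    out = {
        "security_header_type": msg[0] >> 4,  # assumed: high nibble
        "pd": msg[0] & 0x0F,                  # assumed: low nibble
        "message_id": msg[1],
        "emm_cause": msg[2],                  # imperative IE, format V
    }
    if out["pd"] != 0x7 or out["message_id"] != 0x44:
        raise ValueError("not an EMM ATTACH REJECT message")

    pos = 3
    while pos < len(msg):
        iei = msg[pos]
        if iei == 0x78:    # ESM message container, format TLV-E
            if pos + 3 > len(msg):
                raise ValueError("truncated TLV-E header")
            length = int.from_bytes(msg[pos + 1:pos + 3], "big")
            start, name = pos + 3, "esm_message_container"
        elif iei == 0x5F:  # T3346 value, format TLV with total length 3
            if pos + 2 > len(msg):
                raise ValueError("truncated TLV header")
            length = msg[pos + 1]
            if length != 1:
                raise ValueError("T3346 value IE must be 3 bytes in total")
            start, name = pos + 2, "t3346_value"
        else:
            raise ValueError(f"unexpected IEI 0x{iei:02x}")
        if start + length > len(msg):
            raise ValueError("LI exceeds remaining message length")
        out[name] = msg[start:start + length]
        pos = start + length
    return out
```

For example, the invented byte string `07 44 13 78 00 02 aa bb 5f 01 21` parses into an EMM cause of 0x13, a 2-byte ESM message container, and a 1-byte T3346 value. The check on the T3346 LI reflects the point made above: the IE carries an LI, but the specification fixes its total length, so a decoder should reject any other value.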

The baseband processor is described below.

In a cellular device, the baseband processor (BP) is a dedicated processor that manages all radio functions for cellular communication, including digital signal processing. To meet the real-time requirements of wireless communication, it runs a real-time operating system as its firmware. The firmware therefore operates as a single executable that is loaded into memory at runtime. For this reason, the baseband firmware is referred to herein as the baseband binary.

Baseband software is generally proprietary, and manufacturers do not publicly share details such as source code. For example, Qualcomm's Snapdragon, MediaTek's Helio, and Samsung's Exynos are the three major system-on-a-chip products that include a BP, yet none of these manufacturers discloses detailed information. Researchers therefore often resort to reverse engineering to analyze and identify security problems in baseband software. In addition, each baseband may have a different architecture depending on design choices, so baseband analysis requires suitable tools that support the target architecture.

In the baseband, an L3 message is first classified by its PD and message ID. The baseband then parses and decodes the message using the predefined message structure to obtain the information in its IEs. After decoding, it performs the appropriate action for each decoded IE and finally handles the message as defined in the specification. Hereinafter, the functions involved in decoding are called L3 message decoders, and the functions that process the message are called IE handlers and message handlers.

Three main technical challenges hinder existing approaches to finding bugs in baseband firmware.

The first is the obscurity of baseband firmware. Cellular baseband firmware remains largely unknown because vendors do not disclose details in order to protect their proprietary implementations. This obscurity severely hinders firmware analysis, so the analysis requires considerable manual effort. To reduce that effort, memory dumps can be used, since a dump has already passed the initialization phase and often contains runtime information. Obtaining such a dump, however, requires a physical device and either a special feature (e.g., a hidden dump menu available only on older Android devices) or a vulnerability that triggers the dump. A hardware debug interface such as JTAG could also be considered, but it is not available on recent devices. Moreover, even state-of-the-art binary analysis tools such as IDA Pro cannot analyze memory dumps. For example, function identification, which is the basis of static analysis, often fails because of the obscurity of the firmware.

The second is the limited applicability of manual analysis. To uncover the obscure parts of baseband firmware, researchers have focused on manual analysis. However, manually examining the numerous functions that handle hundreds of L3 messages is nearly impossible, so this approach is fundamentally limited in scalability and applicability. As a result, even vulnerabilities of similar types often remain undiscovered. In addition, because mobile devices evolve rapidly in both software and hardware, firmware binaries differ considerably from one another. Inspecting the firmware of different device models or versions therefore remains difficult even within a single vendor and demands substantial additional manual work.

The third is the difficulty of automated analysis. Automating baseband analysis is imperative for achieving scalability and applicability. Automated analysis can be broadly divided into static and dynamic analysis, and both face challenges. For static analysis, the baseband firmware is very large (i.e., tens of MB) and contains numerous non-trivial functions to analyze, such as cryptographic operations. Moreover, writing analysis rules is difficult because the cellular specifications span more than 100 documents and are highly complex. Existing work therefore relies heavily on dynamic analysis (e.g., fuzzing) using real or emulated hardware. Unfortunately, many baseband vulnerabilities are hard to trigger dynamically because of the complex state they require. Such approaches also depend on explicit oracles, such as program crashes, to identify bugs, which restricts the findings to a few bug types.

To address these challenges, embodiments of the present invention propose a new approach called BASESPEC, which performs a comparative analysis of baseband firmware against the cellular specifications. BASESPEC exploits an inherent property of message decoders in network communication. The key intuitions are as follows. 1) A message decoder in network communication must embed the specification in its implementation in order to identify and parse message fields. 2) These embedded message structures exist in a machine-friendly form, so they can be extracted reliably. 3) A comparative analysis of the extracted structures can identify structures that were embedded incorrectly with respect to the specification documents. 4) Because the main logic of the decoding routine rarely changes, message structures can be extracted in a similar way from various device models and versions. Hereinafter, the specifications and message structures embedded in the baseband firmware are called binary-embedded specifications and binary-embedded messages.

FIG. 3 is a diagram illustrating manual firmware analysis and automated BASESPEC according to an embodiment of the present invention.

Referring to FIG. 3, the approach of the embodiments consists of two main parts: manual firmware analysis (310) and fully automated BASESPEC (320).

The manual firmware analysis (310) mainly determines where the message decoders are located and how the specifications are embedded in the baseband firmware; this information is called binary-specific metadata. Because the main logic of the decoding procedure rarely changes across device models or versions of the same vendor, this step is manual but needs to be done only once.

The fully automated BASESPEC (320) then uses these results for syntactic and semantic comparison. Specifically, it automates the extraction of decoder function addresses and embedded message structures from the target baseband binary. The syntactic comparison checks whether the binary-embedded specifications literally match the specification documents. The semantic comparison uses symbolic execution to examine whether the underlying logic of the decoder functions correctly complies with the specifications. Finally, BASESPEC reports mismatches that indicate developer mistakes; these may violate specification compliance or point to potentially vulnerable spots for later analysis. Only the messages affected by the reported mismatches then need to be analyzed.

The embodiments address the challenges above as follows. First, the firmware, in particular the L3 message decoder, is analyzed manually to uncover its obscure parts. This analysis is a one-time task whose results can be reused for other baseband firmware. Second, the fully automated BASESPEC (320) automatically identifies mismatches across numerous L3 messages and reveals likely bug locations, which makes the analysis efficient and practical. In practice, analyzing the mismatches reported by BASESPEC (320) revealed nine erroneous cases, of which five are functional errors and four are vulnerabilities, including two remote code execution (RCE) 0-days. Finally, because the main decoding logic rarely changes, BASESPEC (320) can be applied automatically to various device models and versions. It can also be applied to firmware from other vendors, although a one-time manual effort is needed to analyze the decoder.

Among the various protocols in cellular networks, standard L3 messages are chosen as the target. These messages cover a variety of protocols and play an important role in core cellular procedures. Because the L3 protocols involve a great deal of complex logic and data structures, several vulnerabilities have been found in them. The analysis therefore focuses on the standard L3 messages listed in [Table 1]. Not every message of an L3 protocol is expressed as a standard L3 message; such other messages are outside the scope of the present invention.

For the cellular baseband, the embodiments focus mainly on one of the top three mobile processor vendors (i.e., Vendor 1). Baseband firmware from several recent device models based on the ARM architecture is analyzed. The approach of the embodiments can be applied to other baseband firmware as well, although considerable manual effort may be needed to uncover the obscure parts of that firmware. BASESPEC was also applied successfully to the firmware of another of the top three vendors (i.e., Vendor 2).

FIG. 4 is a flowchart illustrating a method of automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents according to an embodiment of the present invention.

Referring to FIG. 4, the method of automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents, performed by a computer device according to an embodiment of the present invention, includes a step (S110) of comparing and analyzing the baseband software against the cellular specifications by exploiting the baseband's nature as a modem for network communication. In the step of comparing against the message structures of the cellular specifications, the standardized message structures of the cellular specifications may be used to automatically inspect the message structures implemented in the baseband software.

Depending on the embodiment, the method may further include a step (S120) of identifying, through the comparative analysis, mismatches in the message structures of the baseband software that do not comply with the cellular specifications.

Depending on the embodiment, the method may further include a step (S130) of analyzing a mismatch to check whether it can lead to a functional or security bug.

The method of automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents according to an embodiment of the present invention can be described in more detail by taking as an example a system that performs this comparative analysis.

FIG. 5 is a block diagram illustrating a system for automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents according to an embodiment of the present invention.

Referring to FIG. 5, the system (500) for automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents according to an embodiment of the present invention may include a comparative analysis unit (510). Depending on the embodiment, the system (500) may further include a mismatch identification unit (520) and a bug confirmation unit (530).

In step S110, the comparative analysis unit (510) may compare and analyze the baseband software against the cellular specifications by exploiting the baseband's nature as a modem for network communication. Here, the comparative analysis unit (510) may use the standardized message structures of the cellular specifications to automatically inspect the message structures implemented in the baseband software.

More specifically, the comparative analysis unit (510) may analyze the baseband software to identify the message decoder, extract the message structures embedded in the baseband software, and compare the extracted message structures with the message structures of the cellular specifications.

The comparative analysis unit (510) may compare the baseband software with the cellular specifications both syntactically and semantically. Syntactically, it may check whether the embedded protocol structures are identical to those in the cellular specifications. Semantically, it may use symbolic execution to check whether the underlying logic of the message decoder functions complies with the cellular specifications.

In step S120, the mismatch identification unit (520) may identify, through the comparative analysis, mismatches in the message structures of the baseband software that do not comply with the cellular specifications. More specifically, the mismatch identification unit (520) may report the remaining information elements (IEs) that could not be mapped between the message structure of the baseband software and the message structure of the cellular specifications as missing mismatches or unknown mismatches.
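A minimal Python sketch of this mapping step follows. It is not the patented implementation: the names `IE` and `compare_message` are hypothetical, and the sketch assumes that "missing" means an IE present in the specification with no counterpart in the binary, while "unknown" means an IE present in the binary with no counterpart in the specification. Imperative IEs (no IEI) are mapped by position and non-imperative IEs by IEI.

```python
from typing import Dict, List, NamedTuple, Optional


class IE(NamedTuple):
    iei: Optional[int]  # None for imperative IEs
    fmt: str            # one of T, V, TV, LV, TLV, LV-E, TLV-E
    name: str = ""


def compare_message(spec: List[IE], binary: List[IE]) -> Dict[str, List[IE]]:
    """Map IEs of one message and report the ones left unmapped."""
    # Imperative IEs have a fixed order, so they are paired by position.
    spec_imp = [ie for ie in spec if ie.iei is None]
    bin_imp = [ie for ie in binary if ie.iei is None]
    paired = min(len(spec_imp), len(bin_imp))
    missing = spec_imp[paired:]
    unknown = bin_imp[paired:]

    # Non-imperative IEs are identified by their IEI.
    spec_ieis = {ie.iei for ie in spec if ie.iei is not None}
    bin_ieis = {ie.iei for ie in binary if ie.iei is not None}
    missing += [ie for ie in spec
                if ie.iei is not None and ie.iei not in bin_ieis]
    unknown += [ie for ie in binary
                if ie.iei is not None and ie.iei not in spec_ieis]
    return {"missing": missing, "unknown": unknown}
```

A fuller comparison would also check the format and length of each mapped pair; the sketch stops at the two mismatch kinds named in the text.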

Here, the mismatch identification unit (520) may automatically identify mismatches in the numerous layer 3 (L3) messages of the baseband software and indicate the locations that are likely to contain bugs for further analysis.

In step S130, the bug confirmation unit (530) may analyze a mismatch to check whether it can lead to a functional or security bug.

A method of automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents, performed by a computer device according to another embodiment of the present invention, may include a step of extracting message structures from the specifications and, using the binary-specific metadata, from the baseband firmware, and a step of syntactically comparing the message structures of the specifications with the binary-embedded message structures of the baseband firmware and semantically analyzing the implementation logic using symbolic execution.

The method may further include a step of identifying, through the comparison and analysis, mismatches between the message structures of the specifications and the message structures of the baseband firmware.

The configuration of the method according to this other embodiment overlaps with that of the method according to the embodiment described above, so a detailed description is omitted. For example, the specifications may include the cellular specifications, and the baseband firmware may be included in the baseband software.

The method and system for automatically comparing and analyzing mobile communication baseband software based on mobile communication standard documents according to an embodiment of the present invention are described in more detail below.

Manual discovery of firmware obscurity

This part details the approach used to uncover the obscure parts of baseband firmware. It explains how several problems with IDA Pro, a state-of-the-art static analysis tool, are handled, how the L3 decoding functions are located, and how the way message structures are embedded is determined. Manual analysis is unavoidable because of the obscurity and complexity of baseband firmware. The analysis is a one-time task, however, and its results can be reused for multiple messages, models, or versions of the same vendor. The description here mainly shares the experience of analyzing Vendor 1's firmware. A similar approach can be applied to firmware from other vendors, although considerable effort may be needed to determine the L3 message decoding logic.

First, the firmware is obtained.

For the sake of applicability and accessibility, the embodiments analyze baseband firmware without requiring a physical device. There are two main ways to obtain baseband firmware: memory dumps and firmware images. Previous work (Non-Patent Document 1) relied on memory dumps because a dump already contains the runtime memory state, memory layout, and global variables, so no complex analysis of firmware initialization is needed. However, dumping memory requires a physical device, which greatly reduces scalability and applicability. Furthermore, the hidden menu for triggering a memory dump and hardware debug interfaces such as JTAG were found to be disabled on recent devices.

Therefore, third-party websites that maintain firmware images for updates were used. The latest firmware images can also be downloaded from the vendor's cloud storage, but third-party repositories provide lists of firmware images that are well organized by product model and version. From these, images of the latest flagship models listed in [Table 2] were selected. The initial firmware analysis needs to resolve the obscurity of only one image; the resulting knowledge can then be applied to other images of the same vendor. The latest version of the latest model (i.e., Model A in [Table 2]) was therefore chosen for analysis.

[Table 2]

Next, preprocessing is performed.

The embodiments analyze the baseband firmware with IDA Pro, a state-of-the-art binary analysis tool. Although IDA Pro is a capable tool, it needs additional preprocessing steps for 1) memory layout analysis and 2) function identification. Because of several runtime mechanisms in the baseband firmware, IDA Pro correctly identifies only a few hundred of the roughly 90K functions. This limitation is already known to be a hard problem. Without preprocessing, IDA's auto-analysis detects only 450 functions, starting from the interrupt handlers that are the common entry points of embedded devices (see [Table 2]). Two preprocessing steps are therefore designed as follows.

The first step is memory layout analysis. The baseband firmware must be loaded with the proper memory layout before it can be analyzed. Otherwise, data and function pointers in the firmware point to wrong memory addresses, which considerably hinders further analysis. Indeed, when the firmware was opened in IDA, the data and function pointers in most functions attempted to access data or call other functions at invalid memory addresses.

This invalid pointer problem was found to be caused by scatter-loading, which IDA does not support. Scatter-loading is an ARM loading mechanism that relocates the initially loaded file into several memory regions at runtime. It is widely used in ARM-based embedded systems because it supports compression of data regions and thereby reduces firmware size. When the firmware is built, armlink, a component of the ARM compiler toolchain, inserts scatter-loading functions that initialize the firmware at runtime. These functions copy, decompress, or zero-initialize memory regions according to a predefined table so that the memory layout is set up correctly. If scatter-loading is not handled, the entire binary file is loaded into a single contiguous memory region and the data and function pointers become invalid.

To handle the scatter-loading problem and build the proper memory layout, the scatter-loading process is emulated. Specifically, the embodiments emulate the behavior of the scatter-loading functions, that is, copying, decompressing, and zero-initializing memory regions. These functions are predefined by armlink in a highly optimized form, so most recent ARM embedded devices reuse them. They can therefore be detected with signatures, similar to IDA's FLIRT. After detecting the scatter-loading functions, the embodiments analyze their cross-references and identify the predefined scatter-loading table. This table contains the information that specifies the execution order and parameters of the scatter-loading functions. The scatter-loading process is then emulated as specified in the table.
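The table-driven emulation can be sketched in Python as follows. This is a simplified illustration, not the actual armlink table format: the `Region` record, its field names, and the `"copy"`/`"zero"` operation labels are hypothetical, and the decompression handler is deliberately left out because its algorithm is not described in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Region:
    src: int   # offset of the region's data inside the firmware image
    dst: int   # runtime address the region is relocated to
    size: int  # number of bytes in the region
    op: str    # hypothetical label: "copy" or "zero"


def emulate_scatter_load(image: bytes, table: List[Region]) -> Dict[int, bytes]:
    """Apply each table entry in order and return {runtime address: bytes}."""
    memory: Dict[int, bytes] = {}
    for region in table:
        if region.op == "copy":
            if region.src < 0 or region.src + region.size > len(image):
                raise ValueError("copy source lies outside the image")
            memory[region.dst] = image[region.src:region.src + region.size]
        elif region.op == "zero":
            # Zero-initialized regions occupy no space in the image.
            memory[region.dst] = bytes(region.size)
        else:
            # A decompression handler would be dispatched here.
            raise NotImplementedError(f"unsupported operation: {region.op}")
    return memory
```

The resulting address map could then be used to create segments at the correct runtime addresses in the disassembler, so that data and function pointers resolve to valid targets.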

The second step is function identification. The target of the embodiments (i.e., Vendor 1's firmware) is based on the ARM architecture. To identify the functions in the firmware, the code bytes must first be disassembled. Disassembling unknown bytes on ARM is error-prone, however, because the ARM architecture supports two instruction sets, ARM and Thumb. The ARM instruction set is the default mode and executes 32-bit instructions, while the Thumb instruction set provides compact 16-bit instructions to reduce code size. Because the same bytes can be disassembled into two different instructions, naive disassembly can produce a large amount of incorrectly disassembled code.

To address this problem, two simple techniques were designed that exploit i) frequently occurring function prologues and ii) a property of function pointers in Thumb mode. First, the already identified functions are examined to build function prologue signatures that can distinguish ARM mode from Thumb mode. These prologue signatures consist of PUSH instructions in both ARM and Thumb modes. The firmware is then searched for the signatures, and wherever a signature matches, disassembly is attempted in the mode of that signature. To reduce false positives in signature matching, each matched prologue is checked for whether it handles registers in the usual way. For example, most functions push the LR register onto the stack but never push the PC register, so prologues that push PC or do not push LR are discarded.
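A minimal Python sketch of such a prologue scan is shown below. It is an illustration under stated assumptions rather than the signatures actually used: the function name `find_prologues` is hypothetical, the code is assumed to be little-endian, only the 16-bit Thumb `PUSH {..., LR}` encoding (0xB5xx) and the ARM `PUSH`/`STMDB SP!` encoding with the always condition (0xE92Dxxxx) are matched, and 32-bit Thumb-2 pushes are ignored.

```python
import struct
from typing import List, Tuple

LR_BIT = 1 << 14  # bit of LR in the ARM register list
PC_BIT = 1 << 15  # bit of PC in the ARM register list


def find_prologues(code: bytes, base: int = 0) -> List[Tuple[int, str]]:
    """Return (address, mode) pairs for candidate function prologues."""
    hits: List[Tuple[int, str]] = []
    for off in range(0, len(code) - 1, 2):
        # ARM instructions are 4-byte aligned.
        if off % 4 == 0 and off + 4 <= len(code):
            (word,) = struct.unpack_from("<I", code, off)
            is_push = (word & 0xFFFF0000) == 0xE92D0000
            # Keep prologues that push LR and do not push PC.
            if is_push and (word & LR_BIT) and not (word & PC_BIT):
                hits.append((base + off, "arm"))
        # 16-bit Thumb PUSH is 1011 010M rrrrrrrr; M=1 means LR is pushed,
        # and this encoding has no way to push PC.
        (half,) = struct.unpack_from("<H", code, off)
        if (half & 0xFF00) == 0xB500:
            hits.append((base + off, "thumb"))
    return hits
```

Each hit is only a candidate: the bytes of a constant can match a signature by chance, so a real pipeline would confirm the match, for example by checking that the following bytes disassemble into plausible instructions in the matched mode.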

After function prologues are detected, function pointers in the data section are analyzed to identify additional functions. This step exploits a property of Thumb mode: a function pointer to a Thumb-mode function uses an odd address, that is, the least significant bit of the pointer value is always 1. Because most data is aligned to even addresses, an odd address that points into a code section is likely to be a Thumb-mode function pointer. The embodiments can therefore find functions that are called indirectly through such pointers.
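The pointer heuristic above can be illustrated with a short sketch. It assumes 32-bit little-endian words at 4-byte alignment and a single contiguous code range; the function name and parameters are hypothetical.

```python
import struct

def find_thumb_pointers(data: bytes, code_start: int, code_end: int):
    """Return candidate Thumb function entry addresses found in a data section.

    A word is a candidate when it is odd (Thumb bit set) and, with that bit
    cleared, falls inside the half-open code range [code_start, code_end).
    """
    entries = set()
    for off in range(0, len(data) - 3, 4):
        (value,) = struct.unpack_from("<I", data, off)
        if value & 1 and code_start <= value - 1 < code_end:
            entries.add(value - 1)  # clear the low bit to get the real entry address
    return sorted(entries)
```

Candidates found this way are only likely function entries, so they would normally be confirmed by attempting disassembly in Thumb mode.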

These preprocessing techniques greatly improve the function identification performance of IDA Pro. As shown in [Table 2], IDA Pro initially identified 450 functions for Model A. Memory layout analysis and function identification, the latter consisting of prologue detection and pointer analysis, were then applied to the same firmware. The memory layout analysis according to the embodiments found 504 new functions, and the function identification techniques, namely function prologue detection and function pointer analysis, found 31,955 and 2,526 new functions, respectively. When the newly identified functions are supplied to IDA Pro, it further analyzes the code references of each function and recursively finds more functions. As a result, the preprocessing step helped IDA Pro eventually identify 91,481 functions. The preprocessing is a one-time task and can be applied to the firmware of other device models without additional manual work; in practice, the other recent models listed in [Table 2] were preprocessed successfully. The average time for the entire preprocessing, including IDA Pro's automatic analysis, was 2,557 seconds. In some cases, IDA Pro found more functions before preprocessing (Model B) or needed more time for automatic analysis (Model A). These outliers are under investigation.

In addition, the layer 3 (L3) decoder is identified.

To examine standard L3 messages, the decoding logic must first be located through binary analysis. A function that implements this decoding logic is referred to as a decoder function. The focus here is on the decoder function because it holds machine-friendly information about the L3 message structure. As described above, L3 protocol messages have a standardized structure. To parse messages correctly, developers embed the message structures of the specification in a machine-friendly form. The embedded structures can therefore be analyzed systematically to understand how the firmware decodes L3 messages.

To identify the decoder, debug information (e.g., logging messages) left by developers in the baseband binary is used. This debug information differs from the information a compiler inserts with the -g option, which disappears when the binary is stripped; it is described in more detail below. This approach was chosen among the various binary analysis methods because debug information is commonly used to locate specific functions in stripped binaries when analyzing embedded devices (Non-Patent Document 1). Because the baseband firmware is stripped and very large (over 30 MB), consisting of more than 90K functions, finding the L3 decoder without such information is quite difficult. The embodiments therefore use debug messages, and the details are shared below as an example for further research. Likewise, L3 decoder functions were found in the firmware of other vendors using debug information, although the structure of the debug information differs. Note that the ultimate goal of the embodiments, comparative analysis of standard L3 messages, does not depend on how the decoder function is identified; other techniques can also be used for this step.

First, all debug information is retrieved from the baseband binary, and only the functions that reference that debug information are then analyzed. The search showed that the firmware records debug messages and related information using a specific structure. The structure starts with the magic value DBT and contains a debug message together with the file path and line number from which it is referenced. All debug information is therefore first located by searching for DBT. Because numerous functions reference debug information indirectly (about 100,000 cases), the embodiments perform a lightweight backward slice analysis to match debug information to functions accurately. Next, the embodiments classify each function by the file path in its debug information, since functions in the same layer or library are likely to share a path. After classification, L3 functions are found using debug messages and paths that contain keywords such as L3, SS, EMM, or NAS. Functions related to decoding incoming messages are then found using keywords such as decode, codec, and the names of several IEs. As a result, the function that parses standard L3 messages was identified. The embodiments found that a single decoder function decodes all standard L3 messages regardless of protocol. Because these messages share the same standard structure, one decoder is sufficient to handle them.
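The magic-value search and keyword filtering described above might look roughly like the following sketch. The internal layout of the debug structure is not given here, so the sketch only locates occurrences of the magic value and filters (path, message) pairs that are assumed to have been decoded already; the function names, regular expressions, and sample paths are hypothetical.

```python
import re

# Keywords taken from the description; matching is case-sensitive for the
# layer names, so a short keyword such as "SS" can still over-match.
L3_KEYWORDS = re.compile(r"L3|SS|EMM|NAS")
DECODE_KEYWORDS = re.compile(r"decode|codec", re.IGNORECASE)

def find_magic(image: bytes, magic: bytes = b"DBT"):
    """Return every offset at which the magic value occurs in the image."""
    offsets = []
    pos = image.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(magic, pos + 1)
    return offsets

def is_l3_decoder_candidate(path: str, message: str) -> bool:
    """Keep debug records whose path or message mentions both an L3 layer and decoding."""
    text = f"{path} {message}"
    return bool(L3_KEYWORDS.search(text)) and bool(DECODE_KEYWORDS.search(text))
```

Functions referencing the surviving records would then be the candidates to inspect by hand or with the slicing step described above.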

Then, the message structures embedded in the binary are obtained.

FIG. 6 is a diagram showing a message structure according to an embodiment of the present invention.

Finally, it is determined how the standard L3 message structures are embedded in the baseband binary. To this end, the decoder function and its data references are analyzed. A simplified view of the embedded message structure 620 is shown in FIG. 6, using the EMM ATTACH REJECT message of the specification message structure 610 as an example. The embedded message structure 620 is encoded hierarchically using four types of lists.

First, the protocol list (Protocol List, 621) is the top-level list of the hierarchy. It holds a pointer to the message list (Msg List, 622) of each L3 protocol and is indexed by the PD. Because the PD of the EMM protocol is 7, the seventh entry of the protocol list is accessed.

Second, a Msg list 622 is defined for each protocol. It holds a pointer to the Msg IE list 623 of each message of the protocol and is indexed by the message ID. In this example, the message ID of the EMM ATTACH REJECT message is 0x44, so the corresponding Msg IE list 623 is accessed using 0x44.

Third, a Msg IE list 623 is defined for each message. It contains an imperative flag and an index for each IE of the message. The imperative flag indicates whether the IE is encoded as imperative or non-imperative in the message, and the index gives the position of the IE in the global IE list 624. As shown in the example, the first three IEs of the EMM ATTACH REJECT message (PD, security header type, and message type) are common to all messages and are therefore not listed in the Msg IE list.

Fourth, the global IE list 624 contains information about every IE used in the L3 protocols and is accessed by the index assigned to each IE. This information consists of the length and the IEI of the IE. Note that the length here denotes only the size of the value part, whereas the length in the specification denotes the total size of the IE.

All embedded message structures 620 are extracted by iterating over these lists. The firmware analysis thus shows how to obtain the L3 decoder address and the message structure information from the firmware. This knowledge is used to automate BASESPEC, as described below.
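The four-level lookup can be modelled with a small sketch. It is an in-memory stand-in, not the binary layout: dictionaries replace the pointer tables, the type and variable names are hypothetical, and the sample entries (including the 2-byte length of the last IE) are illustrative values only.

```python
from typing import NamedTuple, Optional

class GlobalIE(NamedTuple):
    iei: Optional[int]   # None for IEs that carry no identifier
    value_len: int       # size of the value part only

class MsgIEEntry(NamedTuple):
    imperative: bool     # imperative flag
    ie_index: int        # position in the global IE list

# Global IE list (624): illustrative entries
GLOBAL_IE_LIST = [GlobalIE(None, 1), GlobalIE(0x78, 0), GlobalIE(0xFF, 2)]

# Protocol list (621) -> Msg list (622) -> Msg IE list (623)
PROTOCOL_LIST = {
    0x7: {  # PD of EMM
        0x44: [MsgIEEntry(True, 0), MsgIEEntry(False, 1), MsgIEEntry(False, 2)],
    },
}

def lookup_message(pd: int, msg_id: int):
    """Resolve one message to a list of (imperative, GlobalIE) pairs."""
    msg_list = PROTOCOL_LIST[pd]
    msg_ie_list = msg_list[msg_id]
    return [(entry.imperative, GLOBAL_IE_LIST[entry.ie_index]) for entry in msg_ie_list]

def extract_all():
    """Walk every list to collect all embedded message structures."""
    return {
        (pd, msg_id): lookup_message(pd, msg_id)
        for pd, msg_list in PROTOCOL_LIST.items()
        for msg_id in msg_list
    }
```

For example, `lookup_message(0x7, 0x44)` resolves the EMM ATTACH REJECT entry through the protocol list and the Msg list before indexing the global IE list.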

BASESPEC Design

The design of BASESPEC is described in detail here. FIG. 3, described above, shows an overview of BASESPEC. BASESPEC compares the message structures in the baseband firmware with those in the specification and automatically reports mismatches. To do so, BASESPEC first extracts the message structures from the specification and from the firmware, the latter together with binary-specific metadata. It then compares the specification structures with the binary-embedded structures syntactically and examines the implementation logic semantically using symbolic execution. Based on these comparisons, BASESPEC reports several types of mismatches between the specification and the MP. It reports suspicious IEs that exist only in the binary (unknown mismatches) or only in the specification (missing mismatches), and it reports IEs whose lengths differ (invalid mismatches). After the mismatch results are obtained, their implications can be analyzed further.

First, message structures are extracted from the specification.

To examine the L3 message structures in the baseband firmware, BASESPEC extracts reference structures from the specification. 3GPP and its partner organizations publish the specification documents on their websites. BASESPEC downloads the latest specifications listed in [Table 1] and converts them to raw text. It then extracts the message structures from the converted text using regular expressions. As shown in FIG. 6, the message structure in the specification has two parts: the message content, which is a list of IE formats, and the list of message types for each L3 protocol. BASESPEC parses these structures automatically for each standard L3 message.

Although this text processing may seem trivial, BASESPEC has to resolve several problems, which are listed below.

First, conversion errors must be resolved. Converting the specifications to raw text introduces several types of errors. The specifications are written in human-friendly formats such as Microsoft Word (e.g., DOC) or Adobe (e.g., PDF). Their visual richness (e.g., tables and figures) helps readers understand the documents more fully. For systematic analysis, however, BASESPEC must convert the documents into a machine-readable form. Converting such human-friendly documents to raw text requires error-prone methods such as OCR, so the conversion often introduces errors, including incorrect or missing words and sentences.

To mitigate these conversion errors, BASESPEC uses different document formats together. 3GPP and ETSI provide the same specifications in two formats: DOC files from 3GPP and PDF files from ETSI. The conversion errors of each format were found to be deterministic and complementary. For example, when processing the specification of EMM and ESM messages ([Table 1]), conversion of the message type tables failed for the DOC file but succeeded for the PDF file. In contrast, conversion of the message content tables showed the opposite result. BASESPEC therefore selects the correct raw text among the different conversion results by checking the number of rows in the converted tables: the table with more rows is more likely to be the correct text. Surprisingly, this approach produced no errors. For the conversion, BASESPEC uses antiword for DOC files and pdftotext for PDF files.
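The row-count selection rule is simple enough to sketch directly. The sketch assumes each converted table is available as text with one row per non-empty line; the function names are hypothetical.

```python
def count_rows(table_text: str) -> int:
    """Count non-empty lines, treating each as one table row."""
    return sum(1 for line in table_text.splitlines() if line.strip())

def pick_conversion(from_doc: str, from_pdf: str) -> str:
    """Prefer the conversion that preserved more rows (the DOC result wins ties)."""
    return max((from_doc, from_pdf), key=count_rows)
```

For instance, if the DOC conversion dropped a row that the PDF conversion kept, `pick_conversion` returns the PDF text.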

Second, word inconsistencies must be resolved. For text processing, BASESPEC has to handle many inconsistent words in the specifications. Because the specifications are written manually by many people, such inconsistencies are unavoidable, which makes systematic analysis difficult. The inconsistencies include 5 cases of duplicated and/or missing words, 14 cases of incorrect spacing between words, 5 uses of abbreviations, 14 incorrect delimiters, and several different terms that denote a single meaning. For example, the RR message SYSTEM INFORMATION TYPE 15 is also written as SYSTEM INFORMATION 15. In addition, four different names are used to indicate a downlink (DL) message sent to a cellular device: UE, base station, MS, and DL. The DTM ASSIGNMENT COMMAND message is missing the delimiter '-' in the length of one IE format, and some table names contain duplicated words, such as the service request message content. The embodiments resolved all of these inconsistencies and successfully retrieved the message information for comparison. The issues were reported to 3GPP so that they can be corrected for future work that applies text processing in this area.
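Normalization of this kind can be sketched with a small alias table. The two examples are taken from the text above; the table structure, function names, and the choice to upper-case names are assumptions made for illustration.

```python
import re

# Variant spelling -> canonical message name (example from the text above)
MESSAGE_ALIASES = {"SYSTEM INFORMATION 15": "SYSTEM INFORMATION TYPE 15"}

# The four labels that the text reports for the downlink direction
DOWNLINK_NAMES = {"UE", "BASE STATION", "MS", "DL"}

def normalize_name(raw: str) -> str:
    """Collapse whitespace, upper-case, and map known variants to one name."""
    name = re.sub(r"\s+", " ", raw).strip().upper()
    return MESSAGE_ALIASES.get(name, name)

def is_downlink(direction: str) -> bool:
    """Treat every known downlink label as the same direction."""
    return normalize_name(direction) in DOWNLINK_NAMES
```

With this table, `normalize_name("System  Information 15 ")` yields the canonical name `SYSTEM INFORMATION TYPE 15`.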

Third, irregular IE formats must be resolved. While extracting message structures from the specifications, several nested IEs and invalid IE formats were found. For example, some SMS messages can contain nested messages, so the IEs of the nested messages must also be checked. The nested IEs were flattened so that the message structures could be compared properly. In addition, the CN-MS transparent information IE of the INTER SYSTEM TO UTRAN HANDOVER COMMAND message has no IEI, so its TLV format is invalid, since an IE with a TLV format must contain an IEI. However, this is an exceptional case defined in the specification. Exceptions were therefore added to handle the above cases when comparing the results.

Next, binary-specific metadata is extracted.

For further analysis, BASESPEC extracts binary-specific metadata, such as the binary-embedded message structure information used for syntactic comparison and the L3 decoder address used for semantic comparison. This metadata differs across baseband binaries. Nevertheless, BASESPEC can extract it regardless of the baseband binary and can be applied to multiple baseband models or versions.

Given a firmware image, BASESPEC performs all of the firmware analysis procedures and extracts the binary-specific metadata. To automate firmware preprocessing, BASESPEC searches for pre-built signatures of functions related to scatter-loading, similar to SCIT in IDA Pro. It then emulates the copy, decompression, and zero-initialization functions. Next, BASESPEC scans the loaded firmware to detect function prologues and pointers to Thumb-mode functions. To automate L3 decoder identification, forward and backward slicers are implemented so that L3-related debug structures are identified correctly. The L3 decoder can then be identified by cross-referencing the debug structures. Finally, BASESPEC finds the addresses of the message structures in the decoder, at the points where the function references the structures while decoding an L3 message. The message structures are used for syntactic comparison, and the information about the decoder is used for semantic comparison.

Then, the message structures are compared syntactically.

BASESPEC first compares, at IE-level granularity, the message structures extracted from the baseband binary with those extracted from the specification. For each message in the specification, BASESPEC fetches the corresponding message from the binary using the PD and message type. Next, BASESPEC iteratively maps the IEs of the message in the binary to those in the specification according to their type (i.e., imperative or non-imperative). Finally, BASESPEC compares the mapped IEs and reports mismatches, which are referred to as syntactic mismatches. Such syntactic mismatches directly reveal mistakes made by developers when embedding the message structures in the baseband binary. The syntactic comparison procedure is described in detail below.

FIG. 7 is a diagram illustrating an example of syntactic comparison according to an embodiment of the present invention.

1) Message fetching: For each message in the specification, BASESPEC first fetches the corresponding message structure from the baseband binary using the PD and message ID. As shown in FIG. 7, BASESPEC uses the PD (0x7, 710) and the message ID (0x44, 720) as indices into the protocol list and the Msg list, respectively, to fetch the Msg IE list corresponding to the EMM ATTACH REJECT message.

2) IE mapping: Next, BASESPEC maps each IE in the Msg IE list to an IE in the specification. BASESPEC performs this mapping according to the IE type, because imperative and non-imperative IEs have distinct formats. For imperative IEs, BASESPEC relies on their order, since they appear in a fixed order in the message. For example, in FIG. 7, BASESPEC concludes that the first entry of the Msg IE list represents the EMM cause IE, which is the first IE with the imperative flag set. The Msg IE list contains only the IEs that follow the message ID, because the header IEs of the message (i.e., PD and message ID) have already been used to fetch the Msg IE list. For non-imperative IEs, which can appear in any order, BASESPEC uses their identifier, the IEI. For example, BASESPEC regards the second entry of the Msg IE list in FIG. 7 as the ESM message container IE because its IEI (0x78) matches that of the IE in the specification.

After the mapping process, BASESPEC reports the remaining unmapped IEs as missing or unknown mismatches. A missing mismatch indicates an IE that exists in the specification but is not implemented in the binary. An unknown mismatch, in contrast, indicates an IE that exists only in the binary. For example, BASESPEC cannot map the T3346 value IE in FIG. 7 because its IEI (0x5F) does not exist in the Msg IE list, so BASESPEC reports it as a missing mismatch. Similarly, because the specification has no IE with the IEI 0xff, BASESPEC reports the third IE in the Msg IE list as an unknown mismatch.
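The mapping rules above can be sketched as follows: imperative IEs are paired by position, non-imperative IEs by IEI, and whatever remains is reported as missing or unknown. The dictionary keys and the function name are hypothetical; the sample values mirror the FIG. 7 example as described in the text.

```python
from itertools import zip_longest

def map_ies(spec_ies, bin_ies):
    """Return (pairs, missing, unknown) for one message.

    Each IE is a dict with an 'imperative' flag and an 'iei' (None if absent).
    """
    pairs, missing, unknown = [], [], []

    # Imperative IEs have a fixed order, so pair them positionally.
    spec_imp = [ie for ie in spec_ies if ie["imperative"]]
    bin_imp = [ie for ie in bin_ies if ie["imperative"]]
    for spec_ie, bin_ie in zip_longest(spec_imp, bin_imp):
        if spec_ie is None:
            unknown.append(bin_ie)
        elif bin_ie is None:
            missing.append(spec_ie)
        else:
            pairs.append((spec_ie, bin_ie))

    # Non-imperative IEs may appear in any order, so pair them by IEI.
    bin_by_iei = {ie["iei"]: ie for ie in bin_ies if not ie["imperative"]}
    for spec_ie in spec_ies:
        if spec_ie["imperative"]:
            continue
        bin_ie = bin_by_iei.pop(spec_ie["iei"], None)
        if bin_ie is None:
            missing.append(spec_ie)      # only in the specification
        else:
            pairs.append((spec_ie, bin_ie))
    unknown.extend(bin_by_iei.values())  # only in the binary
    return pairs, missing, unknown

SPEC = [
    {"name": "EMM cause", "imperative": True, "iei": None},
    {"name": "ESM message container", "imperative": False, "iei": 0x78},
    {"name": "T3346 value", "imperative": False, "iei": 0x5F},
]
BINARY = [
    {"imperative": True, "iei": None},
    {"imperative": False, "iei": 0x78},
    {"imperative": False, "iei": 0xFF},
]
```

Calling `map_ies(SPEC, BINARY)` on these samples leaves the T3346 value IE in the missing list and the 0xFF entry in the unknown list.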

3) IE comparison: BASESPEC compares the IE pairs obtained from the mapping and reports the mismatches. BASESPEC must first convert each IE of the specification into a form comparable to that of the binary. In particular, BASESPEC adjusts the IE length from the specification, because lengths are expressed differently in the binary and in the specification. The length in the binary covers only the value part of the IE (i.e., the value length), whereas the length in the specification also includes the IEI and LI (i.e., the IE length). BASESPEC subtracts the sizes of the IEI and LI according to the format given in the specification. For example, as shown in FIG. 7, BASESPEC subtracts 3 bytes from the IE length of the ESM message container IE to compute its value length, because its format includes a 1-byte IEI (T) and a 2-byte extended LI (L with -E). Similarly, BASESPEC subtracts 2 bytes from the length of the T3346 value IE, whose format has a 1-byte IEI (T) and a 1-byte LI (L). BASESPEC does not adjust the IE length of the EMM cause IE because it consists only of a value (V).

BASESPEC then compares the adjusted IEs. If the lengths differ, BASESPEC reports an invalid mismatch. For example, in FIG. 7, the value length of the EMM cause IE is the same in both the specification and the binary, so BASESPEC reports no mismatch. In contrast, the minimum value length of the ESM message container IE in the specification (3 bytes) differs from the length in the binary (0 bytes), so BASESPEC marks it as an invalid mismatch.
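The length adjustment is plain arithmetic and can be sketched as below. The header sizes for V, TLV, and TLV-E follow the examples in the text; the LV and LV-E rows are extrapolated by the same rule and are an assumption, as are the function names. An IE length of 6 bytes is assumed for the ESM message container so that its value length equals the 3 bytes stated above.

```python
# Bytes occupied by the IEI (T) and LI (L) for each IE format
HEADER_BYTES = {
    "V": 0,      # value only
    "LV": 1,     # 1-byte LI (assumed, same rule)
    "TLV": 2,    # 1-byte IEI + 1-byte LI
    "LV-E": 2,   # 2-byte extended LI (assumed, same rule)
    "TLV-E": 3,  # 1-byte IEI + 2-byte extended LI
}

def value_length(spec_ie_len: int, fmt: str) -> int:
    """Convert an IE length from the specification into a value length."""
    return spec_ie_len - HEADER_BYTES[fmt]

def is_invalid_mismatch(spec_ie_len: int, fmt: str, bin_value_len: int) -> bool:
    """True when the adjusted specification length differs from the binary length."""
    return value_length(spec_ie_len, fmt) != bin_value_len
```

With these definitions, `is_invalid_mismatch(6, "TLV-E", 0)` flags the ESM message container case, while `is_invalid_mismatch(1, "V", 1)` reports no mismatch for a one-byte value-only IE.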

In addition, the message structures are compared semantically.

In addition to the syntactic analysis, BASESPEC performs a semantic analysis. Although the syntactic analysis can identify explicit mismatches in the message structures, the actual decoding logic for a given message in the baseband binary may differ from the syntactic format. The semantic analysis therefore focuses on how an incoming message is parsed in the decoder function. By finding mismatches in message handling between the implementation and the specification, BASESPEC reveals semantic flaws in the decoder function. These mismatches are referred to here as semantic mismatches. A semantic mismatch may indicate unintended baseband behavior that deviates from the specification.

For the semantic analysis, BASESPEC symbolically executes the decoder function, given its address. BASESPEC then translates the constraints obtained from the symbolic execution into IEIs and LIs, using the distinct roles of these fields: the IEI distinguishes non-imperative IEs, whereas the LI specifies the size of the value part. Next, BASESPEC builds a message structure from the identified IEIs and LIs, compares it with the structure in the specification in the same way as in the syntactic comparison, and finally reports the mismatches.

FIG. 8 is a diagram illustrating semantic analysis according to an embodiment of the present invention.

FIG. 8 shows the overall procedure of the L3 decoder semantic analysis 810 using a sample EMM ATTACH REJECT message.

1) Symbolic execution: Following the concept of under-constrained symbolic execution, BASESPEC analyzes the decoder function rather than the entire baseband binary. For scalability, under-constrained symbolic execution analyzes individual functions directly without executing the whole binary. BASESPEC therefore performs symbolic execution from the entry of the decoder function until it returns. For efficient analysis, BASESPEC processes one L3 message at a time by fixing the PD and message type to concrete values. The message body, i.e., the IEs, is left unconstrained so that all possible IEs are considered. For example, the message in FIG. 8 has concrete values for the PD (0x7) and the message type (0x44), which denote the EMM ATTACH REJECT message, while the message body consists of unconstrained symbolic variables (v1-v4).

Symbolic execution produces symbolic variables and constraints, which capture the decoding semantics of the IEIs and LIs. Each symbolic variable represents one of the IE fields (i.e., IEI, LI, or value), and each constraint represents how the decoder handles that field. Conditional branches in the decoder that involve symbolic variables generate the constraints on those variables. Such constraints can arise from checking the IEI of a non-imperative IE or from validating an LI against the embedded message structure. For example, the program states in FIG. 8 contain different constraints on the symbolic variables depending on the path taken. State S1, which contains the constraint v2==0x5F, would have followed the path that decodes the IE whose IEI value is 0x5F.

2) IEI and LI identification: When a symbolic state reaches the end of the decoder function, BASESPEC identifies the IEIs and LIs from the collected symbolic variables and constraints. First, BASESPEC identifies an LI by its use in memory addressing. Because the LI specifies the size of the value part, the decoder function uses the LI in an address computation to reach the next IE. For example, suppose that v3 in FIG. 8 is an LI located at address A. Then v4 and the bytes that follow form the value part, whose length is v3. When the decoder function accesses the next IE, it therefore uses A+v3+1 as the address of that IE. A symbolic variable for an LI can thus be identified by checking whether it is used in an address. A 2-byte LI in a format with the -E suffix can be identified in a similar way. The LI of the last IE cannot be identified this way, but once the other parts have been identified, the last unidentified symbolic variable can be assumed to be an LI.

Next, the IEI of a non-imperative IE is straightforward to identify, because the decoding routine must compare it with predefined IEI values. A symbolic variable is therefore identified as an IEI if it is not an LI and there are constraints that strictly restrict its value. As shown in FIG. 8, some constraints strictly restrict v2 to 0x5F, 0x78, or 0x16, which are possible IEI values; v2 is therefore an IEI part. The value part of an IE is identified implicitly, because the decoding routine places no constraints on it. If the decoder function does not read the actual value and only stores the address of the value part as its output, no symbolic variable is accessed for the value. If the value is accessed and copied to another memory region, this operation can be detected during symbolic execution and the value part determined from it.
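The classification rule described above (LI first, then IEI by strict value constraints, value part by elimination) can be sketched as follows. The dictionary-based representation of constraints and the function name are assumptions for illustration; the constants are the ones mentioned for FIG. 8.

```python
def classify_fields(allowed_values, li_vars):
    """Label each symbolic variable as 'LI', 'IEI', or 'value'.

    `allowed_values` maps a variable name to the set of constants that the
    collected constraints allow for it; an empty set means unconstrained.
    """
    roles = {}
    for name, constants in allowed_values.items():
        if name in li_vars:
            roles[name] = "LI"      # already identified through address usage
        elif constants:
            roles[name] = "IEI"     # strictly restricted to known constants
        else:
            roles[name] = "value"   # unconstrained, identified implicitly
    return roles


allowed_values = {
    "v1": set(),
    "v2": {0x5F, 0x78, 0x16},
    "v3": set(),
    "v4": set(),
}
roles = classify_fields(allowed_values, li_vars={"v3"})
```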

3) Handling path explosion: During symbolic execution, BASESPEC prunes states to prevent path explosion, a well-known problem of approaches based on symbolic execution. In particular, BASESPEC removes paths that reach error-handling logic. When the decoder function detects an obvious error in a message, it invokes complex error-handling logic. This logic is irrelevant to decoding legitimate messages, yet it causes path explosion, so such paths can be discarded. Because the decoder function sets a flag variable to signal an error, these paths can be distinguished by that flag. BASESPEC therefore prunes the paths on which the flag variable is set.
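A minimal sketch of this flag-based pruning is shown below. The `State` record and its `error_flag_set` field are hypothetical stand-ins for whatever a symbolic execution engine exposes; in practice the flag would be read from the state's memory.

```python
from dataclasses import dataclass


@dataclass
class State:
    name: str
    error_flag_set: bool  # True once the decoder has signalled an error


def prune_error_states(states):
    """Drop every state whose path set the decoder's error flag."""
    return [state for state in states if not state.error_flag_set]


states = [State("S1", False), State("S_err", True), State("S2", False)]
active = prune_error_states(states)
```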

BASESPEC also limits the number of non-imperative IEs analyzed in each state to prevent path explosion. Non-imperative IEs may appear in a message in any order, so symbolic execution encounters a large number of combinations, which in turn produce a large number of states with complex constraints. Because most non-imperative IEs are optional and unrelated to one another, BASESPEC prevents this state explosion by analyzing each non-imperative IE separately, in its own state. Specifically, BASESPEC periodically identifies the IEIs in each active state and removes any state that holds constraints for more than one non-imperative IE. Imperative IEs, in contrast, are analyzed in every state, because they must all be present in the message.
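The second pruning rule can be sketched in the same style. Here each state is reduced to the set of IEI values its constraints mention; this representation, the state names, and the `max_ieis` parameter are assumptions for the example.

```python
def prune_multi_ie_states(states, max_ieis=1):
    """Keep only states whose constraints involve at most `max_ieis` IEI values."""
    return {name: ieis for name, ieis in states.items() if len(ieis) <= max_ieis}


states = {
    "S0": set(),         # no IEI-tagged IE decoded yet
    "S1": {0x5F},
    "S2": {0x78},
    "S3": {0x5F, 0x78},  # combines two IEI-tagged IEs, so it is removed
}
kept = prune_multi_ie_states(states)
```

Running the check periodically, as the text describes, keeps the number of live states roughly proportional to the number of distinct IEIs rather than to the number of their combinations.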

4) IE comparison: For the comparison, BASESPEC builds a semantic-aware message structure from the IEI and LI values identified for the message. Each state of the symbolic execution holds information about all imperative IEs and some non-imperative IEs. BASESPEC first collects the information from the various states and builds a list of possible IEs. A pair of an IEI and an LI constitutes a non-imperative IE, and an LI without an IEI constitutes an imperative IE. Because BASESPEC analyzes the semantics of the IEI and LI parts, it does not identify imperative IEs that have no LI, since such IEs consist of a value part only. For example, in FIG. 8, state S1 yields a non-imperative IE with v2 as the IEI (0x5F) and v3 as the LI. State S2 yields a non-imperative IE with v2 as the IEI (0x78) and v3:v4 as an extended LI. The ATTACH REJECT message also has an imperative IE that contains v1 in every state, but it is not identified because it has no LI and consists of a value part only. BASESPEC then arranges the message structure as shown in the table on the right of FIG. 8, concretizing each LI to show an explicit range of lengths. This message structure is semantic-aware because it reflects the internal logic of the decoder.
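Collecting the per-state findings into one message structure might look like the sketch below. The `IE` record, the merge order, and all length ranges are invented for illustration and are not taken from FIG. 8 or from any 3GPP table; an IE with an LI but no IEI is represented with `iei=None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IE:
    iei: Optional[int]       # None for an IE that has an LI but no IEI
    length: Tuple[int, int]  # concretized (min, max) length taken from the LI


def build_structure(states):
    """Merge the IEs seen in all states, keeping first-seen order."""
    merged = []
    for ies in states:
        for ie in ies:
            if ie not in merged:
                merged.append(ie)
    return merged


# Hypothetical per-state results; the ranges are made up for the example.
s1 = [IE(None, (1, 4)), IE(0x5F, (1, 1))]
s2 = [IE(None, (1, 4)), IE(0x78, (0, 65535))]
structure = build_structure([s1, s2])
```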

Finally, BASESPEC compares the message structure with the specification in a manner similar to the syntactic comparison. Because imperative IEs must appear in a fixed order, BASESPEC compares their LIs sequentially, skipping imperative IEs that have no LI. For non-imperative IEs, BASESPEC first matches them by IEI and then compares their LIs. BASESPEC reports any differences, and any remaining unmatched IEs, as semantic mismatches.
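The two comparison passes (positional for IEs without an IEI, keyed by IEI for the others) can be sketched as follows. The dictionary layout, the label strings, and the example lengths are assumptions; the labels missing, unknown, and invalid borrow the mismatch types discussed with FIG. 9, and their exact mapping onto each case here is likewise an assumption.

```python
from itertools import zip_longest


def compare_lengths(label, spec_len, fw_len):
    """Return a (kind, label) mismatch tuple, or None when both sides agree."""
    if fw_len is None:
        return ("missing", label)   # in the specification, not in the firmware
    if spec_len is None:
        return ("unknown", label)   # in the firmware, not in the specification
    if spec_len != fw_len:
        return ("invalid", label)   # present in both, length limits differ
    return None


def compare_structures(spec, firmware):
    results = []
    # IEs without an IEI appear in a fixed order: compare position by position.
    pairs = zip_longest(spec["ordered"], firmware["ordered"])
    for index, (spec_len, fw_len) in enumerate(pairs):
        results.append(compare_lengths(f"ordered[{index}]", spec_len, fw_len))
    # IEs with an IEI: match by IEI first, then compare the length limits.
    for iei in sorted(set(spec["tagged"]) | set(firmware["tagged"])):
        results.append(compare_lengths(
            f"IEI 0x{iei:02X}",
            spec["tagged"].get(iei),
            firmware["tagged"].get(iei),
        ))
    return [r for r in results if r is not None]


# Invented (min, max) lengths; 0x17 models a mistyped IEI in the firmware.
spec = {"ordered": [(1, 4)],
        "tagged": {0x5F: (1, 1), 0x78: (0, 65535), 0x16: (1, 1)}}
firmware = {"ordered": [(1, 4)],
            "tagged": {0x5F: (1, 1), 0x78: (0, 255), 0x17: (1, 1)}}
report = compare_structures(spec, firmware)
```

In this example the mistyped IEI produces a missing and an unknown entry together, while the differing limit for IEI 0x78 is reported as invalid.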

The implementation analysis is described below.

BASESPEC automatically discovers mismatches between the specification and the binary implementation, but understanding the impact of a mismatch requires additional manual analysis. Given a message, the decoder function parses it and passes the message's IEs to the corresponding handler functions for further processing. BASESPEC exploits the systematic structure of messages to analyze the decoding routines automatically. Handler functions, however, have complex semantics (e.g., session management or call control), which makes it difficult to verify their correctness automatically. The embodiments therefore rely on manual analysis for the handlers. Nevertheless, the mismatches reported by BASESPEC provide hints for this analysis: only the routines that correspond to a mismatched IE need to be analyzed, rather than the entire function with its complex logic.

FIG. 9 shows the relationship between mismatches in the decoder and their impact on the handler functions, according to an embodiment of the present invention.

Referring to FIG. 9, the relationship between mismatches in the decoder 910 and their impact on the handler 920 is shown. In particular, missing mismatches and unknown mismatches directly indicate functional errors in the baseband firmware. As shown at 911, a missing mismatch causes benign IEs to be dropped. If an imperative IE (i.e., a mandatory field) is dropped because of a mismatch, the firmware fails to comply with the specification (i.e., a functional error). Unknown mismatches are closely coupled with missing mismatches: if a developer mistakenly inserts an incorrect IEI value, an unknown mismatch and a missing mismatch appear together. In such cases, the unknown mismatch directly indicates a functional error.

An invalid mismatch, in turn, can have two kinds of impact. Because it essentially indicates that the decoder does not validate the length of a particular IE correctly, it can lead to either a functional error or memory corruption. If the decoder's length limit for an IE is stricter than the one defined in the specification, no further analysis is needed, because rejecting benign IEs is itself a functional error (912). If the length limit is looser, a memory corruption bug may arise during further processing (913). For example, a buffer overflow can occur if a developer blindly assumes that a particular IE has the length given in the specification when the actual length can be larger. Because the handler functions may perform additional checks, manual analysis of the handlers is required to confirm the impact of an invalid mismatch. Notably, invalid mismatches offer useful insight into the mistakes developers make when embedding message structures in baseband firmware, which can lead to the discovery of several critical security vulnerabilities.
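The triage rule for invalid mismatches can be written down as a small helper. This sketch compares only the maximum accepted length, which is a simplifying assumption, and the returned strings are illustrative rather than BASESPEC output.

```python
def assess_invalid_mismatch(spec_max_len, decoder_max_len):
    """Suggest the likely impact of a length-limit mismatch for one IE."""
    if decoder_max_len < spec_max_len:
        # Stricter than the specification: benign IEs are rejected (912).
        return "functional error"
    if decoder_max_len > spec_max_len:
        # Looser than the specification: later processing may overflow (913).
        return "possible memory corruption, inspect the handler manually"
    return "no length mismatch"


stricter = assess_invalid_mismatch(spec_max_len=16, decoder_max_len=8)
looser = assess_invalid_mismatch(spec_max_len=16, decoder_max_len=255)
```

The looser case only flags a candidate; whether it is exploitable still depends on the checks performed in the handler.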

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by a processing device or to provide instructions or data to it. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those skilled in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method, performed by a computer device, for automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document, the method comprising:
comparing and analyzing a message structure implemented in the baseband software against a standardized message structure of a cellular specification, by utilizing the characteristic of the baseband as a modem for network communication,
wherein the comparing and analyzing automatically checks the message structure implemented in the baseband software by utilizing the standardized message structure of the cellular specification.

The method of claim 1, further comprising:
identifying, through the comparative analysis, a mismatch in which the message structure implemented in the baseband software does not conform to the standardized message structure of the cellular specification.

The method of claim 1, wherein the comparing and analyzing comprises:
analyzing the baseband software to identify a message decoder and extracting the message structure implemented in the baseband software; and
comparing and analyzing the extracted message structure against the standardized message structure of the cellular specification.

The method of claim 3, wherein comparing and analyzing the extracted message structure against the standardized message structure of the cellular specification comprises:
syntactically comparing whether the protocol structure embedded in the baseband software is identical to that of the cellular specification.

The method of claim 3, wherein comparing and analyzing the extracted message structure against the standardized message structure of the cellular specification comprises:
semantically comparing, by utilizing symbolic execution, whether the underlying logic of the function of the message decoder complies with the cellular specification.

The method of claim 2, further comprising:
analyzing the mismatch to determine whether it can lead to a functional bug or a security bug.

The method of claim 2, wherein identifying the mismatch comprises:
reporting, as missing mismatches or unknown mismatches, remaining information elements (IEs) in the message structure implemented in the baseband software that are not mapped to the standardized message structure of the cellular specification.

The method of claim 2, wherein identifying the mismatch comprises:
automatically identifying mismatches in Layer 3 (L3) messages of a plurality of pieces of the baseband software and flagging potentially buggy points for analysis.

A system for automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document, the system comprising:
a comparative analysis unit configured to compare and analyze a message structure implemented in the baseband software against a standardized message structure of a cellular specification, by utilizing the characteristic of the baseband as a modem for network communication,
wherein the comparative analysis unit automatically checks the message structure implemented in the baseband software by utilizing the standardized message structure of the cellular specification.

The system of claim 9, further comprising:
a mismatch identification unit configured to identify, through the comparative analysis, a mismatch in which the message structure implemented in the baseband software does not conform to the standardized message structure of the cellular specification.

The system of claim 9, wherein the comparative analysis unit is configured to:
analyze the baseband software to identify a message decoder, extract the message structure implemented in the baseband software, and compare and analyze the extracted message structure against the standardized message structure of the cellular specification.

The system of claim 11, wherein the comparative analysis unit is configured to:
syntactically compare whether the protocol structure embedded in the baseband software is identical to that of the cellular specification.

The system of claim 11, wherein the comparative analysis unit is configured to:
semantically compare, by utilizing symbolic execution, whether the underlying logic of the function of the message decoder complies with the cellular specification.

The system of claim 10, further comprising:
a bug confirmation unit configured to analyze the mismatch and determine whether it can lead to a functional bug or a security bug.

The system of claim 10, wherein the mismatch identification unit is configured to:
report, as missing mismatches or unknown mismatches, remaining information elements (IEs) in the message structure implemented in the baseband software that are not mapped to the standardized message structure of the cellular specification.

The system of claim 10, wherein the mismatch identification unit is configured to:
automatically identify mismatches in Layer 3 (L3) messages of a plurality of pieces of the baseband software and flag potentially buggy points for analysis.

A method, performed by a computer device, for automatically comparing and analyzing mobile communication baseband software based on a mobile communication standard document, the method comprising:
extracting a message structure from the specification and from the baseband firmware having binary-specific metadata; and
syntactically comparing the message structure of the specification with the binary-embedded message structure of the baseband firmware, and semantically analyzing the implementation logic by using symbolic execution.

The method of claim 17, further comprising:
identifying, through the comparison and analysis, mismatches between the message structure of the specification and the message structure of the baseband firmware.