RU2757265C1 - System and method for assessing an application for the presence of malware

Info

Publication number: RU2757265C1
Application number: RU2020131447A
Authority: RU
Inventors: Игорь Игоревич Кузнецов; Сергей Александрович Минеев
Original assignee: Акционерное общество "Лаборатория Касперского"
Priority date: 2020-09-24
Filing date: 2020-09-24
Publication date: 2021-10-12

Abstract

FIELD: information security.

SUBSTANCE: the invention relates to computing technology for ensuring information security. The technical result is achieved by: a) generating source code for a given application; b) generating intermediate code from the generated source code; c) selecting the characteristics of at least one application from an application database based on the generated intermediate code; d) determining the degree of maliciousness of the given application based on the selected characteristics.

EFFECT: increase in the accuracy of determining the degree of maliciousness of an application.

18 cl, 3 dwg

Description

Technical field

The invention relates to the field of information security, and more specifically to systems and methods for assessing an application for maliciousness.

State of the art

The rapid development of computer technologies over the last decade, together with the widespread use of various computing devices (personal computers, laptops, tablets, smartphones, etc.), has become a powerful incentive to use these devices in diverse fields of activity and for a huge number of tasks (from web surfing to bank transfers and electronic document management). As the number of computing devices and of the software running on them has grown, the number of malicious programs has grown rapidly as well.

There is currently a huge variety of malicious programs. Some steal personal and confidential data from users' devices (for example, logins and passwords, bank details, electronic documents). Others turn users' devices into so-called botnets to mount attacks such as distributed denial of service (DDoS) or brute-force password guessing against other computers or computer networks. Still others push paid content on users through intrusive advertising, sending SMS messages to premium-rate numbers, and so on.

Various technologies are used to detect the full variety of malicious programs, such as:

static analysis - analysis of a program for maliciousness based on the data contained in the files that make up the analyzed program, without launching or emulating the analyzed program;

dynamic analysis - analysis of a program for maliciousness based on data obtained during execution or emulation of the analyzed program.

Both static and dynamic analysis have their pros and cons. Static analysis is less demanding on the resources of the computer system on which the analysis is performed, and since it does not require execution or emulation of the analyzed program, it is faster, but also less effective, i.e. it has a lower detection rate for new types of malware. Dynamic analysis, because it uses data obtained during execution or emulation of the analyzed program, is slower and places higher demands on the resources of the computer system on which the analysis is performed, but it is more effective at detecting new types of malware.

To make static analysis more effective, various technologies are used, including disassembly (as a special case of decompilation) followed by analysis not of the executable code of the application under study itself but, more deeply and effectively, of its recovered source code. For example, such analysis is performed "in-stream", when the data is selected from network traffic.

Decompilation is often supplemented (or replaced) by analysis of data obtained during emulation (or execution) of applications, i.e. when access is gained not to the application's instructions (or algorithms) but to the results of executing those instructions or algorithms.

Patent publication US 9288220 B2 describes a technology for detecting malware in network traffic. To this end, characteristic features are extracted from data selected from the network traffic (features characterizing the type of the executable file, the behavior of the executable file during emulation, the API function calls made when the file is executed, the triggering of signatures in the file, etc.); these can take the form of a feature description of the selected data, the so-called feature vector, composed of values corresponding to a set of features for the object containing the selected data. Models for detecting safe files, detecting malicious files, and determining types of malicious files, pre-trained by machine learning methods on templates composed of characteristic features similar to those mentioned, are then applied to determine with what weight and to what type of malware the selected data belongs, and a decision is made on the detection of malware in the network traffic.

The present invention makes it possible to solve the problem of detecting malicious files using elements of static analysis.

Disclosure of the invention

The invention is intended to ensure information security.

The technical result of the present invention is the determination of the degree of maliciousness of an application.

Another technical result of the present invention is the detection of a malicious application.

Yet another technical result of the present invention is the detection of malicious code embedded in an application.

This result is achieved by using a system for generating individual content for a user of a service, which comprises: logging means for collecting data on the user's use of a computing device; training means for training a user behavior model on the collected data; transmission means for passing the trained model to content generation means; content generation means for generating individual content for the user of the service based on a predefined service environment, taking into account the behavior model provided by the transmission means.

This result is achieved by using a system for determining the degree of maliciousness, which comprises: source code generation means for generating source code for a given application; intermediate code generation means for generating intermediate code from the generated source code; characteristic selection means for selecting the characteristics of at least one application from an application database based on the generated intermediate code; maliciousness determination means for determining the degree of maliciousness of the given application based on the selected characteristics.
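The four means listed above form a pipeline, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Characteristics` record, the `assess` function, and the four callables are hypothetical names, and each stage is passed in as a plain function so that the sketch fixes only the order of the stages.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Characteristics:
    """Hypothetical record of what the application database stores per application."""
    category: str   # "safe", "malicious" or "unknown"
    degree: float   # probability that the application has malicious functionality


def assess(
    app: bytes,
    make_source: Callable[[bytes], str],
    make_intermediate: Callable[[str], Sequence[str]],
    select: Callable[[Sequence[str]], Sequence[Characteristics]],
    decide: Callable[[Sequence[Characteristics]], float],
) -> float:
    """Run the four stages in order and return the degree of maliciousness."""
    source = make_source(app)                  # source code generation (e.g. decompilation)
    intermediate = make_intermediate(source)   # intermediate code generation
    selected = select(intermediate)            # selection from the application database
    return decide(selected)                    # degree of maliciousness


# Toy stages, assumed purely for illustration.
degree = assess(
    b"\x90\xc3",
    make_source=lambda raw: "nop\nret",
    make_intermediate=lambda src: src.split("\n"),
    select=lambda code: [Characteristics("malicious", 0.9), Characteristics("safe", 0.1)],
    decide=lambda found: max(c.degree for c in found),
)
```

Taking the maximum degree among the selected applications is only one possible decision rule; the text leaves the rule open.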

In another particular embodiment of the system, the application can be at least one of: an executable file; a combination of at least one executable file and at least one resource file.

In yet another particular embodiment of the system, the source code is generated by decompiling the given application.

In another particular embodiment of the system, the intermediate code generation means is additionally intended for: extracting at least one basic block from the generated source code, where the basic block can be at least one of: a function; a set of instructions located between a jump instruction to a given address and the instruction located at that address; and performing on the basic block at least one of: computing a convolution (hash) of the basic block; extracting the instruction codes contained in the basic block; extracting the constant arguments of the instructions contained in the basic block; extracting variables from the stack; extracting those arguments of the instructions contained in the basic block that do not fall within the virtual address range; forming a symbolic expression of the basic block that includes, for the instructions contained in the basic block, at least one of: a variable at the instruction input; a constant argument of the instruction; the result of invoking the instruction.

In yet another particular embodiment of the system, instructions relating to at least one of the following are excluded from the intermediate code: the code of standard libraries; the code of known safe applications.
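One way to picture this exclusion is a filter over basic blocks. The sketch below assumes that the intermediate code is held as a list of basic blocks and that blocks belonging to standard libraries or known safe applications are recognized by a digest lookup; the text does not say how such code is recognized, so `block_digest`, `exclude_known`, and the use of SHA-256 are illustrative assumptions.

```python
import hashlib
from typing import Iterable, List, Set, Tuple

# One basic block, given as its intermediate-code instructions (assumed layout).
Block = Tuple[str, ...]


def block_digest(block: Block) -> str:
    """Digest of a block's intermediate code (SHA-256 is an assumed choice)."""
    return hashlib.sha256("\n".join(block).encode("utf-8")).hexdigest()


def exclude_known(blocks: Iterable[Block], known_digests: Set[str]) -> List[Block]:
    """Drop blocks whose digest matches standard-library or known safe code."""
    return [b for b in blocks if block_digest(b) not in known_digests]


library_block: Block = ("push VAR", "call ADDR", "ret")
custom_block: Block = ("xor VAR VAR", "jmp ADDR")

known = {block_digest(library_block)}
remaining = exclude_known([library_block, custom_block], known)
```

Only the block that is not in the known set is left for comparison against the application database.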

In another particular embodiment of the system, the application database contains the characteristics and the generated intermediate code of at least one application.

In yet another particular embodiment of the system, the characteristics of an application are at least one of: the category to which the application belongs, where the category is at least one of: the category of safe applications, comprising applications that have no malicious functionality; the category of malicious applications, comprising applications that have malicious functionality; the category of unknown applications, comprising applications whose functionality is unknown; the degree of maliciousness of the application, denoting the probability that the application has malicious functionality.

In another particular embodiment of the system, the characteristics selected from the application database are those of an application whose intermediate code has a similarity to the generated intermediate code above a predetermined threshold value.
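The threshold-based selection can be sketched as below. The text does not name a similarity measure, so the Jaccard index over sets of intermediate-code instructions is used here purely as a stand-in; the function names and the layout of the database entries are likewise assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# One database entry: (intermediate code, characteristics) - an assumed layout.
Entry = Tuple[Sequence[str], Dict[str, object]]


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index of two instruction sets (a stand-in similarity measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def select_characteristics(
    query: Sequence[str], database: List[Entry], threshold: float
) -> List[Dict[str, object]]:
    """Return characteristics of entries whose similarity exceeds the threshold."""
    return [chars for code, chars in database if jaccard(query, code) > threshold]


database: List[Entry] = [
    (["a", "b", "c"], {"category": "malicious", "degree": 0.9}),
    (["x", "y"], {"category": "safe", "degree": 0.0}),
]
selected = select_characteristics(["a", "b", "d"], database, threshold=0.4)
```

Here the query shares two of four distinct instructions with the first entry (similarity 0.5) and none with the second, so only the first entry's characteristics are selected.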

In yet another particular embodiment of the system, the system further comprises identification means for determining, based on the degree of maliciousness of the given application, at least one of: a section of code in the given application that is similar in functionality to a section of code of an application from the application database; malicious code embedded in the given application; the category of the given application.

This result is achieved by using a method for determining the degree of maliciousness, the method comprising steps that are carried out by the means of the system for determining the degree of maliciousness and in which: source code is generated for a given application; intermediate code is generated from the generated source code; the characteristics of at least one application are selected from an application database based on the generated intermediate code; the degree of maliciousness of the given application is determined based on the selected characteristics.

In another particular embodiment of the method, the application can be at least one of: a single executable file; at least one executable file and at least one resource file that together provide the functionality of the application.

In yet another particular embodiment of the method, the source code is generated by decompiling the given application.

In another particular embodiment of the method, generating the intermediate code comprises steps in which: at least one basic block is extracted from the generated source code, where the basic block is at least one of: a function; a set of instructions located between a jump instruction to a given address and the instruction located at that address; and, based on the extracted basic block, at least one of the following is performed: a convolution (hash) of the basic block is computed; the instruction codes contained in the basic block are extracted; the constant arguments of the instructions contained in the basic block are extracted; those arguments of the instructions contained in the basic block that do not fall within the virtual address range are extracted; a symbolic expression of the basic block is formed that includes, for the instructions contained in the basic block, at least one of: a variable at the instruction input; a constant argument of the instruction; the result of invoking the instruction.

In yet another particular embodiment of the method, instructions relating to at least one of the following are excluded from the intermediate code: the code of standard libraries; the code of known safe applications.

In another particular embodiment of the method, the application database contains the characteristics and the generated intermediate code of at least one application.

In yet another particular embodiment of the method, the characteristics of an application are at least one of: the category to which the application belongs, where the category is at least one of: the category of safe applications, comprising applications that have no malicious functionality; the category of malicious applications, comprising applications that have malicious functionality; the category of unknown applications, comprising applications whose functionality is unknown; the degree of maliciousness of the application, denoting the probability that the application has malicious functionality.

In another particular embodiment of the method, the characteristics selected from the application database are those of an application whose intermediate code has a degree of similarity to the generated intermediate code above a predetermined threshold value.

In yet another particular embodiment of the method, at least one of the following is additionally determined based on the determined degree of maliciousness of the given application: a section of code in the given application that is similar in functionality to a section of code of an application from the application database; malicious code embedded in the given application; the category of the given application.

Brief description of the drawings

Fig. 1 is a block diagram of the system for determining the degree of maliciousness.

Fig. 2 is a block diagram of the method for determining the degree of maliciousness.

Fig. 3 shows an example of a general-purpose computer system, a personal computer or a server.

Although the invention may have various modifications and alternative forms, the characteristic features shown by way of example in the drawings will be described in detail. It should be understood, however, that the purpose of the description is not to limit the invention to a specific embodiment. On the contrary, the purpose of the description is to cover all changes and modifications falling within the scope of this invention as defined by the appended claims.

Description of embodiments of the invention

The objects and features of the present invention, and the methods for achieving these objects and features, will become apparent by reference to exemplary embodiments. However, the present invention is not limited to the exemplary embodiments disclosed below and may be embodied in various forms. The substance set out in the description is nothing more than the specific details needed to help a person skilled in the art gain a thorough understanding of the invention, and the present invention is defined by the scope of the appended claims.

The system for determining the degree of maliciousness consists of source code generation means 110, intermediate code generation means 120, characteristic selection means 130, application database 131, maliciousness determination means 140, and identification means 150.

The application 101 can be at least one of:

a single executable file;

at least one executable file and at least one resource file that together provide the functionality of the application 101.

For example, files in the PE, ELF, Mach-O, BIN and other formats can be used as executable files.

The source code generation means 110 is intended for generating source code for a given application 101.

In one embodiment of the system, the source code is generated by decompiling the given application 101.

For example, a disassembler is selected from a previously prepared set of disassemblers depending on the type of the application.

In another example, the Capstone disassembler can be used as the disassembler.
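Selecting a disassembler by application type can be sketched with a simple check of the file's leading bytes. The magic values below are the well-known signatures of the named formats; the registry contents and the names `detect_format` and `select_disassembler` are hypothetical, and the registry holds placeholder labels rather than real disassembler objects. A production check of a PE file would also verify the PE signature referenced from the DOS header, which this sketch omits.

```python
from typing import Dict

# Mach-O magic numbers: 32-bit and 64-bit, in both byte orders.
MACHO_MAGICS = (
    b"\xfe\xed\xfa\xce",
    b"\xfe\xed\xfa\xcf",
    b"\xce\xfa\xed\xfe",
    b"\xcf\xfa\xed\xfe",
)


def detect_format(data: bytes) -> str:
    """Guess the executable format from the first bytes of the file."""
    if data[:2] == b"MZ":          # DOS header that precedes a PE image
        return "PE"
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "MACHO"
    return "BIN"                   # raw binary: no recognizable header


# Hypothetical registry: format -> label of the disassembler prepared for it.
DISASSEMBLERS: Dict[str, str] = {
    "PE": "pe-disassembler",
    "ELF": "elf-disassembler",
    "MACHO": "macho-disassembler",
    "BIN": "raw-disassembler",
}


def select_disassembler(data: bytes) -> str:
    """Pick the disassembler registered for the detected format."""
    return DISASSEMBLERS[detect_format(data)]


chosen = select_disassembler(b"\x7fELF\x02\x01\x01")
```

In a real system each registry value would be a configured disassembler instance (for example one built on Capstone) rather than a string label.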

In yet another embodiment of the system, decompilation is performed so that applications that run on different platforms and have different types (for example, PE and ELF) but have similar functionality yield similar source code.

The intermediate code generation means 120 is intended for generating intermediate code from the generated source code.

Here, intermediate code or bytecode (also sometimes p-code, from "portable code") is a standard intermediate representation into which a computer program can be translated by automatic means. Compared to source code, which is convenient for humans to write and read, bytecode is a compact representation of a program that has already undergone syntactic and semantic analysis. Types, scopes, and other constructs are encoded in it explicitly. Technically, bytecode is machine-independent low-level code generated by a translator from source code.

In one embodiment of the system, the intermediate code generating means 120 is additionally designed for:

a) extracting at least one basic block from the generated source code, where the basic block can be at least one of:

a function;

a set of instructions located between a jump instruction to a given address and the instruction located at that address;

b) at least:

computing a hash (convolution) of the basic block;

extracting the instruction codes (opcodes) contained in the basic block;

extracting the constant arguments of the instructions contained in the basic block;

extracting variables from the stack;

extracting the arguments of the instructions contained in the basic block that do not fall within the virtual address range;

forming a symbolic expression of the basic block that includes, for the instructions contained in the basic block, at least:

a variable at the input of the instruction;

a constant argument of the instruction;

the result of invoking the instruction from the basic block.

Thus, when the intermediate code for an extracted block is prepared, the logic of that block is preserved while most of the variable data is excluded. As a result, two basic blocks that operate on different data but apply similar algorithms will have similar intermediate codes.
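The extraction of basic blocks can be illustrated with a minimal Python sketch. It assumes the disassembly is already available as a list of (mnemonic, operands) tuples and closes a block right after each control-transfer instruction. The mnemonic set and the name `split_basic_blocks` are illustrative assumptions, not part of the described system.

```python
# Assumed set of control-transfer mnemonics; a real disassembler reports more.
CONTROL_TRANSFER = {"jmp", "je", "jne", "ret"}


def split_basic_blocks(instructions):
    """Split a linear list of (mnemonic, operands) tuples into basic blocks.

    A block is closed immediately after each control-transfer instruction.
    """
    blocks, current = [], []
    for mnemonic, operands in instructions:
        current.append((mnemonic, operands))
        if mnemonic in CONTROL_TRANSFER:
            blocks.append(current)
            current = []
    if current:  # trailing instructions with no terminating jump
        blocks.append(current)
    return blocks


listing = [
    ("mov", ("eax", "1")),
    ("cmp", ("eax", "ebx")),
    ("jne", ("0x401020",)),
    ("add", ("eax", "2")),
    ("ret", ()),
]
blocks = split_basic_blocks(listing)
```

This simplified splitter does not start a new block at jump targets, which a full control-flow analysis would also do.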

In another embodiment of the system, the MD5 algorithm serves as the hash (convolution) function. The intermediate code can thus be a set of MD5 values.
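A rough Python sketch of how a basic block might be reduced to pseudo-instructions and then hashed with MD5 is given below. The operand patterns, the placeholder names (`REG`, `STACKVAR`, `ADDR`, `MEM`) and the assumed virtual address range are all illustrative assumptions; the text does not specify the exact normalization rules.

```python
import hashlib
import re

# Assumed operand patterns for 32-bit x86 text; illustrative only.
REGISTER = re.compile(r"^e?(?:[abcd]x|[sd]i|[sb]p)$")
STACK_VAR = re.compile(r"^\[e?[sb]p[+-]0x[0-9a-f]+\]$")
CONSTANT = re.compile(r"^(?:0x[0-9a-f]+|[0-9]+)$")

# Assumed virtual address range of the loaded image.
VA_START, VA_END = 0x400000, 0x500000


def normalize_operand(operand):
    """Replace data that varies between builds with a placeholder."""
    if REGISTER.match(operand):
        return "REG"
    if STACK_VAR.match(operand):
        return "STACKVAR"
    if CONSTANT.match(operand):
        base = 16 if operand.startswith("0x") else 10
        value = int(operand, base)
        # Constants inside the address range are treated as addresses and
        # dropped; constants outside it are kept as part of the block logic.
        return "ADDR" if VA_START <= value < VA_END else operand
    return "MEM"


def pseudo_instructions(block):
    """Turn (mnemonic, operands) tuples into normalized pseudo-instructions."""
    return [
        f"{mnemonic} {','.join(normalize_operand(op) for op in operands)}".strip()
        for mnemonic, operands in block
    ]


def block_hash(block):
    """MD5 digest of the normalized pseudo-instructions of one basic block."""
    text = "\n".join(pseudo_instructions(block))
    return hashlib.md5(text.encode("utf-8")).hexdigest()


block_a = [("mov", ("eax", "5")), ("add", ("eax", "[ebp-0x8]")), ("jmp", ("0x401020",))]
block_b = [("mov", ("ecx", "5")), ("add", ("ecx", "[ebp-0x10]")), ("jmp", ("0x402abc",))]
block_c = [("mov", ("eax", "7")), ("add", ("eax", "[ebp-0x8]")), ("jmp", ("0x401020",))]
```

Here `block_a` and `block_b` use different registers, stack offsets, and jump targets but hash identically, while `block_c` differs in a constant and therefore hashes differently.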

In yet another embodiment of the system, instructions relating to at least the following are excluded from the intermediate code:

standard library code;

code of known safe applications.

For example, a collection of safe files (a whitelist collection) is compiled in advance, and intermediate code is generated for it in the same way as described above. The intermediate code of the basic blocks present in the applications of that collection is excluded from the intermediate code generated for the given file 101.

Excluding the instructions described above from the intermediate code reduces the likelihood of type I and type II errors when determining whether an application is malicious, because it reduces the share of data describing safe code (and safe functionality) in the total data of the given application 101.
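The exclusion step can be pictured as a simple filter over block hashes. The sketch below assumes the intermediate code is a list of per-block hash strings and the whitelist is a set of such hashes; the short hash values and the function name are made up for illustration.

```python
def exclude_known_safe(block_hashes, whitelist_hashes):
    """Return the block hashes that are absent from the whitelist collection."""
    safe = set(whitelist_hashes)
    return [h for h in block_hashes if h not in safe]


whitelist = {"a1", "b2"}              # hashes of standard-library / safe-file blocks
file_101 = ["a1", "c3", "b2", "d4"]   # hashes of blocks from the file under analysis
remaining = exclude_known_safe(file_101, whitelist)
```

Only the blocks left in `remaining` would take part in the later similarity comparison.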

The feature selection means 130 is designed to select the characteristics of at least one application from the application database 131 based on the generated intermediate code.

The application database 131 is designed to store the characteristics and the generated intermediate code of at least one application.

In one embodiment of the system, the characteristics of an application are at least:

the category to which the application belongs, where the category is at least one of:

the category of safe applications, which includes applications that have no malicious functionality;

the category of malicious applications, which includes applications that have malicious functionality;

the category of unknown applications, which includes applications whose functionality is unknown;

the degree of maliciousness of the application, which denotes the probability that the application has malicious functionality.

Each application category can in turn contain subcategories; for example, the category of malicious applications can contain categories for families (types) of malicious applications (for example, worms, Trojans, potentially dangerous applications, and so on).

In another embodiment of the system, the characteristics selected from the application database 131 are those of an application whose intermediate code has a degree of similarity to the generated intermediate code above a predetermined threshold value.
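The text does not say how the degree of similarity is computed. As one possible reading, the sketch below treats each intermediate code as a set of block hashes and uses the Jaccard index as the similarity measure; the measure, the threshold of 0.5, and the record layout are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard index of two sets: shared elements over all elements."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_characteristics(code, app_db, threshold=0.5):
    """Return (similarity, characteristics) for entries above the threshold."""
    selected = []
    for entry in app_db:
        similarity = jaccard(code, entry["code"])
        if similarity > threshold:
            selected.append((similarity, entry["characteristics"]))
    return selected


app_db = [
    {"code": {"h1", "h2", "h3", "h4"},
     "characteristics": {"category": "malicious", "degree": 100}},
    {"code": {"h9"},
     "characteristics": {"category": "safe", "degree": 0}},
]
selected = select_characteristics({"h1", "h2", "h3"}, app_db)
```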

The means for determining the degree of maliciousness 140 is designed to determine the degree of maliciousness of the given application 101 based on the selected application characteristics.

In one embodiment of the system, the degree of maliciousness is a numerical value in a given range (for example, from 0 to 100), where the minimum value denotes an application that is guaranteed to be safe and the maximum value denotes one that is guaranteed to be malicious.
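How the selected characteristics are combined into a single value is not specified. Purely as an illustration, the following sketch assumes a similarity-weighted average of the degrees of the matched applications, clamped to the range 0 to 100; the formula and the function name are assumptions.

```python
def degree_of_maliciousness(matches):
    """Combine (similarity, degree) pairs into one value in the range 0..100.

    Returns None when nothing matched, since that gives no evidence either way.
    """
    total = sum(similarity for similarity, _ in matches)
    if total == 0:
        return None
    score = sum(similarity * degree for similarity, degree in matches) / total
    return max(0.0, min(100.0, score))


score = degree_of_maliciousness([(0.75, 100), (0.25, 0)])
```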

The identification means 150 is designed to determine, based on the degree of maliciousness of the given application 101, at least:

a section of code 151 in the given application that is similar in functionality to a section of code of an application from the application database 131;

malicious code injected into the given application;

the category of the given application.

In one embodiment of the system, the application database 131 contains malicious applications, which makes it possible to find malicious code in the given application 101. This in turn lets antivirus companies and information security companies respond more quickly to the emergence of new, previously unknown malicious applications (the application code is new, but the malicious algorithms are known, which is what makes it possible to detect the new applications).

In yet another embodiment of the system, the application database 131 contains only safe applications. Therefore, when a given application is analyzed that is similar to applications from the application database 131 but contains injected code, that code is detected, because the basic blocks containing it are absent from the applications in the application database 131. Once the unknown blocks have been identified, they (the executable code contained in the basic blocks) can be analyzed by any method known from the prior art (for example, by signature or heuristic analysis). If malicious functionality is found in the analyzed blocks, malicious code is deemed to have been injected into the given application.

The simplest example of the case described above is an application infected with a virus.
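The safe-only database case can be sketched as follows. The sketch assumes blocks are held as a mapping from block hash to raw bytes and that a separate analyzer is supplied as a callback; the byte signature used in the example is invented and stands in for whatever signature or heuristic analysis is actually applied.

```python
def find_injected_blocks(blocks, safe_hashes, is_malicious):
    """Find blocks unknown to the safe database and flag the malicious ones.

    blocks: dict mapping block hash -> raw block bytes.
    Returns (unknown_hashes, flagged_hashes).
    """
    unknown = [h for h in blocks if h not in safe_hashes]
    flagged = [h for h in unknown if is_malicious(blocks[h])]
    return unknown, flagged


safe_hashes = {"h1", "h2"}
blocks = {"h1": b"\x90\x90", "h2": b"\xc3", "h7": b"\xde\xad\xbe\xef"}

# Hypothetical signature check standing in for a real analyzer.
unknown, flagged = find_injected_blocks(
    blocks, safe_hashes, lambda body: b"\xde\xad" in body
)
injected = bool(flagged)
```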

Consider an example of the technology described above: two files (File 1, File 2) are analyzed, functional blocks are extracted from each file, and intermediate code and a hash are computed for each block. The resulting hashes are compared with each other for similarity:

File 1:

Basic block #1 (up to the first control transfer instruction):

Intermediate code #1 (containing instructions):

Hash:

(start and end addresses of the basic block, MD5 hash of the original instructions of the basic block, MD5 hash of the block of pseudo-instructions, string of original bytes)

File 2:

Hash:

It can be seen that the original bytes and their hash differ, while the pseudocode hashes are identical.
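The record layout listed in parentheses above can be sketched in Python as a small data class. The addresses, bytes, and pseudo-instruction text below are invented illustrative values, not those of the original example, and the names `BlockRecord` and `make_record` are assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRecord:
    start: int        # start address of the basic block
    end: int          # end address of the basic block
    raw_md5: str      # MD5 of the original instruction bytes
    pseudo_md5: str   # MD5 of the pseudo-instruction block
    raw_bytes: str    # original bytes as a hex string


def make_record(start, raw, pseudo_text):
    """Build a record from raw block bytes and their pseudo-instruction text."""
    return BlockRecord(
        start=start,
        end=start + len(raw),
        raw_md5=hashlib.md5(raw).hexdigest(),
        pseudo_md5=hashlib.md5(pseudo_text.encode("utf-8")).hexdigest(),
        raw_bytes=raw.hex(),
    )


# Invented values: two different byte sequences that are assumed to reduce
# to the same pseudo-instruction text.
rec1 = make_record(0x401000, bytes.fromhex("b805000000"), "mov REG,5")
rec2 = make_record(0x402000, bytes.fromhex("b905000000"), "mov REG,5")
same_logic = rec1.pseudo_md5 == rec2.pseudo_md5
```

The two records differ in `raw_md5` and `raw_bytes` but agree in `pseudo_md5`, which is the situation the example describes.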

The block diagram of the method for determining the degree of maliciousness comprises step 210, in which source code is generated; step 220, in which intermediate code is generated; step 230, in which characteristics are selected; step 240, in which the degree of maliciousness is determined; and step 250, in which a section of code is identified.

In step 210, the source code generating means 110 is used to generate source code for the given application 101.

In step 220, the intermediate code generating means 120 is used to generate intermediate code based on the source code generated in step 210.

In one particular embodiment of the method, step 220 comprises steps in which:

a) at least one basic block is extracted from the source code generated in step 210, where the basic block is at least:

a function;

b) based on the extracted basic block, at least:

a hash (convolution) of the basic block is computed;

the codes of the instructions contained in the basic block are extracted;

the constant arguments of the instructions contained in the basic block are extracted;

the arguments of the instructions contained in the basic block that do not fall within the virtual address range are extracted;

a symbolic expression of the basic block is formed that includes, for the instructions contained in the basic block, at least:

In step 230, the feature selection means 130 is used to select the characteristics of at least one application from the application database 131 based on the intermediate code generated in step 220.

In step 240, the means for determining the degree of maliciousness 140 is used to determine the degree of maliciousness of the given application 101 based on the application characteristics selected in step 230.

In step 250, the identification means 150 is used to determine, based on the degree of maliciousness of the given application 101 determined in step 240, at least:

a section of code in the given application 101 that is similar in functionality to a section of code of an application from the application database;

malicious code injected into the given application 101;

the category of the given application 101.
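The order of steps 210 to 250 can be summarized as a chain of function calls. In the sketch below every stage is passed in as a callable, because the text leaves the implementations open; the stub stages and the threshold of 50 in the example are placeholders invented for illustration.

```python
def assess(application, decompile, to_intermediate, select, score, identify):
    """Run the five steps in order and return (degree, identification result)."""
    source = decompile(application)               # step 210
    intermediate = to_intermediate(source)        # step 220
    characteristics = select(intermediate)        # step 230
    degree = score(characteristics)               # step 240
    return degree, identify(application, degree)  # step 250


# Placeholder stages, only to show how data flows between the steps.
result = assess(
    b"sample",
    decompile=lambda app: app.decode(),
    to_intermediate=lambda src: {src.upper()},
    select=lambda code: [{"degree": 80}] if "SAMPLE" in code else [],
    score=lambda chars: max((c["degree"] for c in chars), default=0),
    identify=lambda app, degree: "malicious" if degree >= 50 else "unknown",
)
```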

Fig. 3 shows an example of a general-purpose computer system, a personal computer or server 20, comprising a central processing unit 21, system memory 22, and a system bus 23 that connects the various system components, including the memory associated with the central processing unit 21. The system bus 23 is implemented as any bus structure known from the prior art, comprising in turn a bus memory or bus memory controller, a peripheral bus, and a local bus, and is able to interact with any other bus architecture. The system memory comprises read-only memory (ROM) 24 and random access memory (RAM) 25. The basic input/output system (BIOS) 26 contains the basic procedures that transfer information between the elements of the personal computer 20, for example, when the operating system is loaded using ROM 24.

The personal computer 20 in turn contains a hard disk 27 for reading and writing data, a magnetic disk drive 28 for reading from and writing to removable magnetic disks 29, and an optical drive 30 for reading from and writing to removable optical disks 31, such as CD-ROM, DVD-ROM, and other optical storage media. The hard disk 27, the magnetic disk drive 28, and the optical drive 30 are connected to the system bus 23 through the hard disk interface 32, the magnetic disk interface 33, and the optical drive interface 34, respectively. The drives and the corresponding computer storage media are non-volatile means of storing computer instructions, data structures, program modules, and other data of the personal computer 20.

The present description discloses an implementation of a system that uses a hard disk 27, a removable magnetic disk 29, and a removable optical disk 31, but it should be understood that other types of computer storage media 56 capable of storing data in a computer-readable form (solid-state drives, flash memory cards, digital disks, random access memory (RAM), and so on) can be used, connected to the system bus 23 through the controller 55.

The computer 20 has a file system 36, which stores the recorded operating system 35 as well as additional software applications 37, other program modules 38, and program data 39. The user can enter commands and information into the personal computer 20 through input devices (keyboard 40, mouse 42). Other input devices (not shown) can be used: a microphone, joystick, game console, scanner, and so on. Such input devices are usually connected to the computer system 20 through a serial port 46, which in turn is connected to the system bus, but they can be connected in another way, for example, through a parallel port, game port, or universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 through an interface such as a video adapter 48. In addition to the monitor 47, the personal computer can be equipped with other peripheral output devices (not shown), such as speakers, a printer, and so on.

The personal computer 20 can operate in a networked environment, using a network connection to one or more remote computers 49. The remote computer (or computers) 49 are similar personal computers or servers that have most or all of the elements mentioned above in the description of the personal computer 20 shown in Fig. 3. Other devices can also be present in the computer network, for example, routers, network stations, peer-to-peer devices, or other network nodes.

Network connections can form a local area network (LAN) 50 and a wide area network (WAN). Such networks are used in corporate computer networks and internal company networks and, as a rule, have access to the Internet. In LAN or WAN networks, the personal computer 20 is connected to the local area network 50 through a network adapter or network interface 51. When networks are used, the personal computer 20 can use a modem 54 or other means of communicating with a wide area network such as the Internet. The modem 54, which is an internal or external device, is connected to the system bus 23 through the serial port 46. It should be noted that the network connections are only examples and need not reflect the exact network configuration; in reality there are other ways of establishing a connection between one computer and another by technical means of communication.

In conclusion, it should be noted that the information given in the description consists of examples that do not limit the scope of the present invention as defined by the claims.

Claims

1. A system for determining the degree of maliciousness, comprising:

a) a source code generating means designed to generate source code for a given application;

b) an intermediate code generating means designed to generate intermediate code based on the generated source code;

c) a feature selection means designed to select the characteristics of at least one application from an application database based on the generated intermediate code;

d) a means for determining the degree of maliciousness, designed to determine the degree of maliciousness of the given application based on the selected application characteristics.

2. The system according to claim 1, in which at least one of the following can act as an application:

executable file;

a collection of at least one executable file and at least one resource file.

3. The system of claim 1, wherein the source code is generated by decompiling a given application.

4. The system according to claim 1, in which the intermediate code generating means is additionally designed for:

a) extracting at least one basic block from the generated source code, where the basic block can be at least one of:

a function;

a set of instructions located between a jump instruction to a given address and the instruction located at that address;

b) at least:

computing a hash (convolution) of the basic block;

extracting the codes of the instructions contained in the basic block;

extracting the constant arguments of the instructions contained in the basic block;

extracting variables from the stack;

extracting the arguments of the instructions contained in the basic block that do not fall within the virtual address range;

forming a symbolic expression of the basic block that includes, for the instructions contained in the basic block, at least:

a variable at the input of the instruction;

a constant argument of the instruction;

the result of invoking the instruction from the basic block.

5. The system according to claim 4, in which instructions relating to at least the following are excluded from the intermediate code:

standard library code;

code of known safe applications.

6. The system according to claim 1, in which the application database contains the characteristics and the generated intermediate code of at least one application.

7. The system according to claim 1, in which the characteristics of an application are at least:

the category to which the application belongs, where the category is at least one of:

the category of safe applications, which includes applications that have no malicious functionality;

the category of malicious applications, which includes applications that have malicious functionality;

the category of unknown applications, which includes applications whose functionality is unknown;

the degree of maliciousness of the application, which denotes the probability that the application has malicious functionality.

8. The system according to claim 1, in which the characteristics selected from the application database are those of an application whose intermediate code has a degree of similarity to the generated intermediate code above a predetermined threshold value.

9. The system according to claim 1, which further comprises an identification means designed to determine, based on the degree of maliciousness of the given application, at least:

a section of code in the given application that is similar in functionality to a section of code of an application from the application database;

malicious code injected into the given application;

the category of the given application.

10. A method for determining the degree of maliciousness, comprising steps that are carried out using the means of the system according to claim 1 and in which:

a) source code is generated for a given application;

b) intermediate code is generated based on the generated source code;

c) the characteristics of at least one application are selected from the application database based on the generated intermediate code;

d) the degree of maliciousness of the given application is determined based on the selected application characteristics.

11. The method according to claim 10, wherein the application is at least:

one executable file;

at least one executable file and at least one resource file that together provide the functionality of the application.

12. The method according to claim 10, wherein the source code is generated by decompiling the given application.

13. The method according to claim 10, wherein generating the intermediate code comprises steps in which:

a) at least one basic block is extracted from the generated source code, where the basic block is at least:

a function;

b) based on the extracted basic block, at least:

a hash (convolution) of the basic block is computed;

the codes of the instructions contained in the basic block are extracted;

the constant arguments of the instructions contained in the basic block are extracted;

the arguments of the instructions contained in the basic block that do not fall within the virtual address range are extracted;

a symbolic expression of the basic block is formed that includes, for the instructions contained in the basic block, at least:

a variable at the input of the instruction;

a constant argument of the instruction;

the result of invoking the instruction.

14. The method according to claim 13, wherein instructions relating to at least the following are excluded from the intermediate code:

standard library code;

code of known safe applications.

15. The method according to claim 10, wherein the application database contains the characteristics and the generated intermediate code of at least one application.

16. The method according to claim 10, wherein at least the following are the characteristics of the application:

17. The method according to claim 10, wherein the characteristics selected from the application database are those of an application whose intermediate code has a degree of similarity to the generated intermediate code above a predetermined threshold value.

18. The method according to claim 10, wherein, additionally, based on the determined degree of maliciousness of the given application, at least the following is determined:

malicious code injected into the given application;

the category of the given application.