EA042211B1

EA042211B1 - METHOD AND SYSTEM FOR PROTOCOL MESSAGE CLASSIFICATION IN DATA NETWORK

Info

Publication number: EA042211B1
Application number: EA202190290
Authority: EA
Inventors: Emmanuele Zambon
Original assignee: Security Matters B.V.
Priority date: 2011-07-26
Filing date: 2012-07-26
Publication date: 2023-01-25

Description

Field of the invention

The invention relates to the field of data networks, in particular to the classification of messages in data networks, for example to detect malicious intrusions in such networks.

State of the art

Many data networks deploy detection systems to detect malicious intrusions. Such intrusions carry data from an attacker or from infected computers, and may affect the operation of servers, computers or other equipment.

There are two main types of intrusion detection systems: signature-based and anomaly-based.

Signature-based intrusion detection systems (SBS) rely on pattern-matching techniques. The system holds a database of signatures, i.e. data sequences known from past attacks. These signatures are matched against the data under test, and when a match is found, an alert is raised. Experts must update the signature database whenever a new attack is identified.
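In its simplest form, the signature matching described above amounts to searching the inspected data for each stored byte sequence. The following Python sketch assumes that signatures are plain byte strings; the function name `match_signatures` and both sample signatures are hypothetical, and production systems generally use richer pattern languages and multi-pattern search algorithms.

```python
def match_signatures(payload: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures whose byte sequence occurs in payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


# Hypothetical signature database: name -> byte sequence known from a past attack.
SIGNATURES = {
    "nop-sled": b"\x90\x90\x90\x90",
    "path-traversal": b"../../",
}

for name in match_signatures(b"GET /../../etc/passwd HTTP/1.0", SIGNATURES):
    print(f"ALERT: signature {name!r} matched")  # ALERT: signature 'path-traversal' matched
```

The weakness noted in the text is visible here: a new attack that contains none of the stored sequences passes unnoticed until someone adds a signature for it.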

In contrast, an anomaly-based intrusion detection system (ABS) first builds a statistical model describing normal network traffic during a so-called learning phase. Then, during a so-called testing phase, the system analyzes the data and classifies any traffic or activity that deviates significantly from the model as an attack. The advantage of an anomaly-based system is that it can detect zero-day attacks, i.e. attacks that have not yet been identified by experts. To detect as many attacks as possible, an ABS needs to inspect the payload of network traffic.

Existing methods are based on n-gram analysis, applied either to the (raw) packet payload or to parts of it.

However, in some data networks malicious data is very similar to legitimate data. This may be the case in a so-called SCADA (Supervisory Control and Data Acquisition) network or another Process Control Network. In a SCADA or Process Control Network, computers, servers and other equipment exchange protocol messages at the application layer of the data network. These protocol messages may contain instructions for controlling machines. A protocol message with a malicious instruction (set the rotational speed to 100 rpm) can look very similar to a valid instruction (set the rotational speed to 10 rpm).

When malicious data closely resembles legitimate data, an anomaly-based intrusion detection system may classify it as normal or legitimate, which can compromise the operation of computers, servers and other equipment on the network.
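The limitation can be illustrated with a deliberately simplified payload-level n-gram detector, written in Python. It learns the set of byte trigrams seen in normal payloads and flags a payload when too large a fraction of its trigrams is new. The textual command format, the trigram length of 3, the 20 % tolerance and the class name `NgramAnomalyDetector` are all illustrative assumptions rather than any particular published system.

```python
def ngrams(data: bytes, n: int = 3) -> set[bytes]:
    """Return the set of distinct n-grams (byte substrings of length n) in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


class NgramAnomalyDetector:
    """Toy payload-level anomaly detector based on previously seen n-grams."""

    def __init__(self, n: int = 3, threshold: float = 0.2) -> None:
        self.n = n
        self.threshold = threshold  # assumed: max tolerated fraction of unseen n-grams
        self.known: set[bytes] = set()

    def train(self, payloads: list[bytes]) -> None:
        # Learning phase: record every n-gram seen in (assumed attack-free) traffic.
        for payload in payloads:
            self.known |= ngrams(payload, self.n)

    def score(self, payload: bytes) -> float:
        # Fraction of the payload's distinct n-grams never seen during training.
        grams = ngrams(payload, self.n)
        if not grams:
            return 0.0
        return len(grams - self.known) / len(grams)

    def is_anomalous(self, payload: bytes) -> bool:
        return self.score(payload) > self.threshold


detector = NgramAnomalyDetector()
detector.train([b"SET RPM 10", b"SET RPM 20", b"SET RPM 50"])
print(detector.is_anomalous(b"SET RPM 100"))  # False: treated as normal traffic
```

Only one of the nine trigrams of `SET RPM 100` (namely `100`) is new, so its score of about 0.11 stays below the threshold and the malicious setpoint is accepted, exactly the situation described above.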

Summary of the invention

An object of the invention may be to provide an improved intrusion detection system and/or method.

According to an aspect of the invention, there is provided an intrusion detection method for detecting an intrusion in data traffic on a data network, the method comprising: parsing the data traffic to extract at least one protocol field of a protocol message of the data traffic;

associating the extracted protocol field with a respective model for that protocol field, the model being selected from a set of models;

assessing whether the content of the extracted protocol field lies within a safe region as defined by the model; and generating an intrusion detection signal when the content of the extracted protocol field is determined to lie outside the safe region.
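The four steps can be sketched in Python for a hypothetical text protocol in which a message is a `;`-separated list of `name=value` fields. The field names `cmd` and `rpm`, their accepted values and all function names are assumptions made for this example; each model is simply a predicate that returns `True` when a value lies in the safe region.

```python
from typing import Callable

Model = Callable[[str], bool]  # True when a value lies in the safe region


def parse(message: str) -> dict[str, str]:
    """Parse a hypothetical 'name=value;name=value' message into protocol fields."""
    fields: dict[str, str] = {}
    for part in message.split(";"):
        name, sep, value = part.partition("=")
        if not sep or not name:
            raise ValueError(f"cannot parse field {part!r}")
        fields[name] = value
    return fields


def detect_intrusions(message: str, models: dict[str, Model]) -> list[str]:
    """Return intrusion detection signals; an empty list means nothing was found."""
    signals = []
    for name, value in parse(message).items():  # 1. parse and extract fields
        model = models.get(name)                 # 2. associate field with its model
        if model is None:
            continue                             # fields without a model are skipped here
        if not model(value):                     # 3. assess against the safe region
            signals.append(f"{name}={value!r} outside safe region")  # 4. signal
    return signals


MODELS: dict[str, Model] = {
    "cmd": lambda v: v in {"set_speed", "get_speed"},
    "rpm": lambda v: v.isdigit() and 0 <= int(v) <= 20,
}

print(detect_intrusions("cmd=set_speed;rpm=10", MODELS))   # []
print(detect_intrusions("cmd=set_speed;rpm=100", MODELS))  # one signal for rpm
```

Because the `rpm` field is judged on its own, the setpoint of 100 from the rotational-speed example falls outside the assumed range of 0 to 20, even though the message differs from a valid one by a single character.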

Parsing the data traffic makes it possible to distinguish the individual fields (referred to as protocol fields) of the protocol according to which data is transmitted over the data network. An association is then made (if successful) between each such protocol field and a model. For this purpose, a set of models is provided, and a suitable model is selected for the extracted protocol field, as explained in more detail below. The protocol field is then evaluated using the model to determine whether its content lies in a normal, safe, acceptable region or not. In the latter case, a suitable action may be taken. Because parsing the protocol message distinguishes the individual protocol fields of the data traffic, an appropriate model can be selected for evaluating each particular protocol field. An adequate evaluation can thus be performed, since different protocol fields can be evaluated with different models, e.g. each protocol field with a respective model adapted to that particular field, such as a model adapted to the field's type and/or content. The intrusion detection method according to the invention may be a computer-implemented intrusion detection method. The parser (i.e. the parsing step) may use a predefined protocol specification. Also, for example where the protocol is unknown, the protocol may be learned by monitoring data traffic on the network and extracting the protocol specification from it.
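A parser driven by a predefined protocol specification can be sketched with Python's standard `struct` module. The specification below describes a hypothetical fixed-length, big-endian message loosely shaped like a register-write request; its field names and layout are assumptions and are not taken from any real protocol. Variable-length or nested protocols would need a richer specification format.

```python
import struct

# Hypothetical protocol specification: ordered (field name, struct format code)
# pairs for a fixed-layout, big-endian message.
SPEC: list[tuple[str, str]] = [
    ("transaction_id", "H"),  # unsigned 16-bit
    ("unit_id", "B"),         # unsigned 8-bit
    ("function_code", "B"),
    ("register", "H"),
    ("value", "H"),
]


def parse_with_spec(data: bytes, spec: list[tuple[str, str]] = SPEC) -> dict[str, int]:
    """Extract protocol fields from data according to a predefined specification."""
    fmt = ">" + "".join(code for _, code in spec)
    if len(data) != struct.calcsize(fmt):
        # The message does not conform to the specification.
        raise ValueError("message length does not match the specification")
    values = struct.unpack(fmt, data)
    return dict(zip((name for name, _ in spec), values))


msg = struct.pack(">HBBHH", 1, 17, 6, 40001, 10)
print(parse_with_spec(msg))
```

Keeping the layout in data rather than code means the same parser can serve any protocol whose specification fits this simple form, which is one way the "predefined specification" could be realized.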

In the context of this document, the term protocol may be understood as a set of rules that defines the content of some or all of the messages transmitted over a data network.

A network protocol may define protocol messages, also known as Protocol Data Units (PDUs). A protocol message (PDU) may in turn contain one or more fields, which may be of many different types. A field may contain either another PDU or an elementary data object (e.g. a number, a string or an opaque binary object). As described in more detail below, a network protocol may be organized as a tree in which the nodes are PDUs and the leaves are elementary data objects (fields). A separate model may be provided for each field (or for each significant field). As an example, suppose a protocol message carries a person's personal data (e.g. name, address and personal settings): the protocol message may then contain name, address and personal settings fields. The name field may in turn contain last name, first name, login name and other fields. The address field may, for example, contain home address and work address fields. The home address field may contain home address street, home address number, home address postal code and home address city fields, while the work address field may contain work address street, work address number, work address postal code and work address city fields, etc. A separate model may be built for each field; for example, a separate, respective model may be provided for each of the fields. In an embodiment, the same model may be applied to a subset of the fields; for example, the same model may be applied to the work address city and home address city fields.
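The tree view and the shared city model can be sketched in Python by representing a PDU as nested dictionaries: inner dictionaries play the role of PDUs and all other values are elementary fields. The dotted field paths, the sample message and the validation rules are hypothetical.

```python
from typing import Any, Callable, Iterator

Model = Callable[[Any], bool]


def leaves(pdu: dict, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Walk a PDU tree (nested dicts) and yield (path, value) for every elementary field."""
    for name, value in pdu.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):  # nested PDU: an inner node of the tree
            yield from leaves(value, path)
        else:                        # elementary data object: a leaf of the tree
            yield path, value


def city_model(value: Any) -> bool:
    """Hypothetical rule shared by both city fields: non-empty letters and spaces."""
    return isinstance(value, str) and value.strip() != "" and all(
        c.isalpha() or c == " " for c in value
    )


MODELS: dict[str, Model] = {
    "address.home.city": city_model,
    "address.work.city": city_model,  # the same model object serves two fields
    "address.home.postal_code": lambda v: isinstance(v, str) and len(v) <= 10,
}


def anomalous_fields(pdu: dict) -> list[str]:
    """Paths of leaves whose model rejects the value (fields without a model are skipped)."""
    return [p for p, v in leaves(pdu) if p in MODELS and not MODELS[p](v)]


message = {
    "name": {"last": "Jansen", "first": "Anna", "login": "ajansen"},
    "address": {
        "home": {"street": "Dorpsstraat", "number": 12, "postal_code": "1234 AB", "city": "Eindhoven"},
        "work": {"street": "Kanaalweg", "number": 3, "postal_code": "5611 AA", "city": "12345"},
    },
}
print(anomalous_fields(message))  # ['address.work.city']
```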

The term data traffic may be understood to include any data transmitted over the network, such as data streams, data packets, etc. The term data network may be understood to include any arrangement for data transmission that provides for the transfer of (e.g. digital) data. The network may comprise or be connected to a public network such as the Internet, and/or may comprise a private network or virtual private network accessible only to authorized users or authorized equipment. Transmission may take place over a wired connection, an optical fiber connection, a wireless connection and/or any other connection. The term model may be understood to include a rule or set of rules that is applied to a protocol field in order to evaluate that protocol field. The model may describe normal, valid or intrusion-free protocol messages. It will be understood that the more protocol messages are used in the learning phase, the better the model may describe normal, valid or intrusion-free protocol messages.

The term intrusion may be understood to include any data that may be unwanted, possibly harmful to the computer system receiving the data, possibly harmful to an application running on a computer system connected to the data network, or possibly harmful to the operation of a device, installation, apparatus, etc. connected to the data network.

In an embodiment, the set of models contains a respective model for each protocol field of a set of protocol fields. More accurate results may thus be obtained, since each protocol field can be evaluated with a model tailored specifically to it.

In an embodiment, the set of models contains two models for one protocol field, the particular one of the two models being selected based on the value of another field, which may further increase the accuracy of the models.
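Selecting between models on the basis of another field's value can be sketched in Python as a lookup keyed by that other field. Here the model for a `value` field depends on a `register` field; the field names, register numbers and bounds are invented for illustration.

```python
from typing import Callable

Model = Callable[[int], bool]

# Hypothetical: the model for the "value" field depends on the "register" field.
VALUE_MODELS: dict[int, Model] = {
    100: lambda v: 0 <= v <= 20,   # e.g. a speed setpoint
    101: lambda v: 15 <= v <= 30,  # e.g. a temperature setpoint
}


def value_is_safe(fields: dict[str, int]) -> bool:
    """Choose the model for 'value' from the value of 'register'; unknown registers fail."""
    model = VALUE_MODELS.get(fields["register"])
    return model is not None and model(fields["value"])


print(value_is_safe({"register": 100, "value": 10}))  # True
print(value_is_safe({"register": 101, "value": 10}))  # False
```

The same value 10 is acceptable for one register and not for the other, which a single model for the `value` field could not express without widening its safe region to cover both.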

Similarly, in an embodiment, a time-series analysis may be performed on a protocol field, the set of models containing at least two models for one protocol field: a first one of the models is associated with a first time interval in which data traffic is observed, and a second one with a second time interval in which data traffic is observed, the second time interval, for example, not overlapping the first.
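A minimal Python sketch of per-interval models, assuming the day is split into two non-overlapping intervals (07:00 to 18:59 and the rest) and that each interval has its own range model for the same field. The boundaries, bounds and function names are hypothetical.

```python
from datetime import datetime

# Hypothetical range model per time interval for one protocol field.
RANGE_BY_SLOT = {"day": (0.0, 20.0), "night": (0.0, 5.0)}


def time_slot(ts: datetime) -> str:
    """Map an observation time to its interval: 07:00 to 18:59 is 'day', else 'night'."""
    return "day" if 7 <= ts.hour < 19 else "night"


def in_safe_region(value: float, ts: datetime) -> bool:
    """Assess the value with the model of the interval in which it was observed."""
    low, high = RANGE_BY_SLOT[time_slot(ts)]
    return low <= value <= high


print(in_safe_region(10, datetime(2024, 1, 1, 12)))  # True: normal during the day
print(in_safe_region(10, datetime(2024, 1, 1, 2)))   # False: unusual at night
```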

In an embodiment, the model for a field is determined in a learning phase, the learning phase comprising parsing the data traffic to extract at least one protocol field of the protocol used in the data traffic;

associating the extracted protocol field with a model for that protocol field, the model being selected from the set of models; and updating the model for the extracted protocol field using the content of the extracted protocol field.

Thus, data traffic may be observed during the learning phase, and the content of the extracted protocol fields may be used to update the respective models with which those fields are associated. If no association can be made between an extracted protocol field and any of the models, a new model may be created for that protocol field and added to the set of models.
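The learning loop can be sketched in Python, assuming for simplicity that every field is numeric and that each model just tracks the smallest and largest value observed (one of the model types described further below). The messages are given as already-parsed `field name -> value` dictionaries; the class and function names are hypothetical.

```python
class RangeModel:
    """Learns the smallest and largest value observed for one numeric field."""

    def __init__(self) -> None:
        self.low = float("inf")
        self.high = float("-inf")

    def update(self, value: float) -> None:
        self.low = min(self.low, value)
        self.high = max(self.high, value)

    def accepts(self, value: float) -> bool:
        return self.low <= value <= self.high


def learn(messages: list[dict[str, float]], models: dict[str, RangeModel]) -> None:
    """Learning phase over parsed messages.

    Each extracted field is associated with its model, which is then updated;
    a field that has no model yet gets a new one added to the set of models.
    """
    for fields in messages:
        for name, value in fields.items():
            if name not in models:
                models[name] = RangeModel()
            models[name].update(value)


models: dict[str, RangeModel] = {}
learn([{"register": 100, "rpm": 10}, {"register": 100, "rpm": 15}], models)
print(sorted(models))              # ['register', 'rpm']
print(models["rpm"].accepts(12))   # True
print(models["rpm"].accepts(100))  # False
```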

Accordingly, two phases can be distinguished: a learning phase, in which a model of the protocol messages is built, and a testing phase, in which protocol messages are compared with that model. The protocol messages used in the learning phase may be generated based on the communication protocol or may be extracted from data traffic in the data network.

Since protocol messages can be described by their structure and by the values of their protocol fields, the model may relate to the protocol fields seen in the learning phase and to their values. Different protocol fields may have different data types, i.e. their value may be a number (such as an integer or a floating-point number), a string, a Boolean or a binary value. The data type may be determined by the communication protocol. The model may be built according to the data type of at least one protocol field.
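Building the model according to the field's data type can be sketched in Python as a dispatch on type. In this sketch the type is inferred from the training values, whereas the text notes it may come from the communication protocol itself; the three model families (value set, numeric range, observed characters) and the name `build_model` are illustrative assumptions.

```python
from typing import Any, Callable

Model = Callable[[Any], bool]


def build_model(values: list[Any]) -> Model:
    """Build a model whose family depends on the data type of the training values."""
    if all(isinstance(v, (bool, bytes)) for v in values):
        allowed = set(values)  # Booleans / binary values: enumerate what was seen
        return lambda v: v in allowed
    if all(isinstance(v, (int, float)) for v in values):
        low, high = min(values), max(values)  # numbers: observed range
        return lambda v: (isinstance(v, (int, float)) and not isinstance(v, bool)
                          and low <= v <= high)
    if all(isinstance(v, str) for v in values):
        chars = set("".join(values))  # strings: characters seen in training
        return lambda v: isinstance(v, str) and set(v) <= chars
    raise TypeError("mixed or unsupported data types")


speed = build_model([10, 15, 12])
print(speed(11), speed(100))  # True False
```

Checking Booleans first matters in Python because `bool` is a subclass of `int`, so a Boolean field would otherwise be given a numeric range model.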

A given protocol field and/or a given value of that protocol field is compared with the model and classified based on the comparison. On that basis, a protocol message may be classified as an anomaly, i.e. as lying outside the safe region defined by the model (and hence as a possible threat).

During the learning phase, the protocol messages used to train the model may be obtained from data traffic on the network. Alternatively or additionally, simulation data may be used. During the learning phase, protocol messages that may contain an intrusion can be detected by statistical means: infrequently used protocol messages, or protocol messages with rare content, may be removed before the protocol messages are used to train the model(s). In addition or instead, an operator may identify some protocol messages as intrusions, and such protocol messages may either be removed before training or the models adjusted accordingly.
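A minimal Python sketch of cleaning the training set before learning, assuming each message is represented by a hashable tuple of its field values. Messages flagged by an operator are dropped, as are messages whose exact content makes up less than an assumed 5 % of the training set; the threshold and the function name are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable


def filter_training_set(
    messages: list[Hashable],
    min_fraction: float = 0.05,
    flagged: frozenset = frozenset(),
) -> list[Hashable]:
    """Drop operator-flagged messages and messages whose exact content is rare."""
    counts = Counter(messages)
    total = len(messages)
    return [
        m for m in messages
        if m not in flagged and counts[m] / total >= min_fraction
    ]


training = [("set_speed", 10)] * 90 + [("set_speed", 12)] * 9 + [("set_speed", 100)]
cleaned = filter_training_set(training)
print(len(cleaned), ("set_speed", 100) in cleaned)  # 99 False
```

A purely frequency-based filter also discards legitimate but rare messages, which is one reason the text allows operator input alongside it.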

Alternatives to learning the model(s) other than the learning phase described above may be used. For example, a model may be obtained by inspecting the protocol and the application, deriving from them a set of, for example, expected protocol messages, their fields and/or field values, and building a model or set of models from that set. Building the model(s) by inspection may also be combined with learning: for example, the model(s) may first be trained in the learning phase, and the resulting model(s) then adapted based on knowledge of the known behavior and implied meaning and/or content of the protocol messages, their fields and/or field values.

In an embodiment, the intrusion detection signal is additionally generated when parsing fails to establish that a field conforms to the protocol, so that an action can also be taken when a field that does not conform to the protocol is detected (e.g. in a malformed protocol message).

In an embodiment, the intrusion detection signal is additionally generated when the extracted field cannot be associated with any model in the set of models, so that an action can also be taken when the extracted field possibly conforms to the protocol but no suitable model is provided for it. Often only a subset of the possible protocol fields is used, for example in control applications; this makes it possible to raise an alert when a protocol field is extracted that conforms to the protocol but is not normally used.
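Both additional triggers can be sketched in Python for the same kind of hypothetical `name=value;name=value` text protocol: a part that cannot be parsed yields a malformed-message signal, and a well-formed field with no model yields its own signal. Field names, rules and message wording are assumptions for illustration.

```python
from typing import Callable

Model = Callable[[str], bool]


def detect(message: str, models: dict[str, Model]) -> list[str]:
    """Signal malformed messages, fields without a model, and values outside the safe region."""
    fields: dict[str, str] = {}
    for part in message.split(";"):
        name, sep, value = part.partition("=")
        if not sep or not name:  # parsing fails: field does not conform to the protocol
            return [f"malformed protocol message near {part!r}"]
        fields[name] = value
    signals = []
    for name, value in fields.items():
        model = models.get(name)
        if model is None:  # conformant field that is not normally used
            signals.append(f"no model for field {name!r}")
        elif not model(value):
            signals.append(f"{name}={value!r} outside safe region")
    return signals


MODELS: dict[str, Model] = {
    "cmd": lambda v: v in {"set_speed", "get_speed"},
    "rpm": lambda v: v.isdigit() and 0 <= int(v) <= 20,
}

print(detect("cmd=set_speed;rpm", MODELS))            # malformed message
print(detect("cmd=set_speed;rpm=10;debug=1", MODELS))  # unexpected 'debug' field
```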

The method can be applied at various protocol layers. For example, the protocol may be at least one of an application layer protocol, a session layer protocol, a transport layer protocol, or even a lower-layer protocol of the protocol stack. The application layer of the data network may be as defined by the Open Systems Interconnection model (OSI model) of the International Organization for Standardization. At the application layer, software running on computers or servers may communicate by exchanging protocol messages. The protocol messages may be protocol messages of SCADA or Process Control Networks, Windows protocol messages, office automation network protocol messages, HTTP protocol messages, etc.

Communication between software may follow a communication protocol that defines the structure and possible values of (parts of) the protocol messages. The structure of a protocol message may further be described by the protocol fields it contains. Software may be unable to process protocol messages that are not constructed in accordance with the communication protocol.

In an embodiment, in response to the intrusion detection signal being generated, the method further comprises at least one of: dropping the protocol field or the data packet containing the protocol field; and raising and outputting an intrusion alert message. Any other intrusion response may be applied, such as quarantining the protocol field or the data packet containing it.

In an embodiment, the model for the protocol field comprises at least one of the following:

a set of acceptable values of the protocol field; and a definition of a range of acceptable values of the protocol field. Where the protocol field contains a numeric value, this provides a simple model that may allow the protocol field to be tested with a low processing load.
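The two model forms can be sketched in Python as small immutable classes. Testing a value costs one set lookup or two comparisons, which illustrates the low processing load mentioned above. The accepted function codes and the speed bounds in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueSetModel:
    """Safe region = an explicit set of acceptable values."""
    allowed: frozenset

    def accepts(self, value) -> bool:
        return value in self.allowed


@dataclass(frozen=True)
class RangeModel:
    """Safe region = the closed interval [low, high] of acceptable numeric values."""
    low: float
    high: float

    def accepts(self, value: float) -> bool:
        return self.low <= value <= self.high


function_code = ValueSetModel(frozenset({3, 6, 16}))  # hypothetical accepted codes
speed = RangeModel(0, 20)                              # hypothetical rpm bounds

print(function_code.accepts(6), function_code.accepts(8))  # True False
print(speed.accepts(10), speed.accepts(100))               # True False
```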

In an embodiment, the model for the protocol field comprises a definition of acceptable letters, digits, symbols and scripts. Where the protocol field contains a character or a string, this likewise provides a simple model that allows the protocol field to be tested with a low data-processing load.
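A character-class model of this kind might look like the sketch below. It is an assumption-laden illustration: the script of a letter is approximated by the first word of its Unicode character name (a heuristic, not a full Unicode script lookup), and the names `char_class` and `CharClassModel` are hypothetical.

```python
import unicodedata


def char_class(ch):
    """Map a character to the class the model learns.

    Letters are grouped by script (approximated by the first word of the
    Unicode name, e.g. 'LATIN', 'CYRILLIC'); digits form one class; each
    punctuation or symbol character is its own class.
    """
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return "letter:" + unicodedata.name(ch, "UNKNOWN").split()[0]
    if category == "Nd":
        return "digit"
    if category[0] in "PS":
        return "symbol:" + ch
    return "other:" + category


class CharClassModel:
    """Accepts strings made only of character classes seen during learning."""

    def __init__(self):
        self.allowed = set()

    def train(self, text):
        self.allowed.update(char_class(ch) for ch in text)

    def accepts(self, text):
        return all(char_class(ch) in self.allowed for ch in text)


usernames = CharClassModel()
usernames.train("admin_01")
usernames.train("user-7")
```

After this learning step, a value such as `"guest_2"` is accepted, while one containing a new symbol (`;`), a space, or Cyrillic letters is not.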

In an embodiment, the model for the protocol field comprises a set of predefined intrusion signatures, so that knowledge of known attacks can be taken into account. Combining a model as described above (comprising, for example, a set of acceptable protocol field values, a range of acceptable protocol field values, or a definition of acceptable letters, digits, symbols and scripts) with a set of predefined intrusion signatures can be highly effective, because each particular field can be checked against a model of its normal content in combination with one or more intrusion signatures specific to that field.
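The combination of a normal-content model with field-specific signatures can be sketched as below. The normal-content part is assumed here to be a learned character set and maximum length, and the signatures are plain regular expressions; `FieldPolicy` and the example field are hypothetical.

```python
import re
import string
from dataclasses import dataclass, field


@dataclass
class FieldPolicy:
    """Normal-content model plus intrusion signatures for one protocol field."""

    allowed_chars: frozenset  # characters observed in normal traffic (assumption)
    max_length: int           # longest value observed in normal traffic (assumption)
    signatures: list = field(default_factory=list)  # regexes of known attacks on this field

    def is_intrusion(self, value: str) -> bool:
        # Deviation from the learned normal content...
        if len(value) > self.max_length or not set(value) <= self.allowed_chars:
            return True
        # ...or a match against a known attack pattern for this particular field.
        return any(re.search(sig, value) for sig in self.signatures)


# Usage: a hypothetical URL-path field that normally holds simple file paths.
path_policy = FieldPolicy(
    allowed_chars=frozenset(string.ascii_letters + string.digits + "/._-"),
    max_length=64,
    signatures=[r"\.\./"],  # directory traversal
)
```

The signature catches `"/static/../../etc/passwd"`, which is made only of normal characters and so would pass the normal-content check alone.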

In an embodiment, the protocol comprises primitive protocol fields and composite protocol fields, each composite protocol field in turn containing at least one primitive protocol field, and the set of models provides a corresponding model for each primitive protocol field. This makes intrusion detection effective: composite protocol fields (i.e. protocol fields that themselves contain protocol fields, such as an address made up of a street name, house number, postal code and city) can be broken down into their elementary (primitive) protocol fields, so that a suitable model can be applied to each primitive protocol field.
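The decomposition into primitive fields can be illustrated with a small recursive walk. The sketch assumes a parsed message is represented as nested dicts (named subfields) and lists (repeated subfields); the dotted-path naming and the function name `primitive_fields` are hypothetical. Each resulting path would then key its own model.

```python
def primitive_fields(value, path=""):
    """Yield (path, value) pairs for every primitive field of a parsed message.

    Elements of a list share one path, so they would share one model.
    """
    if isinstance(value, dict):
        for name, sub in value.items():
            yield from primitive_fields(sub, f"{path}.{name}" if path else name)
    elif isinstance(value, list):
        for sub in value:
            yield from primitive_fields(sub, f"{path}[]")
    else:
        yield path, value


message = {"address": {"street": "Main St", "number": 12,
                       "postcode": "1234AB", "city": "Springfield"}}
for path, value in primitive_fields(message):
    print(path, value)
```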

Since the model for at least one protocol field, and/or for the value of at least one protocol field, can be built in the learning phase according to the data type of that protocol field, the model can describe normal, valid or intrusion-free protocol messages more accurately than a model that ignores the data type of the protocol fields.

A model optimized to describe a protocol field with a numeric data type may be less accurate for (or not applicable to) a protocol field with a string or binary data type. Likewise, a model optimized for a protocol field with a string data type may be less accurate for a protocol field with a numeric or binary data type. Taking the data type of the protocol field into account when building the model can therefore improve the accuracy of the model.

In an embodiment, a plurality of model types is provided; in the learning phase, a model type for the extracted protocol field is selected from the plurality of model types based on a characteristic of the extracted protocol field, and the model for the extracted protocol field is built based on the selected model type.

Obtaining a model for a particular protocol field may involve several steps. As explained above, many different model types may be applied. First, a model type is selected, from the set of available model types, for the particular protocol field. Once the model type has been determined, a model can be built for that protocol field, for example by analysing data traffic in the learning phase, as described herein. The characteristic of the protocol field may be any suitable characteristic of the data in the field itself, of its meaning in the context of the protocol, and so on; some examples are described below. Using different model types makes it possible both to apply modelling techniques specific to the type of the field values and to make the safe value domain more or less restrictive according to the meaning, role and importance of the protocol field in the protocol or in the context in which the protocol is applied. In general, different model types may apply different criteria to establish whether a particular protocol field value may be an intrusion, for example: a range of values, a numerical distribution of values, a set of values, a set of operators, a set of text values, a set of state descriptions, a set or range of text characters, a set or range of text encodings, etc. A model type can therefore be understood as the set of operations allowed on a particular type of value, together with a heuristic rule for determining the safe region for values of that type and a criterion for determining whether a given value lies within that safe region.

Model type selection can be performed at any time: during the learning phase as well as during monitoring and intrusion detection. In the learning phase, a model type may be selected as part of building the model for a particular protocol field. During detection, if the model for a particular protocol field is found not to yield consistent results, a different model type may be selected.

Model type selection may be performed using the data type of the value(s) of the protocol field and/or the semantics of the parsed protocol field(s). In an embodiment, the characteristic of the protocol field comprises the data type of the protocol field, and the method comprises determining the data type of the extracted protocol field and selecting the model type using the determined data type.

The data type of the protocol field values (such as number, string, array, set, etc.) may, for example, be taken from the protocol specification. Alternatively, it may be inferred by observing network traffic. In one embodiment, the data type of the field values is inferred using regular expressions; for example, the regular expression ^[0-9]+$ can be used to identify integer field values. Selecting a model type that matches the data type of the protocol field values yields a model that can produce more reliable detection results.
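Type inference from observed traffic can be sketched by testing every sample of a field against a list of patterns. Only the integer pattern comes from the text; the hex and decimal patterns, their ordering and the `"string"` fallback are illustrative assumptions.

```python
import re

# Candidate types, most specific first (only "integer" is taken from the text).
TYPE_PATTERNS = [
    ("integer", re.compile(r"^[0-9]+$")),
    ("hex", re.compile(r"^0x[0-9a-fA-F]+$")),
    ("decimal", re.compile(r"^[0-9]+(\.[0-9]+)?$")),
]


def infer_type(samples):
    """Return the first type whose pattern matches every observed value."""
    for name, pattern in TYPE_PATTERNS:
        if samples and all(pattern.fullmatch(s) for s in samples):
            return name
    return "string"  # fallback when no numeric pattern fits all samples
```

Requiring every sample to match means a single non-numeric observation demotes the field to a more general type rather than producing a numeric model that would then flag normal traffic.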

In addition to, or instead of, the data type of the protocol field value, model type selection may be based on the semantics of the parsed protocol field. Accordingly, in an embodiment, the characteristic of the protocol field comprises the semantics of the protocol field, and the method comprises determining the semantics of the extracted protocol field and selecting the model type using the determined semantics.

Semantics can be assigned to a parsed protocol field in various ways: manually during the learning phase, by inference from observed network data, by extracting information from the protocol specification, and so on. Semantics can be used to select the most appropriate model type, for example when several model types are available for a given type of protocol field value. For a numeric protocol field value, for instance, one could use a model type containing a range of such values, a model type containing a set of such values, and so on. Taking the semantics into account, preferably together with the value type of the protocol field, allows the model type best suited to that particular protocol field to be assigned.

One example of using semantics is deciding how strict a numeric range should be based on the importance of the field. If the semantics of a protocol field indicate that the field is security-relevant, a stricter numeric range may be applied; otherwise a looser range may be applied (e.g. from half the minimum to twice the maximum value observed during the learning phase).
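The strict and loose ranges can be derived from learning-phase samples as sketched below. The strict range is assumed to be exactly the observed [min, max]; the loose range uses the half-minimum/double-maximum example from the text. The function name `learned_range` is hypothetical, and the values are assumed to be non-negative (for negative values, halving and doubling would not widen the range).

```python
def learned_range(samples, security_relevant):
    """Derive an acceptable [low, high] numeric range from learning-phase samples."""
    low, high = min(samples), max(samples)
    if security_relevant:
        return low, high          # strict: exactly what was observed
    return low / 2, high * 2      # loose: half the minimum to twice the maximum
```

For samples 100 to 400, a security-relevant field accepts only [100, 400], while a less important field accepts [50, 800].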

By assigning a model type to a protocol field according to the value type and/or the semantics of the field, the model type takes the data content of the field into account, so the model can be tailored to that content. For example, if the field is of integer type and its semantics indicate that it contains the length of another field, a numeric-distribution model may be chosen. If the field is of integer type and its semantics indicate that it is a message-type field, a numeric-set model may be chosen instead. As a third example, if the field is of integer type and its semantics indicate that it holds an engine speed, a strict numeric-range model may be applied.
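The three examples above amount to a lookup keyed on (data type, semantics). The sketch below encodes them as a table; the label strings and the fallback defaults used when no semantic rule applies are hypothetical.

```python
# (data type, semantics) -> model type, following the three examples in the text.
MODEL_TYPE_RULES = {
    ("integer", "length"): "numeric_distribution",
    ("integer", "message_type"): "numeric_set",
    ("integer", "engine_speed"): "strict_numeric_range",
}

# Hypothetical fallbacks when no semantic rule applies.
DEFAULT_BY_TYPE = {"integer": "numeric_range", "string": "char_class"}


def select_model_type(data_type, semantics=None):
    """Prefer a semantics-specific model type, else fall back on the data type."""
    rule = MODEL_TYPE_RULES.get((data_type, semantics))
    if rule is not None:
        return rule
    return DEFAULT_BY_TYPE.get(data_type, "value_set")
```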

In an embodiment, the set of models comprises a model for an operator protocol field and a model for an argument protocol field, and the association and evaluation are performed for both the operator protocol field and the argument protocol field. A protocol may contain protocol fields holding operators (such as instructions, calls, etc.) and protocol fields holding the operands (i.e. arguments) to which the operators are applied. According to an embodiment of the invention, a corresponding model may be associated with protocol fields containing operators as well as with protocol fields containing arguments. In this way, not only intrusive argument values but also potentially intrusive operators can be recognized. Taking the operator into account also allows the most appropriate model type to be assigned, which improves intrusion detection accuracy, since an operator is typically followed by one or more arguments of a predefined data type.

Furthermore, a protocol message may be regarded as the specification of an operation that the receiving network node(s) should perform at the request of the sending network node. Accordingly, a protocol message may contain operator fields (specifying which operation is requested), argument fields (specifying how the operation is to be performed) and marshaling fields (fields that do not relate directly to the requested operation but contain parameters that the network nodes need in order to receive and interpret the message correctly or, more generally, to handle the network communication). Marshaling is the process of converting the in-memory representation of objects into a data format suitable for storage or transmission; it is typically used when data must be moved between different parts of a computer program or from one program to another.

For example, an HTTP request contains a method field (e.g. GET, POST, PUT) that specifies the operator; a URL field that contains the arguments for the method (e.g. /index.php?id=3); and a number of header fields (e.g. Content-length: 100) that carry information not related to the operation itself but used by the network nodes to communicate (e.g. the header Content-length: 100 indicates that the body of the request message is 100 bytes long).
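The split of an HTTP request into operator, argument and marshaling fields can be sketched as follows. This is a deliberately minimal parser (no header folding, chunked encoding or error handling); placing the request body among the arguments and the protocol version among the marshaling fields are assumptions, and `classify_http_request` is a hypothetical name.

```python
def classify_http_request(raw: bytes):
    """Split a raw HTTP/1.x request into operator, argument and marshaling fields."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return {
        "operator": {"method": method},
        "argument": {"target": target, "body": body},
        "marshaling": {"version": version, **headers},
    }


request = (b"POST /index.php?id=3 HTTP/1.1\r\nHost: example.com\r\n"
           b"Content-Length: 4\r\n\r\nq=ab")
fields = classify_http_request(request)
```

Each of the three groups can then be routed to its own set of models.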

As another example, a Modbus/TCP request message contains a function code field identifying the operation to be performed on the receiving PLC/RTU device, a number of data registers carrying the arguments of the requested operation, and a number of other fields not directly related to the operation (e.g. the register count field, data length fields, etc.) that the receiving network node needs in order to parse the message (e.g. to know how many registers were sent).
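For a concrete Modbus/TCP case, the sketch below parses a Write Multiple Registers request (function code 0x10) and groups its fields into the three categories. The layout follows the public Modbus specifications: a 7-byte MBAP header (transaction id, protocol id, length, unit id), the function code, then starting address, register quantity, byte count and register values, all big-endian. Treating the MBAP fields, quantity and byte count as marshaling is an interpretation of the text, and the function name is hypothetical.

```python
import struct


def parse_write_multiple_registers(adu: bytes):
    """Parse a Modbus/TCP function 0x10 request into field categories."""
    tid, pid, length, unit = struct.unpack_from(">HHHB", adu, 0)  # MBAP header
    (function_code,) = struct.unpack_from(">B", adu, 7)
    if function_code != 0x10:
        raise ValueError("this sketch only handles function code 0x10")
    start, quantity, byte_count = struct.unpack_from(">HHB", adu, 8)
    values = list(struct.unpack_from(f">{quantity}H", adu, 13))
    return {
        "operator": {"function_code": function_code},
        "argument": {"start_address": start, "registers": values},
        "marshaling": {"transaction_id": tid, "protocol_id": pid, "length": length,
                       "unit_id": unit, "quantity": quantity, "byte_count": byte_count},
    }


# Write the values 10 and 20 to two registers starting at address 100.
adu = struct.pack(">HHHBBHHB2H", 1, 0, 11, 1, 0x10, 100, 2, 4, 10, 20)
fields = parse_write_multiple_registers(adu)
```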

Attacks or intrusion attempts can be carried out by injecting malicious data into any of these fields. Conversely, such attacks or intrusion attempts can be detected because the values of the affected fields deviate from normal. Inspecting operator and marshaling fields, in addition to argument fields, can therefore improve the accuracy of detecting attacks or intrusion attempts. Accordingly, in an embodiment, the set of models further comprises a model for a marshaling protocol field, and the association and evaluation are also performed for the marshaling protocol field.

For example, a buffer overflow attack can be performed by injecting into a string field more characters than the buffer allocated by the receiving network node can hold. Such an attack may be detected because the string field contains unusual character values. However, an attack may also succeed using only perfectly valid text characters as its malicious payload. The same attack can then be detected through another field indicating that the string length is greater than normal. This is necessarily the case, because the maximum legitimate string length is the size of the buffer allocated by the receiving network node.
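The buffer-overflow example can be sketched by evaluating the string (argument) field and its length (marshaling) field independently. The allowed character set, the learned maximum length and all names here are hypothetical; the point is that a payload of valid characters still triggers the length-field model.

```python
import string

# Characters treated as normal for this hypothetical string field.
ALLOWED = frozenset(string.ascii_letters + string.digits)


class LengthFieldModel:
    """Learns the largest value observed in a length (marshaling) field."""

    def __init__(self):
        self.max_seen = 0

    def train(self, declared_length):
        self.max_seen = max(self.max_seen, declared_length)

    def accepts(self, declared_length):
        return declared_length <= self.max_seen


def check_string_message(declared_length, text, length_model, allowed=ALLOWED):
    """Evaluate the argument field and its length field independently."""
    alerts = []
    if not set(text) <= allowed:
        alerts.append("unusual characters in string field")
    if not length_model.accepts(declared_length):
        alerts.append("length field above learned maximum")
    return alerts


length_model = LengthFieldModel()
for n in (8, 16, 32):  # lengths seen in the learning phase
    length_model.train(n)
```

A message declaring 500 characters of `"A"` passes the character check but is flagged by the length-field model.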

In addition, different, specific model types may be used for operator fields, argument fields and marshaling fields to further improve detection accuracy or to reduce the number of irrelevant alerts generated. Different models (of the same or different model types) may be used for different operator fields, for different argument fields and for different marshaling fields. The model types may be selected based on, for example, data type and semantics, as described above.

It should be noted that the intrusion detection system and method according to the invention can be applied to any type of data traffic, such as textual data traffic (i.e. a text protocol) or binary data traffic (i.e. a binary protocol). In general, the specification of a text protocol does not describe the type of most of its field values. For example, the HTTP specification does not associate a type with header values or parameter values, which are to be parsed as text strings. In such cases the field type may have to be inferred by inspecting the traffic. Binary protocols do not have this limitation: their specifications must include the type of every protocol field to ensure correct parsing. For this reason, applying the present technique to a binary protocol may give an even more accurate result than applying it to a text protocol, because for binary protocols there is no uncertainty in inferring the types of field values. In particular, when the data type and semantics of the parsed protocol field are taken into account, meaning can be given to a binary data stream: parsing and selecting a suitable model type for each protocol field based on its data type and/or semantics allows the content of the binary data to be taken into account. In a binary protocol, the data type of a protocol field should be understood as the kind of data represented by the (binary) data in that field, for example binary data representing another data type such as a number or a string.

In general, a protocol message may contain primitive protocol fields and composite protocol fields. A composite protocol field contains two or more protocol subfields, each of which can be a primitive protocol field or a composite protocol field. The model for a composite protocol field may contain a count of the instances of that field observed during the training phase. If the field has been observed fewer than a predetermined number of times (the threshold), observing the composite protocol field during the detection phase may cause an intrusion detection signal to be generated. Depending on the semantics of a composite protocol field, its relevance to security may vary. Semantics can therefore be used to select a different model type or a different model sensitivity according to, for example, the security relevance of the field. For instance, for a composite field that is not security related, the threshold of observed instances can be adjusted to limit the number of irrelevant intrusion detection signals generated, which improves usability. In addition, the semantics of a composite field can be propagated to its subfields to allow a more precise selection of model types and model settings. For example, a primitive field of numeric type contained in a composite field that is highly relevant to security can be associated with a numeric set model type, which can define a stricter safe range than a numeric range model type and thereby improve intrusion detection accuracy.
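
The count-based model for composite fields, with a threshold that depends on the field's semantics, could look roughly like the following Python sketch. The class name, the field names and the threshold values are hypothetical; the point is only that a composite field seen fewer times than its threshold during training raises a signal in the detection phase, and that a more lenient threshold can be set for fields that are not security related.

```python
from collections import Counter


class CompositeFieldCountModel:
    """Counts instances of composite fields seen in training (hypothetical sketch)."""

    def __init__(self, default_threshold: int = 1):
        self.counts = Counter()      # composite field name -> times observed
        self.default_threshold = default_threshold
        self.thresholds = {}         # per-field overrides, e.g. derived from semantics

    def train(self, field_name: str) -> None:
        self.counts[field_name] += 1

    def set_threshold(self, field_name: str, threshold: int) -> None:
        self.thresholds[field_name] = threshold

    def is_alert(self, field_name: str) -> bool:
        # Signal when the field was observed fewer times than its threshold.
        threshold = self.thresholds.get(field_name, self.default_threshold)
        return self.counts[field_name] < threshold


model = CompositeFieldCountModel(default_threshold=5)
for _ in range(3):
    model.train("write_request")   # hypothetical security-relevant composite field
    model.train("diagnostics")     # hypothetical field with no security relevance
model.set_threshold("diagnostics", 1)  # more lenient: fewer irrelevant signals

print(model.is_alert("write_request"))    # True  (3 < 5)
print(model.is_alert("diagnostics"))      # False (3 >= 1)
print(model.is_alert("firmware_upload"))  # True  (never observed)
```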

According to another aspect of the invention, an intrusion detection system is provided for detecting an intrusion in data traffic in a data network, the system comprising: a parser for parsing the data traffic to extract at least one protocol field of a protocol message of the data traffic;

a machine for associating the extracted protocol field with a corresponding model for that protocol field, the model being selected from a set of models;

a model processing unit for analyzing whether the content of the extracted protocol field is within a safe area as defined by the model; and

an execution unit for generating an intrusion detection signal when the content of the extracted protocol field is determined to be outside the safe area.

With the system according to an embodiment, the same or similar effects can be obtained as with the method according to the invention. Likewise, the same or similar embodiments as those described with reference to the method according to the invention can be provided, achieving the same or similar effects. The parser, the machine, the model processing unit and the execution unit may be implemented by suitable software instructions executed by a data processing device. They may be implemented in the same software program executed by the same data processing device, or may be executed by two or more different data processing devices. For example, the parser may run locally at the location where the data traffic flows, while the machine, the model processing unit and the execution unit are located remotely, for instance at a secure location. Also, data from several sites can be monitored; a parser can then be provided at each site, with the output of each parser sent to a single machine, model processing unit and execution unit.

It should be noted that the method and system described above can be applied not only to intrusion detection. Instead of, or in addition to, this purpose, the described method and system may be used for monitoring. For example, the data traffic in the data network of an entity such as an enterprise, a data center, etc. may be monitored. For each protocol field, or for some of them, a model can be defined that represents a safe and desired operating state. Alternatively, instead of defining the safe or desired operating state in advance, the system and/or method described herein can be applied in a learning phase, so that the models obtained in the learning phase provide a description of the operation as it is being monitored. The transmitted data may contain information from which the operating state can be derived, and such data is used to train the models for the corresponding protocol fields. For example, in an enterprise data network, control information relating to motor speed, reactor temperature and hydraulic pressure, as well as error messages, procedure calls, etc., may be transmitted. Such data may be used either for comparison with predefined models that define the desired or safe operating state, or for training the models, so that the status can be derived from the models as they are trained. Monitoring may include checking the operational health of an industrial plant or computer network by observing the values of certain protocol fields (or combinations of protocol fields) that are important to system/network administrators, and may identify events of interest in the computer network or industrial production process, etc. Therefore, where the term intrusion detection is used in this document, it can also be understood as referring to monitoring.

Brief description of the drawings

Further effects and features of the invention will be described, by way of example only, with reference to the following description and the accompanying schematic drawings, which disclose non-limiting embodiments, and in which:

Fig. 1 schematically depicts an example of a data network comprising an intrusion detection system according to an embodiment of the invention;

Fig. 2 schematically shows an overview of an intrusion detection system according to an embodiment of the invention;

Fig. 3 schematically shows an overview of the learning phase of a method according to an embodiment of the invention;

Fig. 4 schematically shows an overview of the intrusion detection phase of a method according to an embodiment of the invention;

Fig. 5 is a schematic block diagram illustrating an intrusion detection system and method according to an embodiment of the invention.

Detailed description of the invention

Fig. 1 is a schematic overview of an example of a data network with an intrusion detection system for classifying protocol messages according to an embodiment of the invention. In this network, personal computers 14 and 15 (or workstations) are connected to a server 13. The network may be connected to the Internet 16 through a firewall 17.

In a data network, an intrusion or attack may originate from the Internet 16, or from the personal computer 14 when it is infected with malware.

The data network may be a SCADA network or another industrial process control network. In such a network, the machinery 12 may be controlled by software running on a remote terminal unit (RTU) 11 or on a programmable logic controller (PLC). Software running on the server 13 may send protocol messages to software running on the RTU 11, and software on the RTU 11 may in turn send protocol messages to the machinery 12, which may also run software.

A user may communicate with the server 13 through software running on the personal computer 14 or the workstation 15, by exchanging protocol messages between that software and software running on the server 13.

The intrusion detection system 10 may be placed between the RTU 11 and the rest of the network, as shown in Fig. 1, or between the RTU 11 and the machinery 12 (not shown). The intrusion detection system 10 may extract from the data network the protocol messages exchanged between software running on the personal computer 14 or workstation 15 and software running on the server 13, between software running on the server 13 and software running on the RTU 11, or between software running on the RTU 11 and software running on a data processing device of the machinery 12.

A communication protocol can be defined as a formal description of the digital formats of protocol messages and of the rules for exchanging those messages in or between (software running on) computer systems. A communication protocol may include descriptions of the syntax, semantics and synchronization of communication. Application-layer protocol messages in a data network may contain one or more fields, which may be characterized by their data types. For example, a field may represent the total length of the message, and a field may hold a numeric value or a string value.

The more information is available about the protocol messages, the more information about the normal or allowed values of each protocol field of each protocol message exchanged in the data network can be included in a model describing normal, valid or intrusion-free protocol messages. The model can then be used (for example, in real time) to classify protocol messages from live data traffic in the data network in order to detect anomalies, i.e. anything that deviates from the normal behavior of the data network as described by the model.

Fig. 2 shows a schematic overview of an embodiment of the intrusion detection system 10 according to the invention. The intrusion detection system 10 comprises a network protocol parser 21 configured to extract at least one protocol field from a protocol message of, for example, the application layer of the data network. During the learning phase, protocol messages may be received from the network via input 25. The network protocol parser 21 may be used during the optional learning phase as well as during normal operation of the intrusion detection system. Information about the parsed protocol message may be forwarded to the machine 23.

The intrusion detection system further comprises a machine 23, a set of models 26 and a model processing unit 24. The machine 23 is configured to associate the extracted protocol field with a model of a model type selected on the basis of the data type and/or semantics of the protocol field. For this purpose, the machine contains, or has access to, the set of models 26. The machine associates the extracted protocol field with a model that is specific to that protocol field, for example specific to its data type and/or semantics. To this end, the set of models 26 contains different models, each model being for a particular one (or more) of the protocol fields. In the learning phase, if no model is yet available for the extracted protocol field, the machine may create a model for that field and add it to the set of models. Information about the extracted protocol field may be forwarded to the processing unit 24.

The processing unit 24 then evaluates whether or not the extracted protocol field conforms to the model, in order to assess whether the content of the extracted protocol field can be considered an intrusion. In the learning phase, the model may be updated using the content of the extracted protocol field. The processing unit may output messages via output 27.

The intrusion detection system may further comprise an execution unit 22 for generating an intrusion detection signal when (the value of) a protocol field has been identified as an intrusion, i.e. as lying outside the safe area defined by the associated model. In response to the intrusion detection signal, an intrusion detection action may be performed, for example raising an alarm or filtering the data packet or protocol field (thereby, for example, removing the data packet or protocol field). An intrusion detection signal may also be generated when the parser cannot identify a protocol field (implying that the data packet does not conform to the protocol), and/or when, during intrusion detection operation, the model processing unit cannot associate the extracted protocol field with a model from the set (suggesting that the data packet contains protocol fields that do not occur in normal traffic).
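
The processing chain of Fig. 2, including the three situations in which an intrusion detection signal may be generated, can be sketched in Python as follows. This is a toy under explicit assumptions: the text protocol `name=value;name=value`, the functions `parse` and `detect`, and the representation of each model as a simple predicate are all hypothetical and only stand in for the parser 21, the machine 23 with the set of models 26, and the model processing unit 24.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union

Value = Union[int, str]
# Simplification: a model is just a predicate "is this value inside the safe area?"
Model = Callable[[Value], bool]


@dataclass
class Signal:
    reason: str                  # 'unparseable', 'unknown_field' or 'outside_safe_area'
    field: Optional[str] = None


def parse(message: bytes) -> Optional[List[Tuple[str, Value]]]:
    """Toy parser (stands in for parser 21): b'name=value;name=value' -> fields."""
    try:
        fields = []
        for item in message.decode("ascii").split(";"):
            name, raw = item.split("=", 1)   # ValueError if '=' is missing
            fields.append((name, int(raw) if raw.isdigit() else raw))
        return fields
    except (UnicodeDecodeError, ValueError):
        return None                          # message does not conform to the protocol


def detect(message: bytes, models: Dict[str, Model]) -> List[Signal]:
    fields = parse(message)
    if fields is None:
        return [Signal("unparseable")]
    signals = []
    for name, value in fields:
        model = models.get(name)             # machine 23: look up model in set 26
        if model is None:
            signals.append(Signal("unknown_field", name))
        elif not model(value):               # model processing unit 24
            signals.append(Signal("outside_safe_area", name))
    return signals                           # execution unit 22 decides what to do


models: Dict[str, Model] = {
    "func": lambda v: v in {3, 16},                         # allowed function codes
    "addr": lambda v: isinstance(v, int) and 0 <= v <= 99,  # allowed address range
}

print([s.reason for s in detect(b"func=3;addr=10", models)])   # []
print([s.reason for s in detect(b"func=8;addr=10", models)])   # ['outside_safe_area']
print([s.reason for s in detect(b"func=3;cmd=stop", models)])  # ['unknown_field']
print([s.reason for s in detect(b"garbage", models)])          # ['unparseable']
```

Returning a list of signals, rather than raising an alarm directly, leaves the choice of action (alarm, filtering) to the execution unit.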

A specific model is used for each protocol field, preferably a different model for each different protocol field, so that each protocol field is evaluated with a model designed specifically for it, allowing the best possible evaluation of that field.

In an embodiment, the models are built using at least two model types, a first model type being optimized for (or only working for) protocol fields of a first data type, and a second model type being optimized for protocol fields of a second data type. For example, the first model type may be optimized for protocol fields of one of the numeric, string or binary data types, and the second model type for protocol fields of another of these data types.

For example, for the value of a protocol field A1 with a numeric data type, a model M-1-A1 designed to describe numeric values may be built. For the value of a protocol field A2, also of numeric data type, a model M-1-A2 may be built that is likewise designed to describe numeric values. For the value of a protocol field A3 with a string data type, a model M-S-A3 optimized or tailored to describe string values may be built. Models for different protocol fields with the same data type, such as M-1-A1 and M-1-A2, may be built using the same model architecture but with different content (for example, a different allowed range or a different set of allowed values), so as to express the differences between protocol fields A1 and A2.
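
The idea of reusing one model architecture with different content for fields of the same data type can be sketched as below. The range-based numeric model, the set-based string model and all concrete values are assumptions chosen for illustration; the text itself also allows, for example, a set of allowed values for numeric fields.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NumericRangeModel:
    """Architecture shared by numeric fields; each instance holds its own range."""
    low: float
    high: float

    def accepts(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class StringSetModel:
    """Architecture for string fields: a set of allowed strings."""
    allowed: FrozenSet[str]

    def accepts(self, value: str) -> bool:
        return value in self.allowed


models = {
    "A1": NumericRangeModel(0, 255),                     # role of M-1-A1
    "A2": NumericRangeModel(1000, 2000),                 # M-1-A2: same architecture, other content
    "A3": StringSetModel(frozenset({"READ", "WRITE"})),  # role of M-S-A3
}

print(models["A1"].accepts(100))     # True
print(models["A2"].accepts(100))     # False
print(models["A3"].accepts("READ"))  # True
```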

It should be understood that a model of a type designed for numeric values, used together with a model of a type designed for string values, can describe the values of a protocol message containing both numeric and string values in its protocol fields better, or more accurately, than a single model optimized to describe all values of the protocol message, numeric and string alike.

The intrusion detection system 10 may be configured to build the models during the learning phase. The operation of the intrusion detection system 10 and the method according to embodiments of the invention will be further described with reference to Figs. 3 and 4. Fig. 3 schematically illustrates the learning phase, and Fig. 4 schematically illustrates the intrusion detection phase.

Fig. 3 schematically shows the steps of the learning phase:

Step a1: parsing the data traffic to extract at least one protocol field of the protocol used in the data traffic;

Step a2: associating the extracted protocol field with a model for that protocol field, the model being selected from the set of models;

Step a3: if the extracted protocol field cannot be associated with any existing model in the set of models, creating a new model for the extracted protocol field and adding the new model to the set of models;

Step a4: updating the model for the extracted protocol field using the content of the extracted protocol field.
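
Steps a1 to a4 can be sketched as a single training loop in Python. For brevity, every field here gets the same minimal set-based model and the parser handles a hypothetical `name=value;name=value` text protocol; in the described method the model type would instead be chosen per field, based on its data type and/or semantics.

```python
from typing import Dict, Iterable, List, Set, Tuple


def parse(message: str) -> List[Tuple[str, str]]:
    """Step a1 (toy parser): 'name=value;name=value' -> [(name, value), ...]."""
    fields = []
    for item in message.split(";"):
        if "=" in item:
            name, value = item.split("=", 1)
            fields.append((name, value))
    return fields


class SetModel:
    """Minimal model: remembers every value seen for one protocol field."""

    def __init__(self) -> None:
        self.values: Set[str] = set()

    def update(self, value: str) -> None:
        self.values.add(value)


def train(messages: Iterable[str]) -> Dict[str, SetModel]:
    model_set: Dict[str, SetModel] = {}
    for message in messages:
        for name, value in parse(message):   # a1: extract protocol fields
            model = model_set.get(name)      # a2: associate field with its model
            if model is None:                # a3: none yet -> create and add to set
                model = SetModel()
                model_set[name] = model
            model.update(value)              # a4: update with the field content
    return model_set


models = train(["func=3;addr=10", "func=16;addr=11"])
print(sorted(models))                 # ['addr', 'func']
print(sorted(models["func"].values))  # ['16', '3']
```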

In general, a protocol message may contain primitive protocol fields and composite protocol fields. A composite protocol field contains two or more protocol subfields, each of which can be a primitive protocol field or a composite protocol field. A protocol message can thus be said to contain a tree structure of protocol fields. For example, in a protocol message, the composite protocol field msg_body (message body) contains the primitive protocol field msg_len (message length) and the composite protocol field msg_data (message data). The composite protocol field msg_data may contain the primitive protocol fields msg_typeA (message of type A) and msg_typeB (message of type B). As used herein, the term protocol field may refer to any primitive protocol field at any level of such a tree structure.
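
The msg_body example can be represented as a nested structure and flattened into its primitive fields, each identified by its path in the tree. The nested-dictionary representation, the dotted path notation and the field values are assumptions for illustration only.

```python
# Hypothetical representation: dicts are composite fields, other values are primitive.
message = {
    "msg_body": {
        "msg_len": 12,
        "msg_data": {
            "msg_typeA": 1,
            "msg_typeB": 7,
        },
    },
}


def primitive_fields(node, path=()):
    """Yield (path, value) for every primitive field, at any depth of the tree."""
    for name, child in node.items():
        if isinstance(child, dict):      # composite field: descend into its subfields
            yield from primitive_fields(child, path + (name,))
        else:                            # primitive field: a leaf of the tree
            yield ".".join(path + (name,)), child


for field_path, value in primitive_fields(message):
    print(field_path, value)
# msg_body.msg_len 12
# msg_body.msg_data.msg_typeA 1
# msg_body.msg_data.msg_typeB 7
```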

Different model types may be used. For example, the model type of a protocol field may be a numeric model type, a string model type or a binary model type. If the extracted protocol field is found to contain a numeric value, the numeric model type may be applied to that protocol field. If it is found to contain a string value, the string model type may be applied. It may happen (for example, in a text protocol) that during the learning phase the network protocol parser cannot establish whether the protocol field is of numeric or string data type; in that case the binary model type is applied as a more generic model type.
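
One possible reading of this selection rule, for a text protocol in which field values arrive as strings, is sketched below: a field whose training values are all numbers gets the numeric model type, a field with no numeric values gets the string model type, and a field for which the type cannot be established falls back to the binary model type. The function names and the decision heuristic are assumptions, not taken from the source.

```python
from typing import Iterable


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def choose_model_type(observed_values: Iterable[str]) -> str:
    """Pick a model type for one field from the textual values seen in training."""
    kinds = {is_number(v) for v in observed_values}
    if kinds == {True}:
        return "numeric"   # every value parsed as a number
    if kinds == {False}:
        return "string"    # no value parsed as a number
    return "binary"        # mixed or no evidence: type cannot be established


print(choose_model_type(["10", "250"]))      # numeric
print(choose_model_type(["READ", "WRITE"]))  # string
print(choose_model_type(["10", "READ"]))     # binary
```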

As explained above, the set of models may contain a corresponding model for each protocol field. A model for a protocol field with a numeric data type may be built differently (i.e., it may have a different form or model architecture) from a model for a protocol field with a string data type. Because each model can be optimized for its data type, it can describe normal, valid or intrusion-free protocol messages more accurately than models that ignore the data type of the protocol fields.

Examples of model types for different data types are explained below. For numeric data types, two model types can be applied: one for protocol fields that represent lengths and one for protocol fields that represent enumerations.

If the protocol field is an enumeration (i.e., it takes values from a fixed set), the model may contain a set S of all values of the protocol field extracted during the training phase. Starting from an empty set, each value identified for the protocol field during the training phase is added to the set. In the intrusion detection phase, a protocol message may be classified as anomalous when the value of the corresponding protocol field is not part of the set S.
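A minimal sketch of the enumeration model, assuming field values are hashable Python objects; the class name `EnumerationModel` and the sample values are illustrative only.

```python
class EnumerationModel:
    """Set-based model for an enumeration field (the set S in the text)."""

    def __init__(self) -> None:
        self.allowed: set = set()  # training starts with an empty set

    def train(self, value) -> None:
        self.allowed.add(value)    # every value seen in training becomes allowed

    def is_anomalous(self, value) -> bool:
        return value not in self.allowed


model = EnumerationModel()
for v in [3, 4, 3, 16]:
    model.train(v)
print(model.is_anomalous(4))  # False
print(model.is_anomalous(8))  # True
```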

If the protocol field represents a length, the model can be built by approximating the distribution of the values of that protocol field observed during the training phase. During the training phase, the mean μ and variance σ² of the approximating distribution may be computed as the sample mean and sample variance of all values observed for the protocol field. Using μ and σ², a probability can be computed for any value. During the intrusion detection phase, when the probability of a particular value of the protocol field is below a given threshold, a protocol message carrying that value may be classified as anomalous.
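The text does not fix the shape of the approximating distribution. The sketch below assumes a normal distribution N(μ, σ²) and uses, as the "probability" of a value x, the two-sided tail probability P(|X − μ| ≥ |x − μ|); both choices, the class name `LengthModel` and the threshold 0.01 are assumptions made for illustration.

```python
import math
from statistics import mean, variance


class LengthModel:
    """Distribution model for a length field (sketch: assumes a normal distribution)."""

    def __init__(self, threshold: float = 0.01) -> None:
        self.threshold = threshold
        self.mu = 0.0
        self.var = 0.0

    def fit(self, values) -> None:
        # Sample mean and sample variance of all training values (needs >= 2 values).
        self.mu = mean(values)
        self.var = variance(values, self.mu)

    def probability(self, value: float) -> float:
        # Two-sided tail probability P(|X - mu| >= |value - mu|) for X ~ N(mu, var).
        if self.var == 0:
            return 1.0 if value == self.mu else 0.0
        z = abs(value - self.mu) / math.sqrt(self.var)
        return math.erfc(z / math.sqrt(2))

    def is_anomalous(self, value: float) -> bool:
        return self.probability(value) < self.threshold


model = LengthModel(threshold=0.01)
model.fit([10, 12, 11, 9, 10, 12, 11, 10])
print(model.is_anomalous(11))   # False
print(model.is_anomalous(200))  # True
```

A heavier-tailed or distribution-free bound (for example Chebyshev's inequality) could replace the normal assumption if length fields are far from Gaussian.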

A model for a boolean protocol field may, for example, track the boolean value averaged over a number of samples and compare the averaged value with a predetermined threshold. An example of such a model is described below.

During the learning phase, the probability $P_t$ that the field value is true is computed, together with the probability $P_f = 1 - P_t$ that it is false.

During intrusion detection, a sequence of n samples of the field value is considered, and the binomial probability of observing such a sequence for the given $P_t$ and $P_f$ is computed. If k of the n samples are true, this probability is

$$P_{\text{sample}} = \binom{n}{k} P_t^{\,k} P_f^{\,n-k}$$

The probability is then compared with a threshold t, and an alarm is raised if $P_{\text{sample}} < t$.

For example, suppose that an equal number of true and false values was observed during the training phase, so that $P_t \approx 1/2$ and $P_f \approx 1/2$, and that the probability threshold for sequences of 5 values is set to 0.1. If the sequence [false, false, false, false, false] is then observed during the intrusion detection phase, the binomial probability is $P_{\text{sample}} = P(\text{true} = 0) = (1/2)^5 = 0.03125 < 0.1$, so an alarm is raised.

An example of a model type for strings that can handle both ASCII and Unicode strings is described below, starting with the model type for ASCII strings.
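Before turning to strings, the boolean model above can be sketched in a few lines of Python. The function names `train_boolean` and `sample_probability` are hypothetical; the example reproduces the worked case with five false values and a threshold of 0.1.

```python
from math import comb


def train_boolean(values):
    """Estimate Pt (share of True values) and Pf = 1 - Pt from training samples."""
    pt = sum(values) / len(values)
    return pt, 1.0 - pt


def sample_probability(window, pt, pf):
    """Binomial probability of observing k True values in a window of n samples."""
    n = len(window)
    k = sum(window)  # number of True values in the window
    return comb(n, k) * pt**k * pf**(n - k)


pt, pf = train_boolean([True, False] * 50)
p = sample_probability([False] * 5, pt, pf)
print(p)        # 0.03125
print(p < 0.1)  # True -> raise an alarm
```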

The model type for ASCII strings contains two boolean values and a set. The first boolean (letters) becomes true once letters are seen, the second boolean (digits) becomes true once digits are seen, and the set (symbols) keeps track of all symbols seen. For a given string field s, a function f(s) is defined that reports whether the string contains letters, whether it contains digits, and which symbols it contains. For example, for the string userName?#! we have

$$f(\texttt{userName?\#!}) = (\text{letters} = \text{true},\ \text{digits} = \text{false},\ \text{symbols} = \{?, \#, !\})$$

During the training phase, the model M is updated for each string s as follows:

$$M.\text{letters} \leftarrow M.\text{letters} \lor f(s).\text{letters}$$

$$M.\text{digits} \leftarrow M.\text{digits} \lor f(s).\text{digits}$$

$$M.\text{symbols} \leftarrow M.\text{symbols} \cup f(s).\text{symbols}$$

The characters of the string are evaluated one by one. For each character, the machine checks its type: if the character is a letter or a digit, the machine updates the model by setting the corresponding flag to true; if the character is a symbol, it is added to the current symbol set (a symbol that is already present is not added twice).

During the intrusion detection phase, for a given string s, an alarm may be raised if

$$(f(s).\text{letters} \land \lnot M.\text{letters}) \lor (f(s).\text{digits} \land \lnot M.\text{digits}) \lor (f(s).\text{symbols} \not\subseteq M.\text{symbols})$$

The characters of the string are again evaluated one by one, and the check is straightforward. If the current character is a letter (or a digit), the machine checks whether letters (or digits) have previously been observed for the given field; if the check fails, an alarm is raised. If the character is a symbol, the machine checks that this symbol has been observed before; if this check fails, an alarm is raised.

Initially, the model M is defined as follows:

$$M = (\text{letters} = \text{false},\ \text{digits} = \text{false},\ \text{symbols} = \emptyset)$$
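The ASCII string model (initial state, training update and detection condition) can be sketched as follows. The names `StringProfile` and `AsciiStringModel` are hypothetical; `f` mirrors the function f(s) of the text, and any character that is neither an ASCII letter nor an ASCII digit is treated as a symbol, which is an assumption about where the boundary lies.

```python
import string
from dataclasses import dataclass, field


@dataclass
class StringProfile:
    letters: bool = False
    digits: bool = False
    symbols: set = field(default_factory=set)


def f(s: str) -> StringProfile:
    """Summarize a string: any letters? any digits? which symbols?"""
    p = StringProfile()
    for ch in s:
        if ch in string.ascii_letters:
            p.letters = True
        elif ch in string.digits:
            p.digits = True
        else:
            p.symbols.add(ch)  # a set never stores the same symbol twice
    return p


class AsciiStringModel:
    def __init__(self) -> None:
        self.m = StringProfile()  # letters = false, digits = false, symbols = empty set

    def train(self, s: str) -> None:
        fs = f(s)
        self.m.letters = self.m.letters or fs.letters
        self.m.digits = self.m.digits or fs.digits
        self.m.symbols |= fs.symbols

    def is_anomalous(self, s: str) -> bool:
        fs = f(s)
        return ((fs.letters and not self.m.letters)
                or (fs.digits and not self.m.digits)
                or not fs.symbols <= self.m.symbols)


model = AsciiStringModel()
model.train("userName?#!")
print(model.is_anomalous("admin"))    # False
print(model.is_anomalous("admin1"))   # True: digits never seen in training
print(model.is_anomalous("a' OR 1"))  # True: new symbols and digits
```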

Another example of a string model type, which can be used for Unicode strings, is described below. For Unicode strings, the modeling and detection technique can be similar to that used for ASCII strings. Non-ASCII Unicode characters are treated as letters, i.e., if the string contains such a character, the letters flag is set to true. In addition, the set of Unicode scripts (e.g., Latin, Cyrillic, Arabic) seen during the learning phase is recorded. With this additional information it can be detected, for example, whether a string contains unusual Unicode characters (possibly belonging to a script other than those seen in the training phase).

In more detail, for a given Unicode string field s, a function f(s) is defined that reports whether the string contains letters and digits, which symbols it contains, and which Unicode scripts it contains. For example, for the string mu3soafa?#! the function reports, among other things, letters = true, symbols = {?, #, !} and scripts = {Latin}.

For Unicode strings, the model M is initialized and updated by performing the same or similar operations as for ASCII strings, with an additional scripts field that is handled in the same way as the symbols field.
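A sketch of the Unicode variant follows. Python's standard library does not expose the Unicode Script property directly, so `script_of` approximates it with the first word of the character name from `unicodedata.name` (e.g. "LATIN", "CYRILLIC", "ARABIC") and maps non-letters to "COMMON"; this heuristic and all class and function names are assumptions, not part of the described method.

```python
import string
import unicodedata
from dataclasses import dataclass, field


def script_of(ch: str) -> str:
    """Rough stand-in for the Unicode Script property: first word of the character name."""
    if not ch.isalpha():
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


@dataclass
class UnicodeProfile:
    letters: bool = False
    digits: bool = False
    symbols: set = field(default_factory=set)
    scripts: set = field(default_factory=set)


def f(s: str) -> UnicodeProfile:
    p = UnicodeProfile()
    for ch in s:
        if ch in string.ascii_letters or ord(ch) > 127:
            p.letters = True  # non-ASCII characters count as letters
            p.scripts.add(script_of(ch))
        elif ch in string.digits:
            p.digits = True
        else:
            p.symbols.add(ch)
    return p


class UnicodeStringModel:
    def __init__(self) -> None:
        self.m = UnicodeProfile()

    def train(self, s: str) -> None:
        fs = f(s)
        self.m.letters = self.m.letters or fs.letters
        self.m.digits = self.m.digits or fs.digits
        self.m.symbols |= fs.symbols
        self.m.scripts |= fs.scripts  # handled exactly like the symbols field

    def is_anomalous(self, s: str) -> bool:
        fs = f(s)
        return ((fs.letters and not self.m.letters)
                or (fs.digits and not self.m.digits)
                or not fs.symbols <= self.m.symbols
                or not fs.scripts <= self.m.scripts)


model = UnicodeStringModel()
model.train("userName?#!")
print(model.m.scripts)                      # {'LATIN'}
print(model.is_anomalous("otherName"))      # False
print(model.is_anomalous("us\u0435rName"))  # True: U+0435 is a Cyrillic letter
```

The last call shows why the scripts field is useful: a single look-alike Cyrillic letter inside an otherwise Latin string is flagged.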

Further examples of model types for binary protocol fields are given below.

For a binary data type, a model taken from known anomaly-based intrusion detection systems that rely on payload analysis can be applied.

A first example of such a binary model is based on 1-gram analysis, where an n-gram is a sequence of n consecutive bytes.

Given a binary field b of length l bytes, a vector f containing the relative frequency of each byte value is computed first. In other words, for a given byte value ν, the element of f corresponding to ν is

$$f_\nu = \frac{\left|\{\, i : b_i = \nu \,\}\right|}{l}, \qquad \nu = 0, \dots, 255,$$

where $b_i$ denotes the i-th byte of b.

During the training phase, the frequency vectors are used to compute the mean and standard deviation of each byte value. To this end, for a given sequence of n binary fields b1, …, bn and their associated byte frequency vectors f1, …, fn, two vectors μ and σ are computed that contain, respectively, the mean and the standard deviation of the frequency of each byte value (0 to 255). In this example, these two vectors form the binary model.

During the testing phase, for a given value s of the binary field, the associated frequency vector fs is computed first. An appropriate function F (e.g., a normalized Euclidean distance) is then applied to determine the distance between fs and the model built during the training phase. If the resulting distance exceeds a predetermined threshold, an alarm may be raised.
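The 1-gram model can be sketched as follows. The exact form of F is not fixed by the text; here it is assumed to be a Euclidean distance in which each byte's deviation is divided by σ plus a small smoothing term α (so that bytes never seen in training, with σ = 0, do not cause a division by zero). The class name, α = 0.001 and the threshold 10 are illustrative assumptions.

```python
import math


def byte_frequencies(b: bytes) -> list:
    """f[v] = (number of bytes equal to v) / len(b), for v = 0..255."""
    counts = [0] * 256
    for v in b:
        counts[v] += 1
    return [c / len(b) for c in counts] if b else [0.0] * 256


class OneGramModel:
    def __init__(self, threshold: float, alpha: float = 1e-3) -> None:
        self.threshold = threshold
        self.alpha = alpha  # smoothing term (assumption) to avoid dividing by sigma == 0
        self.mu = [0.0] * 256
        self.sigma = [0.0] * 256

    def fit(self, fields) -> None:
        vectors = [byte_frequencies(b) for b in fields]
        n = len(vectors)
        for v in range(256):
            column = [fv[v] for fv in vectors]
            self.mu[v] = sum(column) / n
            self.sigma[v] = math.sqrt(sum((x - self.mu[v]) ** 2 for x in column) / n)

    def distance(self, s: bytes) -> float:
        # One possible "normalized Euclidean distance" F between fs and (mu, sigma).
        fs = byte_frequencies(s)
        return math.sqrt(sum(((fs[v] - self.mu[v]) / (self.sigma[v] + self.alpha)) ** 2
                             for v in range(256)))

    def is_anomalous(self, s: bytes) -> bool:
        return self.distance(s) > self.threshold


model = OneGramModel(threshold=10.0)
model.fit([b"GET /index.html", b"GET /home.html", b"GET /about.html"])
print(model.is_anomalous(b"GET /home.html"))  # False
print(model.is_anomalous(b"\x90" * 16))       # True
```

PAYL (cited below) uses a closely related simplified Mahalanobis distance; the squared form here is only one reasonable reading of "normalized Euclidean distance".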

A more accurate version of this model can be obtained by splitting the set of training values b1, …, bn into subsets. To do so, a clustering algorithm such as a Self-Organizing Map (SOM) can be applied to the input values b1, …, bn. A separate model (i.e., a separate pair of vectors μ, σ) can then be built for each subset.

During the intrusion detection phase, the clustering algorithm is applied to the value s of the binary field, and the test described above is then applied using the model associated with the resulting cluster.
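The clustered variant can be sketched as below. To stay short and dependency-free, the sketch uses plain k-means (with the first k training vectors as initial centroids) instead of a Self-Organizing Map; that substitution, the smoothing term α and all names are assumptions. Each cluster gets its own (μ, σ) pair, and a tested value is scored against the model of its nearest centroid.

```python
import math


def freqs(b: bytes) -> list:
    counts = [0] * 256
    for v in b:
        counts[v] += 1
    return [c / len(b) for c in counts] if b else [0.0] * 256


def sq_dist(a, b) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means, used here in place of a SOM; naive init with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[min(range(k), key=lambda j: sq_dist(v, centroids[j]))].append(v)
        for j, g in enumerate(groups):
            if g:
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


class ClusteredOneGramModel:
    def __init__(self, k: int, threshold: float, alpha: float = 1e-3) -> None:
        self.k, self.threshold, self.alpha = k, threshold, alpha
        self.centroids, self.models = [], []

    def _nearest(self, fv) -> int:
        return min(range(self.k), key=lambda j: sq_dist(fv, self.centroids[j]))

    def fit(self, fields) -> None:
        vectors = [freqs(b) for b in fields]
        self.centroids = kmeans(vectors, self.k)
        members = [[] for _ in range(self.k)]
        for v in vectors:
            members[self._nearest(v)].append(v)
        self.models = []
        for j, g in enumerate(members):
            g = g or [self.centroids[j]]  # empty cluster: fall back to its centroid
            mu = [sum(col) / len(g) for col in zip(*g)]
            sigma = [math.sqrt(sum((x - m) ** 2 for x in col) / len(g))
                     for col, m in zip(zip(*g), mu)]
            self.models.append((mu, sigma))  # one (mu, sigma) pair per cluster

    def is_anomalous(self, s: bytes) -> bool:
        fs = freqs(s)
        mu, sigma = self.models[self._nearest(fs)]  # model of the resulting cluster
        d = math.sqrt(sum(((x - m) / (sd + self.alpha)) ** 2
                          for x, m, sd in zip(fs, mu, sigma)))
        return d > self.threshold


training = [b"GET /a.html", b"\x00\x01\x02\x03", b"GET /b.html", b"\x00\x01\x02\x04"]
model = ClusteredOneGramModel(k=2, threshold=10.0)
model.fit(training)
print(model.is_anomalous(b"GET /a.html"))       # False
print(model.is_anomalous(b"\x00\x01\x02\x03"))  # False
print(model.is_anomalous(b"\x90" * 8))          # True
```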

A third example of a binary model is a so-called network emulator. A network emulator is an algorithm configured to determine whether a sequence of bytes contains dangerous executable instructions. For a given sequence of bytes, the algorithm first translates the byte values into the corresponding assembly instructions (disassembly). It then searches for instruction sequences that may be recognized as dangerous or suspicious (for example, long sequences of NOP instructions, which are typically found inside the malicious shellcode of known attacks). When such sequences are found, an alarm is raised. Note that this type of binary model does not require a training phase.
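A real network emulator decodes and executes full instruction streams; the sketch below only illustrates the NOP-sequence heuristic mentioned above. It assumes x86 code and recognizes only the one-byte NOP opcode 0x90 (ignoring multi-byte NOPs and NOP-equivalent instructions); the minimum run length of 32 and both function names are hypothetical. Like the model in the text, it needs no training.

```python
def longest_run(data: bytes, opcode: int = 0x90) -> int:
    """Length of the longest run of a single-byte opcode (0x90 = x86 NOP)."""
    longest = current = 0
    for byte in data:
        current = current + 1 if byte == opcode else 0
        longest = max(longest, current)
    return longest


def looks_like_nop_sled(data: bytes, min_run: int = 32) -> bool:
    # min_run is a hypothetical tuning parameter, not a value from the text.
    return longest_run(data) >= min_run


payload = b"GET /" + b"\x90" * 40 + b"\xcc\xcc"
print(longest_run(payload))                              # 40
print(looks_like_nop_sled(payload))                      # True
print(looks_like_nop_sled(b"GET /index.html HTTP/1.1"))  # False
```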

When a binary field contains a so-called Binary Large Object (BLOB) whose data is organized according to a structure not specified in the network protocol specification, the same approach described herein can be applied to further split the BLOB into its constituent fields until the basic fields (e.g., numeric, string and boolean fields) have been extracted and processed. For example, a binary protocol field may contain a GIF or JPEG image; a specification exists for such images, but it is not part of the network protocol specification itself. In this case, the GIF or JPEG specification can be used to split the field value into its basic constituent fields, and a model can then be selected and built accordingly for each constituent field. Another such case occurs when a binary field contains an entire memory area of one of the communicating network nodes (for example, PLC memory maps, which are exchanged as part of the Modbus protocol). The structure of such a memory area may be defined in other documents (e.g., the PLC vendor's specifications) or may be inferred from a sufficient number of observed data samples. This information can be used to split the memory area into its basic fields, which can then be processed using the techniques described herein.
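As an illustration of splitting a BLOB with an external specification, the sketch below decodes the fixed 13-byte start of a GIF file (the 6-byte header and the 7-byte logical screen descriptor defined by the GIF89a specification) into basic fields. The function name and the choice of which model each field would feed are illustrative assumptions.

```python
import struct


def split_gif_header(blob: bytes) -> dict:
    """Split the GIF header + logical screen descriptor into basic fields."""
    if len(blob) < 13 or blob[:3] != b"GIF":
        raise ValueError("not a GIF header")
    width, height, packed, bg_index, aspect = struct.unpack("<HHBBB", blob[6:13])
    return {
        "signature": blob[:3].decode("ascii"),      # string field
        "version": blob[3:6].decode("ascii"),       # enumeration: "87a" or "89a"
        "width": width,                             # numeric field
        "height": height,                           # numeric field
        "global_color_table": bool(packed & 0x80),  # boolean field (top bit of packed byte)
        "background_color_index": bg_index,
        "pixel_aspect_ratio": aspect,
    }


print(split_gif_header(b"GIF89a\x40\x01\xc8\x00\xf7\x00\x00"))
```

Each returned entry could then be routed to the matching numeric, string, enumeration or boolean model, exactly as for fields defined by the network protocol itself.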

In addition, for the string data type, the model described in Bolzoni, D. and Etalle, S. (2008), Boosting Web Intrusion Detection Systems by Inferring Positive Signatures, in: Confederated International Conferences On the Move to Meaningful Internet Systems (OTM), may be applied. For the binary data type, a sub-model from known anomaly-based intrusion detection systems that rely on payload analysis may be applied. An example can be found in Anomalous payload-based network intrusion detection (RAID, pages 203-222, 2004) by Ke Wang and Salvatore J. Stolfo. In that work, the authors present a system called PAYL, which uses n-gram analysis to detect anomalies. The frequency and standard deviation of 1-grams (single bytes) are analyzed and stored in detection models that are built during the learning phase. In the intrusion detection phase, an appropriate model is selected (using the payload length) and used to evaluate the incoming traffic.

Another example can be found in POSEIDON: a 2-tier Anomaly-based Network (IWIA, pages 144-156, IEEE Computer Society, 2006) by Damiano Bolzoni, Emmanuele Zambon, Sandro Etalle and Pieter Hartel. In that work, the authors build an improved system on top of PAYL: instead of using the payload length to select (and build) detection models, they use a neural network that preprocesses the payload data and whose output is used to select the appropriate detection model.

Yet another example can be found in Michalis Polychronakis, Kostas C. Anagnostakis and Evangelos P. Markatos, Comprehensive Shellcode Detection using Runtime Heuristics, in Proceedings of the 26th Annual Computer Security Applications Conference (ACSAC), December 2010, Austin, Texas, USA. In that work, the authors present a network emulator: a software component that implements runtime heuristics and emulates a physical CPU in software. The network emulator can test whether the input data contains executable (and malicious) code.

In an embodiment, the parsing process may comprise the steps of:

i) collecting data packets from the data network;

ii) defragmenting IP packets;

iii) reassembling TCP segments;

iv) extracting application data; and

v) extracting protocol messages.
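Steps iii) to v) can be sketched for one concrete protocol, Modbus/TCP, whose messages start with a 7-byte MBAP header (transaction id, protocol id, length, unit id, all big-endian, where the length counts the unit id plus the PDU). Packet capture and IP defragmentation (steps i and ii) are left out, and the TCP reassembly is deliberately simplified (it trims retransmitted bytes and stops at the first gap); function names are hypothetical.

```python
import struct


def reassemble(segments) -> bytes:
    """Step iii, simplified: order (seq, payload) pairs and trim retransmitted bytes."""
    stream, expected = b"", None
    for seq, payload in sorted(segments):
        if expected is None:
            expected = seq
        if seq < expected:                  # overlap or retransmission
            payload = payload[expected - seq:]
            seq = expected
        if seq > expected:                  # gap: stop at the first missing byte
            break
        stream += payload
        expected += len(payload)
    return stream


def split_modbus_tcp(stream: bytes) -> list:
    """Steps iv-v: cut the stream into Modbus/TCP messages using the MBAP header."""
    messages, offset = [], 0
    while offset + 7 <= len(stream):
        tid, pid, length, unit = struct.unpack(">HHHB", stream[offset:offset + 7])
        end = offset + 6 + length           # MBAP length counts unit id + PDU
        if end > len(stream):
            break                           # incomplete message: wait for more data
        pdu = stream[offset + 7:end]
        messages.append({"transaction_id": tid, "protocol_id": pid, "unit_id": unit,
                         "function_code": pdu[0] if pdu else None, "data": pdu[1:]})
        offset = end
    return messages


request = bytes.fromhex("00010000000601030000000a")  # Read Holding Registers, 10 registers
segments = [(1005, request[5:]), (1000, request[:5]), (1000, request[:5])]
print(split_modbus_tcp(reassemble(segments)))
```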

As mentioned above, different model types can be selected according to the semantics of the field with which the model is associated. One or more model parameters (specific to each model type) can also be adjusted according to these semantics to widen or narrow the safe region defined by the model. Some examples of using field semantics to select a model type or adjust model parameters are given below.

For a numeric field that represents the protocol message type, a numeric enumeration model can be used. This model type ensures that only the message types listed in the model fall within the safe region. When the model is built automatically during the training phase, all message types observed are considered safe. When the model is built manually, the set of allowed messages can be derived from specific security policies. For example, a security policy may dictate that only read operations are performed on a certain network node; in that case, the set of allowed messages contains only read messages.
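Using Modbus (mentioned earlier in this description) as an assumed example protocol, a manually built enumeration model for a read-only policy can simply list the public read function codes, while an automatically built one takes every code observed in training. The names below are hypothetical.

```python
# Modbus public function codes for read operations:
# 0x01 Read Coils, 0x02 Read Discrete Inputs, 0x03 Read Holding Registers, 0x04 Read Input Registers.
READ_CODES = frozenset({0x01, 0x02, 0x03, 0x04})


def learned_allowlist(observed_codes) -> frozenset:
    """Automatic construction: every function code seen in training is considered safe."""
    return frozenset(observed_codes)


def is_anomalous(function_code: int, allowed: frozenset) -> bool:
    return function_code not in allowed


print(is_anomalous(0x03, READ_CODES))                             # False: Read Holding Registers
print(is_anomalous(0x06, READ_CODES))                             # True: Write Single Register
print(is_anomalous(0x06, learned_allowlist([0x03, 0x06, 0x10])))  # False
```

The manual policy is stricter than the learned model: a write observed during training would be accepted by the learned allowlist but still rejected by the read-only policy.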

For a numeric field that represents motor speed in a manufacturing process, a numeric range model can be used. This model type ensures that the motor speed is not set below or above the values considered safe. When the model is built automatically during the training phase, the minimum and maximum allowed values can be set to the minimum and maximum speeds observed during the training phase (an exact range). When the model is built manually, the minimum and maximum of the range can be set from the motor's technical specification, to ensure that the speed stays within acceptable operating conditions.
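A minimal range model supporting both construction modes might look as follows; the class name and the speed values are hypothetical.

```python
class RangeModel:
    """Numeric range model: learned exact range, or manual bounds from a specification."""

    def __init__(self, lo=None, hi=None) -> None:
        self.lo, self.hi = lo, hi  # manual bounds, or None to learn them

    def train(self, value) -> None:
        # Automatic construction: widen the range to cover every observed value.
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def is_anomalous(self, value) -> bool:
        # Requires bounds: call train() first or pass lo/hi explicitly.
        return not (self.lo <= value <= self.hi)


learned = RangeModel()
for rpm in [1200, 1450, 1380]:
    learned.train(rpm)
manual = RangeModel(lo=0, hi=3000)  # e.g. taken from a motor datasheet

print(learned.is_anomalous(1500))  # True: above the highest speed seen in training
print(manual.is_anomalous(1500))   # False: within the specified operating range
```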

For a numeric field that represents the length of a security-relevant field (e.g., the length of a string buffer), a numeric distribution model can be used. Moreover, because such a field is highly security-relevant (it may be the target of a buffer overflow attack), a high probability threshold can be set. The safe region defined by the model is then limited to values that are generated with high probability by the same numeric distribution observed during the training phase. In other words, if a field length is too large compared with the lengths observed during the training phase, the value is considered anomalous and therefore a possible attack. For example, the shellcode used to carry out a buffer overflow attack may be longer than the normal buffer contents, thereby producing an anomalous value in the buffer length field.

For a string field that represents a person's name, the string model type can be selected, and the default threshold for the number of symbol characters not included in the model can be set to a very low level. Since a person's name is not expected to contain many symbols, a very low default threshold ensures that an intrusion detection signal is generated immediately when the observed value contains symbols that are not represented in the model. This may indicate, for example, a so-called SQL injection attack, which relies on special characters such as single or double quotes and commas.
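The threshold on unseen symbols can be sketched as below, assuming ASCII input and counting every occurrence of a character that is neither a letter, a digit, nor a symbol already in the model. The symbol set for the name field and the default threshold of 0 are hypothetical.

```python
import string


def unseen_symbol_count(value: str, known_symbols: set) -> int:
    """Occurrences of characters that are neither letters, digits, nor known symbols."""
    return sum(1 for ch in value
               if ch not in string.ascii_letters
               and ch not in string.digits
               and ch not in known_symbols)


def is_anomalous(value: str, known_symbols: set, max_unseen: int = 0) -> bool:
    # max_unseen = 0 models the "very low" default threshold for name fields.
    return unseen_symbol_count(value, known_symbols) > max_unseen


NAME_SYMBOLS = {" ", "-"}  # hypothetical symbols learned for a person-name field
print(is_anomalous("Anna Smith-Jones", NAME_SYMBOLS))  # False
print(is_anomalous("x' OR '1'='1", NAME_SYMBOLS))      # True: quotes and '=' are unseen
```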

Fig. 4 schematically depicts the steps of the intrusion detection process: step b1: parsing the data traffic to extract at least one protocol field of a protocol message of the data traffic; step b2: associating the extracted protocol field with a model for that protocol field, the model being selected from a set of models; step b3: evaluating whether the content of the extracted protocol field is within the safe region defined by the model; and step b4: generating an intrusion detection signal (e.g., followed by filtering the extracted protocol field or the protocol message containing it, raising an alert for the user, or any other intrusion detection action) when the content of the extracted protocol field is determined to be outside the safe region.

In an embodiment, an intrusion detection signal may additionally be generated when parsing cannot establish that the field conforms to the protocol, or when the extracted field cannot be associated with any of the models in the set of models.
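Steps b1 to b4, together with the two additional signals just described, can be sketched in Python as follows. The toy protocol (`name=value` pairs separated by `;`), the `parse` and `inspect` functions, and the representation of each model as a predicate over the field value are all hypothetical simplifications, not the parser or model interface of the invention.

```python
from typing import Callable, Dict, List, Optional


class ParseError(Exception):
    """Raised when a message does not conform to the (toy) protocol."""


def parse(message: str) -> Dict[str, str]:
    # Step b1 (hypothetical protocol): fields are "name=value" pairs
    # separated by ";". Anything else is treated as non-conforming.
    fields: Dict[str, str] = {}
    for part in message.split(";"):
        name, sep, value = part.partition("=")
        if not sep or not name:
            raise ParseError(f"malformed field: {part!r}")
        fields[name] = value
    return fields


def inspect(message: str, models: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Return intrusion detection signals (an empty list means no anomaly)."""
    try:
        fields = parse(message)                      # b1: parse the traffic
    except ParseError as exc:
        return [f"parse failure: {exc}"]            # field not protocol-conforming
    signals: List[str] = []
    for name, value in fields.items():
        model: Optional[Callable[[str], bool]] = models.get(name)  # b2: associate
        if model is None:
            signals.append(f"no model for field {name!r}")      # no matching model
        elif not model(value):                                   # b3: evaluate
            signals.append(f"field {name!r} outside safe region")  # b4: signal
    return signals
```

For example, with a model for `len` that accepts decimal values up to 64, `inspect("op=read;len=4096", models)` returns a single out-of-safe-region signal, while a field with no model or a malformed message yields the corresponding additional signal.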

Fig. 5 schematically depicts, by way of example, an overview of the concepts proposed in this patent application. The process begins by parsing (500) the network traffic to extract at least one protocol field of a protocol message. The second step comprises associating (501) the extracted protocol field with a model for that protocol field, the model being selected from a set of models. The model set may contain different model types and is shown in Fig. 5 at 513. The choice of model type for the extracted protocol field can be driven both by the value type of the protocol field (represented by 511) and by the semantics associated with the protocol field (represented by 512). The set (513) of different model types is also provided as input; the model types may include a numeric range model, a numeric set (enumeration) model, a numeric distribution model, an ASCII string model, a Unicode string model, a boolean model, a binary n-gram model, a network emulator, an intrusion detection signature set, etc. The process of associating a parsed protocol field with its corresponding model (of some model type) can also be refined by taking into account the dependency between a field that describes an operation and a field that describes an argument of that operation (represented by 509). More generally, any dependency of one field value on another field value (represented by 510) can be taken into account when associating a parsed protocol field with its corresponding model, so that several models are built for the same field according to the value of another field in the same message.

During the training phase, if no model of the selected model type exists for the parsed protocol field, such a model may be created (block 515). Similarly, if the model already exists, it may be updated (block 516) during the training phase to include the current value of the parsed field in the safe region defined by the model. If parsing cannot establish that a field observed in the network data conforms to the protocol specification, an intrusion detection signal may be generated (block 508).

During the detection phase, if no existing model of the selected model type can be associated with the parsed field, an intrusion detection signal may be generated (block 504). Conversely, if an existing model of the selected model type can be associated with the parsed field, the value of the field is evaluated (block 503) against the safe region defined by the model. If the value of the parsed protocol field is not within that safe region, an intrusion detection signal may be generated (block 505). Finally, when an intrusion detection signal is generated for any of the reasons discussed above, additional steps may be taken, such as removing (block 506) the protocol message containing the anomalous protocol field from the network traffic, or raising (block 507) and outputting an intrusion alert message.
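The training path of Fig. 5 (model-type selection from value type and semantics, dependency-aware association, and model creation or update in blocks 515 and 516) could look roughly like the Python sketch below. Only three of the listed model types are shown, and the selection rule, the `"operation"` semantics label, the `depends_on` map and all class and function names are hypothetical.

```python
from typing import Any, Dict, Set, Tuple


class EnumModel:
    """Numeric-set (enumeration) model: the safe region is the set of seen values."""
    def __init__(self) -> None:
        self.values: Set[int] = set()
    def update(self, value: int) -> None:
        self.values.add(value)
    def is_safe(self, value: int) -> bool:
        return value in self.values


class RangeModel:
    """Numeric-range model: the safe region is [smallest seen, largest seen]."""
    def __init__(self) -> None:
        self.lo = self.hi = None
    def update(self, value: int) -> None:
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)
    def is_safe(self, value: int) -> bool:
        return self.lo is not None and self.lo <= value <= self.hi


class CharsetModel:
    """String model: the safe region is the set of characters seen."""
    def __init__(self) -> None:
        self.chars: Set[str] = set()
    def update(self, value: str) -> None:
        self.chars.update(value)
    def is_safe(self, value: str) -> bool:
        return set(value) <= self.chars


def select_model_type(value: Any, semantics: str):
    # Hypothetical rule combining value type (511) and semantics (512).
    if isinstance(value, str):
        return CharsetModel
    if semantics == "operation":  # e.g. function codes form a closed set
        return EnumModel
    return RangeModel


# Models are keyed by (field name, value of the field it depends on), so an
# argument field gets a separate model per operation value (509/510).
ModelKey = Tuple[str, Any]


def train(models: Dict[ModelKey, Any], message: Dict[str, Any],
          semantics: Dict[str, str], depends_on: Dict[str, str]) -> None:
    for name, value in message.items():
        parent = depends_on.get(name)
        key = (name, message.get(parent) if parent else None)
        if key not in models:                                   # block 515: create
            models[key] = select_model_type(value, semantics.get(name, ""))()
        models[key].update(value)                               # block 516: update
```

Keying the argument field by the operation value means that an address range learned for operation 3 does not widen the safe region for operation 16, which is the effect described for dependencies 509 and 510.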

It should be understood that the disclosed embodiments are merely exemplary of the invention, which may be embodied in various forms. Therefore, the specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting, but rather to provide an understandable description of the invention.

Elements of the above embodiments may be combined to create other embodiments.

The terms "a" or "an", as used herein, are defined as one or more than one. The term "another", as used herein, is defined as at least a second or more. The terms "including" and/or "having", as used herein, are defined as comprising (i.e., not excluding other elements or steps). Any reference signs in the claims should not be construed as limiting the scope of the claims or the invention. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. The scope of the invention is limited only by the following claims.

Claims


1. A method for monitoring data traffic in a data network, the method comprising: parsing the data traffic to extract at least one protocol field of a protocol message of the data traffic;

associating the extracted protocol field with a corresponding model for that protocol field, the model being selected from a set of models, the model set containing different models for different protocol fields;

evaluating whether the content of the extracted protocol field is in a safe area, as determined by the model; and generating an intrusion detection signal when it is determined that the content of the extracted protocol field is outside the safe area, wherein a model is built for the extracted protocol field in a training phase, the training phase comprising: providing a plurality of model types;

determining the data type of the extracted protocol field;

selecting a model type for the extracted protocol field from the plurality of model types based on a characteristic of the extracted protocol field, the characteristic comprising the determined data type; and building a model for the extracted protocol field based on the selected model type.

2. The method of claim 1, wherein the model set comprises a model for an operator protocol field and a model for an argument protocol field, wherein the steps of associating and evaluating are performed for the operator protocol field and the argument protocol field.

3. The method of claim 2, wherein the set of models further comprises a model for a marshaling protocol field, wherein the steps of associating and evaluating are additionally performed for the marshaling protocol field.

4. A method according to any one of the preceding claims, wherein the protocol message comprises at least one primitive protocol field and at least one composite protocol field such that the protocol message contains a tree structure of protocol fields.

5. A method according to any one of the preceding claims, wherein the protocol field characteristic comprises the semantics of the protocol field, the method comprising determining the semantics of the extracted protocol field and selecting a model type using the determined semantics.

6. A method according to any one of the preceding claims, wherein the model set contains a corresponding model for each protocol field of the protocol field set.

7. A method according to any one of the preceding claims, wherein a model for a field is determined in a learning phase, the learning phase comprising: parsing the data traffic to extract at least one protocol field for the protocol used in the data traffic;

associating the extracted protocol field with a model for that protocol field, wherein the model is selected from a set of models; and updating the model for the extracted protocol field using the contents of the extracted protocol field.

8. The method of claim 7, wherein if an association cannot be made between the extracted protocol field and one of the models, then the method comprises creating a new model for the extracted protocol field and adding the new model to the model set.

9. A method according to any one of the preceding claims, further comprising generating an intrusion detection signal when the parsing cannot establish that the field conforms to the protocol.

10. A method according to any one of the preceding claims, further comprising generating an intrusion detection signal when the extracted field cannot be associated with any of the models from the set of models.

11. The method according to any one of the preceding claims, wherein the protocol is at least one of an application layer protocol, a session layer protocol, a transport layer protocol, or a lower layer protocol from the protocol stack.

12. The method according to any one of the preceding claims, the method further comprising, in response to generating an intrusion detection signal, at least one of: removing said protocol field or a data packet containing said protocol field; and raising and outputting an intrusion alert message.

13. The method according to any one of the preceding claims, wherein the model for the protocol field comprises at least one of: a set of acceptable protocol field values; a numerical distribution of protocol field values; and a definition of a range of acceptable protocol field values.

14. A method according to any one of the preceding claims, wherein the model for the protocol field contains a definition of acceptable letters, numbers, symbols, and scripts.

15. A method according to any one of the preceding claims, wherein the model for the protocol field contains a set of predefined intrusion signatures.

16. A method according to any one of the preceding claims, wherein the model set comprises two models for one protocol field, wherein a specific one of the two models is associated with the one protocol field based on the value of another protocol field.

17. The method of claim 1, wherein the data network is selected from the group consisting of a Manufacturing Process Control network, a SCADA network, an industrial plant data network, a data center network, an office data network, and combinations thereof.

18. The method of claim 1, wherein the protocol message is selected from the group consisting of a SCADA protocol message, an industrial enterprise data network protocol message, a data center network protocol message, an office data network protocol message, an HTTP protocol message, and combinations thereof.

19. A data communication network comprising an intrusion detection system for detecting an intrusion into data traffic in a data communication network, the intrusion detection system comprising a parser for parsing the data traffic to extract at least one protocol field of the data traffic protocol message;

a machine for associating the extracted protocol field with a corresponding model for that protocol field, the model being selected from a set of models, the model set containing different models for different protocol fields;

a model processing unit for analyzing whether the content of the extracted protocol field is in a safe area as determined by the model; and an execution unit for generating an intrusion detection signal when it is determined that the content of the extracted protocol field is outside the safe area, wherein the system is configured, in a learning phase, to build a model for the extracted protocol field, wherein the learning phase comprises: providing a plurality of model types;

determining the data type of the extracted protocol field;

20. The communication network of claim 19, wherein the model set comprises a model for the operator protocol field and a model for the argument protocol field, wherein the machine is configured to associate and evaluate the operator protocol field and the argument protocol field.

21. The communication network of claim 20, wherein the model set further comprises a model for a marshaling protocol field, wherein the machine is configured to associate and evaluate the marshaling protocol field.

22. The communication network according to any one of claims 19-21, wherein the protocol message comprises at least one primitive protocol field and at least one composite protocol field such that the protocol message contains a tree structure of protocol fields.

23. The communication network according to any one of claims 19-22, wherein the protocol field characteristic contains the semantics of the protocol field, and the system is configured to determine the semantics of the extracted protocol field and select the model type using the determined semantics.

24. The communication network according to any one of claims 19-23, wherein the model set contains a corresponding model for each protocol field of the protocol field set.

25. The communication network according to any one of claims 19 to 24, which is further configured to operate in a learning phase, wherein the learning phase is for learning at least one of the models, and the model processing unit is configured to update, in the learning phase, the model for the extracted protocol field using the content of the extracted protocol field.

26. The communication network of claim 25, wherein the machine is also configured to, in the learning phase, if an association cannot be made between the extracted protocol field and one of the models, create a new model for the extracted protocol field and add the new model to the model set.

27. The communication network according to any one of claims 19-26, wherein the execution unit is also configured to generate an intrusion detection signal in response to an indication from the parser that the parser cannot establish that the field conforms to the protocol.


28. The communication network according to any one of claims 19-27, wherein the execution unit is also configured to generate an intrusion detection signal in response to an indication from the machine that the extracted field cannot be associated with any of the models from the set of models.

29. The communication network according to any one of claims 19 to 28, wherein the protocol is at least one of an application layer protocol, a session layer protocol, a transport layer protocol, or a lower layer protocol from the protocol stack.

30. The data network according to any one of claims 19 to 29, wherein the execution unit is configured to, in response to generating an intrusion detection signal, delete the protocol field or the data packet containing the protocol field; and raise and output an intrusion alert message.

31. The data network according to any one of claims 19-30, wherein the model for the protocol field comprises at least one of: a set of acceptable protocol field values; a numerical distribution of protocol field values; and a definition of a range of acceptable protocol field values.

32. The data communication network according to any one of claims 19-31, wherein the model for the protocol field contains a definition of acceptable letters, numbers, symbols, and scripts.

33. The communication network according to any one of claims 19-32, wherein the model for the protocol field contains a set of predefined intrusion signatures.

34. The communication network according to any one of claims 19-33, wherein the set of models contains two models for one protocol field, and the machine is configured to associate a particular one of the two models with one protocol field based on the value of the other protocol field.

35. A data communication network according to any one of claims 19-34, which is selected from the group consisting of a Manufacturing Process Control network, a SCADA network, an industrial plant data network, a data center network, an office data network, and combinations thereof.

36. The data communication network according to any one of claims 19 to 35, wherein the protocol message is selected from the group consisting of a SCADA protocol message, an industrial enterprise data network protocol message, a data center network protocol message, an office data network protocol message, an HTTP protocol message, and combinations thereof.

37. A data center comprising an intrusion detection system for detecting an intrusion into data traffic in a data center data network, the intrusion detection system comprising a parser for parsing the data traffic to extract at least one protocol field of the data traffic protocol message;

determining the data type of the extracted protocol field;

38. An industrial enterprise comprising an intrusion detection system for detecting an intrusion into data traffic in an industrial enterprise data network, the intrusion detection system comprising a parser for parsing the data traffic to extract at least one protocol field of the data traffic protocol message;

a model processing unit for analyzing whether the content of the extracted protocol field is in a safe area as determined by the model; and an execution unit for generating an intrusion detection signal in the event that the content of the extracted protocol field is found to be outside the safe area, wherein the system is configured, in the learning phase, to build a model for the extracted protocol field, and the learning phase comprises: providing a plurality of model types;

determining the data type of the extracted protocol field;

39. An office data network comprising an intrusion detection system for detecting an intrusion into data traffic in the office data network, the intrusion detection system comprising a parser for parsing the data traffic to extract at least one protocol field of the data traffic protocol message;

determining the data type of the extracted protocol field;