RU2805014C1

RU2805014C1 - Method for generating adversarial examples for intrusion detection system of industrial control system

Info

Publication number: RU2805014C1
Application number: RU2022132242A
Authority: RU
Inventors: Александр Игоревич Гетьман; Андрей Игоревич Перминов; Дмитрий Александрович Рыболовлев; Андрей Георгиевич Мацкевич; Максим Николаевич Горюнов; Мария Ивановна Булгакова
Filing date: 2022-12-09
Publication date: 2023-10-10

Abstract

FIELD: computer engineering.

SUBSTANCE: the invention aims to increase the variability of the values of the selected features in order to increase the likelihood of generating effective adversarial examples that evade attack detection by an intrusion detection system. This is achieved in that, when adversarial examples are generated for an intrusion detection system of an industrial control system, the obtained data, which has the same distribution as the training data used by that intrusion detection system, is additionally passed through a network device with an MTU different from the MTU of the industrial control system.

EFFECT: increased variability of the values of the selected features, which increases the likelihood of generating effective adversarial examples that evade attack detection by an intrusion detection system.

1 cl, 2 dwg

Description

The invention relates to information technology, in particular to information security, and can be used for training intrusion detection systems (IDS).

Within the project "Methods for detecting and countering attacks that implant backdoors and malicious code into machine learning models", carried out by the Research Center for Trusted Artificial Intelligence of ISP RAS, approaches to conducting adversarial attacks against intrusion detection models are known. The present invention also belongs to the field of adversarial machine learning: it develops the approaches proposed by the ISP RAS Center for Trusted Artificial Intelligence and describes a method for generating examples that implement adversarial evasion attacks against an intrusion detection system of an industrial control system.

Terms used in the description of the invention:

An intrusion detection system (IDS) is a software system designed to detect unauthorized and malicious activity on a computer network or on an individual host.

Network devices are electronic devices required for a computer network to operate, for example routers, switches, hubs, patch panels, etc.

MTU (maximum transmission unit) is the largest block of data in a single packet that a network device (or the industrial control system network, viewed as a collection of network devices) can transmit without fragmentation.
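As a worked illustration of this definition, the sketch below computes how an IPv4 packet is split when it exceeds the MTU of the device it crosses. It is not part of the patent and rests on stated assumptions: the MTU is treated as the maximum IP packet size, the IPv4 header is a fixed 20 bytes without options, and, per RFC 791, every fragment except the last carries a payload that is a multiple of 8 bytes. The function name `fragment_sizes` is hypothetical.

```python
from typing import List

IPV4_HEADER = 20  # assumption: IPv4 header without options, in bytes


def fragment_sizes(packet_len: int, mtu: int) -> List[int]:
    """Return the sizes of the IPv4 fragments produced for one packet.

    packet_len is the full IP packet length (header + payload);
    mtu is the largest IP packet the link can carry.
    """
    if packet_len <= mtu:
        return [packet_len]  # fits as-is, no fragmentation
    payload = packet_len - IPV4_HEADER
    # Non-final fragments carry a payload rounded down to a multiple of 8 bytes.
    max_chunk = (mtu - IPV4_HEADER) // 8 * 8
    sizes: List[int] = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(IPV4_HEADER + chunk)  # every fragment gets its own header
        payload -= chunk
    return sizes
```

Under these assumptions, a 1500-byte packet crosses a 1520-byte MTU unchanged, while a 400-byte MTU splits it into fragments of 396, 396, 396 and 372 bytes.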

Information security, and in particular the security of industrial control systems, currently receives considerable attention worldwide. To address these problems, both signature-based and non-signature-based intrusion detection systems are being developed and deployed. The drawback of signature-based IDS is that they cannot detect new, previously unknown computer attacks. Detecting new attacks therefore calls for non-signature-based IDS, among which systems based on machine learning methods are the most widespread.

However, machine-learning-based IDS for industrial control systems are vulnerable in certain respects. A machine learning model can be manipulated with samples deliberately crafted by an attacker: by carefully choosing an input example, the attacker can make the model err, because a slight change to an input example leads to an incorrect response from the machine learning model (classifier). The set of input examples on which the model errs, i.e. which it misclassifies, is called an adversarial sample. The feasibility of adversarial attacks poses a potential security threat to any system based on machine learning.

Research in adversarial machine learning is therefore fundamentally important and relevant to information security. Methods for generating adversarial examples make it possible to simulate the actions of a potential attacker that could cause an IDS to err and thereby evade attack detection, and to develop adequate defense mechanisms against them.

Patent US20210067549A1 discloses a method for detecting and responding to an intrusion into a computer network, comprising: generating an adversarial training dataset that includes original examples and adversarial examples, the adversarial examples being produced by perturbing one or more original examples with an integrated gradient attack; encoding the original and adversarial samples into corresponding original and adversarial graph representations based on aggregation of node neighborhoods; training a graph neural network to detect anomalous activity on the computer network using the adversarial training dataset; and performing a security action in response to the detected anomalous activity.

The main drawback of this method is that it describes the generation of adversarial examples only in general terms, relying on a so-called "integrated gradient attack". It does not disclose the specific steps for obtaining an adversarial example from an original example; it does not specify which subset of the feature space is modified; and it ignores the known constraints an attacker faces when generating features of network traffic sessions, for example the impossibility of directly and arbitrarily changing session features such as session duration, packet-length statistics, inter-packet delays, etc.

The closest prior art in technical essence and function, chosen as the prototype, is the method for generating adversarial examples for an industrial control system according to patent US20210319113A1, which comprises the following steps:

1) a sample generator listens to the traffic of the industrial control system to obtain data having the same distribution as the training data used by the intrusion detection system of the industrial control system; the obtained data is labeled, and the labeled anomalous data is taken as the initial attack examples;

2) the traffic of the industrial control system is analyzed and features are extracted, including the source IP address, source port number, destination IP address, destination port number, the time interval between packets, the packet transmission time and the packet destination code;

3) a machine learning classifier is created and trained on the features extracted in step 2;

4) the problem of training the intrusion detection system of the industrial control system is transformed into an optimization problem using the classifier created in step 3, and the optimization problem is solved to obtain adversarial examples, where the optimization problem is:

$$x^* = \arg\min_{x} g(x) \quad \text{subject to} \quad d(x^*, x_0) < d_{\max},$$

where $x_0$ is the initial attack example;

$x^*$ is the generated adversarial example;

$g(x)$ is the probability that example $x$ is classified as anomalous (i.e. as an attack example);

$d(x^*, x_0)$ is the distance between the adversarial example and the initial attack example;

$d_{\max}$ is the maximum Euclidean distance permitted by the industrial control system: if it is exceeded, the adversarial example is considered to lose its malicious effect;

5) the adversarial example created in step 4 is tested in a real industrial control system: if it successfully bypasses the intrusion detection system of the industrial control system while retaining its attack capability, it is accepted as an effective adversarial example; if it either fails to evade the intrusion detection system or loses its attack capability, it is discarded.

The drawback of the prototype method is the limited variability of the feature values in the feature space chosen by its authors, because the traffic of the industrial control system is captured under specific network settings rather than across the wide range of permissible network settings. Searching for adversarial examples within such a restricted feature domain results in a relatively low probability of finding them.

The technical problem is the low probability of finding adversarial examples due to the limited variability of the values of the selected features.

The technical result is increased variability of the values of the selected features, which increases the likelihood of generating effective adversarial examples that evade attack detection by the IDS.

The technical problem is solved and the technical result is achieved in that the method for generating adversarial examples for an intrusion detection system of an industrial control system comprises the following sequence of actions:

- a sample generator listens to traffic through a network device with the MTU of the industrial control system to obtain data having the same distribution as the training data used by the intrusion detection system of the industrial control system; the obtained data is labeled, and the labeled anomalous data is taken as the initial attack examples;

- the traffic of the industrial control system is then analyzed and features are extracted, including the source IP address, source port number, destination IP address, destination port number, the time interval between packets, the packet transmission time and the packet destination code;

- a machine learning classifier is then created and trained on the features obtained from the traffic analysis;

- the problem of training the intrusion detection system of the industrial control system is transformed into an optimization problem using the created classifier, and the optimization problem is solved to obtain adversarial examples, where the optimization problem is:

$$x^* = \arg\min_{x} g(x) \quad \text{subject to} \quad d(x^*, x_0) < d_{\max},$$

where $g(x)$ is the probability that example $x$ is classified as anomalous (i.e. as an attack example);

$d(x^*, x_0)$ is the distance between the adversarial example and the initial attack example;

$d_{\max}$ is the maximum Euclidean distance permitted by the industrial control system: if it is exceeded, the adversarial example is considered to lose its malicious effect;

- the created adversarial example is then tested in a real industrial control system and, depending on the result, is either accepted or rejected as an effective adversarial example.

A distinctive feature of the method is that the obtained data, which has the same distribution as the training data used by the intrusion detection system of the industrial control system, is additionally passed through a network device with an MTU different from the MTU of the industrial control system.

Passing a network traffic session through a device whose MTU differs from that of the industrial control system (for example, the system MTU is 1520 bytes and the session is additionally passed through a device with an MTU of 400 bytes) changes the distribution of packet lengths within the session, the inter-packet delays, the session duration, the packet arrival rate, the characteristics of these distributions, etc. This increases the variability of the feature values of the initial examples, expands the search space, and raises the probability of finding adversarial examples, including effective ones.
An important aspect of the proposed method is that it respects the known constraints on what an attacker can do to the features of network traffic sessions, namely the impossibility of directly and arbitrarily changing individual session features: instead, the attacker modifies the session features through an additional action that is feasible in practice, passing the traffic through a network device with a different MTU.
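The effect described above can be illustrated on packet lengths alone. The following sketch is a simplification under stated assumptions and is not the patent's code: it models only IPv4 fragmentation with a 20-byte header (in practice, TCP endpoints may instead shrink their segment size, and timing features change as well but are not modeled), and the session, the helper names `refragment` and `length_features`, and the feature set are all hypothetical. It compares the same session features under the 1520-byte and 400-byte MTU settings from the example.

```python
import statistics
from typing import Dict, List

IPV4_HEADER = 20  # assumption: IPv4 header without options, in bytes


def refragment(lengths: List[int], mtu: int) -> List[int]:
    """Packet lengths seen after a session crosses a device with the given MTU."""
    max_chunk = (mtu - IPV4_HEADER) // 8 * 8  # payload of non-final fragments
    out: List[int] = []
    for length in lengths:
        if length <= mtu:
            out.append(length)  # small packets pass unchanged
            continue
        payload = length - IPV4_HEADER
        while payload > 0:
            chunk = min(max_chunk, payload)
            out.append(IPV4_HEADER + chunk)
            payload -= chunk
    return out


def length_features(lengths: List[int]) -> Dict[str, float]:
    """A hypothetical set of session-level packet-length features."""
    return {
        "packet_count": len(lengths),
        "total_bytes": sum(lengths),
        "mean_len": statistics.mean(lengths),
        "std_len": statistics.pstdev(lengths),
        "max_len": max(lengths),
    }


session = [60, 1500, 1500, 900, 60]                # made-up packet lengths (bytes)
before = length_features(session)                   # as captured with the 1520-byte MTU
after = length_features(refragment(session, 400))   # after the 400-byte MTU device
```

For this made-up session, the packet count rises from 5 to 13, the largest packet shrinks from 1500 to 396 bytes, and the total volume grows by 160 bytes of extra headers; shifts of this kind are what widen the region in which adversarial examples are searched for.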

Analysis of the state of the art established that there are no analogues characterized by a set of features identical to all the features of the claimed method. Consequently, the claimed invention meets the patentability condition of "novelty".

This new set of essential features extends the capabilities of the prototype method in that the network traffic sessions corresponding to adversarial examples are additionally passed through a network device with an MTU different from the MTU of the industrial control system.

A search of known solutions in this and related fields of technology, aimed at identifying features that coincide with the features distinguishing the claimed invention from the prototypes, showed that these features do not follow explicitly from the prior art. The prior art identified by the applicant does not reveal that the essential features of the claimed invention are known to contribute to the stated technical result. Therefore, the claimed invention meets the patentability condition of "inventive step".

The "industrial applicability" of the method follows from the technical feasibility of implementing it.

The claimed method is illustrated by the drawings:

Fig. 1 is a flow diagram of the method for generating adversarial examples for an intrusion detection system of an industrial control system;

Fig. 2 is a diagram of the generation of adversarial examples for an intrusion detection system of an industrial control system.

In block 1 (Fig. 1), the sample generator (Fig. 2, block 2) listens to the traffic of the industrial control system (Fig. 2, block 3) to obtain data having the same distribution as the training data used by the intrusion detection system of the industrial control system (Fig. 2, block 4).

In block 2 (Fig. 1), the data obtained in block 1 (Fig. 1) is additionally passed through a network device (Fig. 2, block 1) with an MTU different from the MTU of the industrial control system (Fig. 2, block 3); as a result, the packet sizes of this data differ from the packet sizes in the industrial control system.

In block 3 (Fig. 1), the obtained data is labeled and the labeled anomalous data is taken as the initial attack examples (Fig. 2, block 5).

In block 4 (Fig. 1), the traffic of the industrial control system is analyzed and features (Fig. 2, block 6) are extracted, including the source IP address, source port number, destination IP address, destination port number, the time interval between packets, the packet transmission time and the packet destination code.
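Block 4 turns per-packet records into features. The sketch below shows one plausible way to do this with pandas, assuming a packet log with the hypothetical columns `src_ip`, `src_port`, `dst_ip`, `dst_port`, `timestamp` (seconds) and `length` (bytes). Here the address and port fields serve as the session key, while inter-packet intervals, session duration and packet lengths become the features; the patent's actual extraction code is not reproduced.

```python
import pandas as pd

# Hypothetical packet log: one row per packet, made-up values.
packets = pd.DataFrame({
    "src_ip": ["10.0.0.1"] * 3 + ["10.0.0.2"] * 2,
    "src_port": [50000] * 3 + [50001] * 2,
    "dst_ip": ["10.0.0.9"] * 5,
    "dst_port": [502] * 5,
    "timestamp": [0.00, 0.10, 0.30, 1.00, 1.50],
    "length": [60, 260, 60, 60, 1500],
})

FLOW_KEY = ["src_ip", "src_port", "dst_ip", "dst_port"]


def session_features(pkts: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-packet records into one row of features per session."""
    pkts = pkts.sort_values(FLOW_KEY + ["timestamp"])
    # Inter-packet interval inside each session (NaN for the first packet).
    pkts = pkts.assign(iat=pkts.groupby(FLOW_KEY)["timestamp"].diff())
    return (
        pkts.groupby(FLOW_KEY)
        .agg(
            packet_count=("length", "count"),
            mean_len=("length", "mean"),
            duration=("timestamp", lambda t: t.max() - t.min()),
            mean_iat=("iat", "mean"),  # NaN values are skipped
        )
        .reset_index()
    )


features = session_features(packets)
```

Keeping the session key as ordinary columns after `reset_index()` leaves the choice open whether addresses and ports are fed to the classifier, as the listed features suggest, or used only for grouping.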

In block 5 (Fig. 1), a machine learning classifier (Fig. 2, block 7) is created and trained on the features (Fig. 2, block 6) extracted in block 4 (Fig. 1).
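Block 5 can be sketched with the scikit-learn classes that the example program imports. The data below is synthetic and only stands in for session features (mean packet length and mean inter-packet interval, with label 1 for "Attack" and 0 for "Benign"); using a random forest follows the `RandomForestClassifier` import in the program listing, but the hyperparameters and data are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400  # sessions per class (synthetic)

# Columns: mean packet length (bytes), mean inter-packet interval (s); made-up distributions.
X_benign = rng.normal(loc=[100.0, 0.5], scale=[20.0, 0.1], size=(n, 2))
X_attack = rng.normal(loc=[400.0, 0.05], scale=[50.0, 0.02], size=(n, 2))
X = np.vstack([X_benign, X_attack])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])  # 1 = "Attack"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
precision = precision_score(y_test, y_pred)  # attack class is the positive label
recall = recall_score(y_test, y_pred)
```

The classifier's `predict_proba(...)[:, 1]` column can then play the role of $g(x)$, the estimated probability that a feature vector is classified as an attack.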

In block 6 (Fig. 1), the problem of training the adversarial intrusion detection system of the industrial control system is transformed into an optimization problem using the classifier (Fig. 2, block 7) created in block 5 (Fig. 1).

In block 7 (Fig. 1), the optimization problem is solved to obtain adversarial examples (Fig. 2, block 8), where the optimization problem is:

$$x^* = \arg\min_{x} g(x) \quad \text{subject to} \quad d(x^*, x_0) < d_{\max}.$$
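The patent does not name a solver for this problem. Because a tree-based classifier such as a random forest yields a piecewise-constant $g(x)$ with no useful gradient, a gradient-free method is a natural choice; the sketch below substitutes plain random search inside the Euclidean ball of radius $d_{\max}$ around $x_0$. The function `random_search_attack`, the toy logistic `toy_g` and all numeric values are hypothetical.

```python
from typing import Callable, Tuple

import numpy as np


def random_search_attack(
    g: Callable[[np.ndarray], float],
    x0: np.ndarray,
    d_max: float,
    n_iter: int = 2000,
    seed: int = 0,
) -> Tuple[np.ndarray, float]:
    """Approximate x* = argmin g(x) subject to ||x - x0||_2 < d_max by random sampling."""
    rng = np.random.default_rng(seed)
    best_x, best_g = x0.copy(), g(x0)
    for _ in range(n_iter):
        direction = rng.normal(size=x0.shape)
        direction /= np.linalg.norm(direction)  # random unit vector
        # Radius drawn uniformly (not uniform in volume) and kept strictly below d_max.
        radius = d_max * 0.999 * rng.uniform(0.0, 1.0)
        candidate = x0 + radius * direction
        value = g(candidate)
        if value < best_g:  # keep the candidate with the lowest attack probability
            best_x, best_g = candidate, value
    return best_x, best_g


def toy_g(x: np.ndarray) -> float:
    """Hypothetical stand-in for the classifier's attack probability."""
    w = np.array([1.0, 1.0])
    return float(1.0 / (1.0 + np.exp(-(w @ x - 2.0))))


x0 = np.array([2.0, 2.0])  # made-up initial attack example
x_star, g_star = random_search_attack(toy_g, x0, d_max=1.0)
```

With a trained scikit-learn model, `g` could be `lambda x: clf.predict_proba(x.reshape(1, -1))[0, 1]`. Random search scales poorly as the number of features grows, so it is best read as a baseline rather than a recommended solver.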

In block 8 (Fig. 1), the adversarial example (Fig. 2, block 8) created in block 7 (Fig. 1) is tested in a real industrial control system (Fig. 2, block 3).

In block 9 (Fig. 1), it is checked whether the adversarial example bypasses the intrusion detection system of the industrial control system and whether it retains its attack capability. If both conditions are met, the example is accepted (Fig. 1, block 10) as an effective adversarial example (Fig. 2, block 9); otherwise, it is discarded (Fig. 1, block 11).
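The decision in blocks 9 to 11 reduces to a two-condition filter. In the sketch below, `triage`, `ids_detects` and `attack_preserved` are hypothetical names; the two callables stand in for the results of testing each example in the real industrial control system, which the patent does not specify in code.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def triage(
    candidates: Sequence[T],
    ids_detects: Callable[[T], bool],
    attack_preserved: Callable[[T], bool],
) -> Tuple[List[T], List[T]]:
    """Split candidates into effective adversarial examples and discarded ones."""
    accepted: List[T] = []
    discarded: List[T] = []
    for example in candidates:
        # Block 9: the IDS must miss the example AND the attack must still work.
        if not ids_detects(example) and attack_preserved(example):
            accepted.append(example)  # block 10
        else:
            discarded.append(example)  # block 11
    return accepted, discarded


# Toy usage with made-up test outcomes recorded per candidate.
candidates = [
    {"id": 1, "flagged": False, "works": True},
    {"id": 2, "flagged": True, "works": True},
    {"id": 3, "flagged": False, "works": False},
]
effective, rejected = triage(
    candidates,
    ids_detects=lambda c: c["flagged"],
    attack_preserved=lambda c: c["works"],
)
```

Only candidate 1 is accepted: candidate 2 is caught by the IDS, and candidate 3 evades it but no longer carries out the attack.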

The claimed method is supported by an example program in Python that achieves the stated result. The program builds a classifier and searches for adversarial examples for two datasets: data obtained from industrial control system traffic with an MTU of 1520 bytes, which corresponds to the prototype, and data obtained from industrial control system traffic and additionally passed through a network device with an MTU of 400 bytes, which corresponds to the additional action of the claimed method. The numbers of adversarial examples found are then compared. The additional action of the claimed method is expected to increase the number of effective adversarial examples found and, therefore, the likelihood of generating effective adversarial examples that evade attack detection by the IDS. This result corresponds to the stated technical result.

Below is a description of the program and the steps it performs.

Import the required libraries:

import math
import pickle
from typing import List

import numpy as np
import pandas as pd
from sklearn import model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

Next, traffic is captured through a network device with the MTU of the industrial control system (1520 bytes) to obtain data that has the same distribution as the training data used by the intrusion detection system of the industrial control system, and the captured data is labeled. In the example, the labeled data is written to the file 'Results_1520.csv' and then read by the program. In the 'GlobalLabel' attribute, records corresponding to attacks are labeled "Attack" and records without an attack are labeled "Benign". A dataframe df_1520 is created from this data:

df_1520 = pd.read_csv('Results_1520.csv')

Convert the nominal values of the 'GlobalLabel' attribute in the df_1520 dataframe to numeric values ("Attack" becomes 1, "Benign" becomes 0):

df_1520['GlobalLabel'] = df_1520['GlobalLabel'].apply(lambda x: 0 if x == 'Benign' else 1)
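The lambda above maps every value other than "Benign" to 1, so a misspelled or missing label would silently be counted as an attack. A stricter alternative (a suggestion, not the source's code) maps only the two documented labels and rejects anything else. The sketch is plain Python, and `LABEL_CODES` and `encode_labels` are hypothetical names; in pandas, the same idea could be expressed with `Series.map` and a dictionary, which yields NaN for unmapped values that can then be detected with `isna()`.

```python
from typing import Iterable, List

# Label vocabulary taken from the text: "Attack" -> 1, "Benign" -> 0.
LABEL_CODES = {"Benign": 0, "Attack": 1}


def encode_labels(labels: Iterable[str]) -> List[int]:
    """Encode GlobalLabel values, raising on anything outside the vocabulary."""
    codes = []
    for label in labels:
        key = label.strip()  # tolerate stray whitespace from the CSV
        if key not in LABEL_CODES:
            raise ValueError(f"unexpected GlobalLabel value: {label!r}")
        codes.append(LABEL_CODES[key])
    return codes


print(encode_labels(["Benign", "Attack", " Attack "]))
```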

Create the 'Packet Time' feature as 1 / 'Flow Packets/s', multiplied by 100000 so that the resulting values exceed 1:

df_1520['Packet Time'] = df_1520['Flow Packets/s'].apply(lambda x: 100000/x)
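If any flow reports a 'Flow Packets/s' value of 0, the expression 100000/x has no finite value. A guarded version, sketched here as a plain-Python helper under that assumption (`packet_time` and `SCALE` are illustrative names), maps non-positive or missing rates to NaN instead. In pandas, a comparable vectorized form would be `100000 / df_1520['Flow Packets/s'].where(df_1520['Flow Packets/s'] > 0)`, since `Series.where` replaces values that fail the condition with NaN.

```python
import math
from typing import List, Optional, Sequence

SCALE = 100_000  # scaling factor used in the text


def packet_time(flow_packets_per_s: Sequence[Optional[float]]) -> List[float]:
    """Return SCALE / rate for each rate; non-positive, NaN or None rates give NaN."""
    out = []
    for rate in flow_packets_per_s:
        # `not rate > 0` is also True for NaN, because NaN > 0 is False
        if rate is None or not rate > 0:
            out.append(math.nan)
        else:
            out.append(SCALE / rate)
    return out


print(packet_time([2.0, 0.0, float("nan"), 4.0]))
```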

Remove the dots from the values of the 'Source IP' and 'Destination IP' features:

df_1520['Source IP'] = df_1520['Source IP'].apply(lambda x: x.replace('.',''))
df_1520['Destination IP'] = df_1520['Destination IP'].apply(lambda x: x.replace('.',''))
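Removing the dots turns an IPv4 address into a digit string (for example, 172.18.0.3 becomes 1721803), but different addresses can collide: 1.11.1.1 and 11.1.1.1 both become 11111. If that matters for a given network, one alternative (not used in the source) is to encode each address as its 32-bit integer value with Python's standard `ipaddress` module. The helper names below are illustrative.

```python
import ipaddress


def ip_by_dot_removal(ip: str) -> str:
    """Encoding used in the text: drop the dots."""
    return ip.replace(".", "")


def ip_to_int(ip: str) -> int:
    """Alternative encoding (assumption): the address as an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(ip.strip()))


a, b = "1.11.1.1", "11.1.1.1"
print(ip_by_dot_removal(a), ip_by_dot_removal(b))  # same string: a collision
print(ip_to_int(a), ip_to_int(b))                  # distinct integers
```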

Feature selection:

• source IP address: 'Source IP';
• source port number: 'Source Port';
• destination IP address: 'Destination IP';
• destination port number: 'Destination Port';
• mean time interval between packets: 'Flow IAT Mean';
• packet transmission time: 'Packet Time'.

webattack_features = ['Source IP',
                      'Source Port',
                      'Destination IP',
                      'Destination Port',
                      'Flow IAT Mean',
                      'Packet Time']

View the selected features:

df_1520[webattack_features]

        Source IP  Source Port  Destination IP  Destination Port  Flow IAT Mean    Packet Time
0      8314919850        60514         1721803               443   1.634757e+05   16155.244716
1      8314919850        60516         1721803               443   5.279682e+05   50841.377774
2      8314919850        60628         1721803               443   7.404261e+05   71194.815457
3      8314919850        60630         1721803               443   1.372046e+06  131239.151936
4      8314919850        60786         1721803               443   1.007281e+06   95132.083668
...           ...          ...             ...               ...            ...            ...
17941  8314919850        54480         1721803               443   3.237820e+04    2698.183333
17942  8314919850        54650         1721803               443   8.489444e+02      80.426316
17943  8314919850        54652         1721803               443   5.656823e+05   53590.957786
17944  8314919850        54734         1721803               443   5.591972e+05   52976.573547
17945  8314919850        54816         1721803               443   1.038160e+04     865.133333

17946 rows × 6 columns

Form the target vector (labels):

y = df_1520['GlobalLabel'].values

Form the feature matrix (one row per session, one column per selected feature):

X = df_1520[webattack_features].values

Check the shapes of the feature matrix and the target vector:

print(X.shape, y.shape)

(17946, 6) (17946,)

Split the dataset into training (X_train, y_train) and test (X_test, y_test) subsets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=True, random_state=42)

Build a random forest classifier (no random_state is set for the classifier, so retraining may give slightly different results from those shown below):

RFmodel = RandomForestClassifier(max_depth=5, n_estimators=5, max_features=3)

Train the classifier on the training set:

RFmodel.fit(X_train, y_train)

Compute the model's predictions on the test set:

y_pred = RFmodel.predict(X_test)

Compute the confusion matrix of RFmodel on the original test set (rows are the true labels 0 and 1, columns are the predicted labels 0 and 1):

matrix = confusion_matrix(y_test, y_pred)
print(matrix)

[[1731   61]
 [ 178 2517]]

Function that computes and prints the quality metrics:

def print_metrics(y_eval: np.ndarray, y_pred: np.ndarray, average: str = 'binary') -> None:
    accuracy = accuracy_score(y_eval, y_pred)
    precision = precision_score(y_eval, y_pred, average=average)
    recall = recall_score(y_eval, y_pred, average=average)
    f1 = f1_score(y_eval, y_pred, average=average)
    print('Accuracy =', accuracy)
    print('Precision =', precision)
    print('Recall =', recall)
    print('F1 =', f1)

Print the quality metrics of RFmodel on the original test set:

print_metrics(y_test, y_pred)

Accuracy = 0.9467350122576331
Precision = 0.9763382467028704
Recall = 0.9339517625231911
F1 = 0.954674758202162
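The reported metrics follow directly from the confusion matrix. As a cross-check, the sketch below recomputes them by hand from scikit-learn's binary layout [[TN, FP], [FN, TP]], treating 1 ("attack") as the positive class; `metrics_from_confusion` is a hypothetical helper, not part of the source program.

```python
from typing import Dict


def metrics_from_confusion(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (1 = attack)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Values read from the confusion matrix [[1731, 61], [178, 2517]] reported above
before = metrics_from_confusion(tn=1731, fp=61, fn=178, tp=2517)
for name, value in before.items():
    print(name, round(value, 4))
```

The same helper applied to a later confusion matrix shows directly which metric moves: adversarial examples only shift counts from TP to FN, so TN and FP stay fixed while recall drops.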

Search for adversarial examples using the prototype method.

Initialize the adversarial test set as a copy of the original test set:

X_test_evasion_attack = X_test.copy()

Following the prototype method, the problem of training the intrusion detection system of the industrial control system against adversarial examples is formulated as an optimization problem. All test-set sessions labeled "attack" that the model classifies correctly are selected. For simplicity, this example solves the optimization problem by brute force: for each session, the value of the 'Packet Time' feature is increased step by step from its (rounded-up) initial value to initial + 500. If the classifier's answer changes, the modified example is considered adversarial. All adversarial examples are printed. In practice, the 'Packet Time' value can be changed by introducing a delay at the physical layer; such a change preserves the effect of the attack, so the adversarial examples found this way are effective adversarial examples.

for i in range(0, X_test.shape[0]):
    # Consider only attack sessions that the model currently detects correctly
    if (y_test[i] == 1) and (RFmodel.predict(X_test[[i]])[0] == 1):
        X_test_attack = X_test[[i]]  # fancy indexing returns a copy of row i
        j = math.ceil(X_test[i, 5])  # column 5 is 'Packet Time'
        for Packet_Time in range(j, j + 500):
            X_test_attack[0, 5] = Packet_Time
            pred = RFmodel.predict(X_test_attack)
            if pred[0] < 1:
                # The model now answers "not an attack": record the adversarial example
                print(i, Packet_Time)
                X_test_evasion_attack[i, 5] = Packet_Time
                break

61 53083
261 22362
400 22362
598 22362
745 53083
831 22362
893 22362
948 22362
987 22362
1003 22362
1128 22362
1257 22362
1475 22362
1592 53083
1681 22362
1827 53343
1839 22362
1851 22362
1887 22362
2202 2800
2378 22362
2421 22362
2578 22362
2725 1720
2851 53083
2978 22362
3018 22362
3033 53343
3041 22362
3135 53083
3247 22362
3488 53083
3551 53083
3555 22362
3701 2800
3852 22362
3918 53083
4055 22362
4100 22362
4117 22362
4118 53083
4237 22362
4265 22362
4291 22362
4295 53343
4377 22362
4444 22362
4452 53343
4463 53083

The total number of effective adversarial examples found using the prototype method is 49.
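The brute-force loop above can be factored into a reusable function. The sketch below is a hypothetical generalization: the name `brute_force_evasion`, the `predict` callable, and the toy threshold model are illustrative, not from the source. Like the original loop, it only increases the chosen feature, which fits the idea of adding a delay, and it returns the first value that flips the prediction to 0.

```python
import math
from typing import Callable, List, Optional, Sequence


def brute_force_evasion(
    x0: Sequence[float],
    predict: Callable[[Sequence[float]], int],
    feature: int,
    max_shift: int = 500,
) -> Optional[List[float]]:
    """Return a copy of x0 with x0[feature] raised to the first integer value in
    [ceil(x0[feature]), ceil(x0[feature]) + max_shift) for which predict() returns
    0 ("not an attack"), or None if no such value exists."""
    start = math.ceil(x0[feature])
    candidate = list(x0)  # work on a copy so x0 is left unchanged
    for value in range(start, start + max_shift):
        candidate[feature] = value
        if predict(candidate) == 0:
            return candidate
    return None


# Toy stand-in for the IDS (hypothetical): reports an attack while feature 1 < 1000.
def toy_ids(x: Sequence[float]) -> int:
    return 1 if x[1] < 1000 else 0


print(brute_force_evasion([443, 980.4], toy_ids, feature=1))
print(brute_force_evasion([443, 200.0], toy_ids, feature=1))
```

The first call finds the evasion value 1000 within the window; the second returns None because the threshold lies more than 500 units away.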

Compare the model's predictions for one row of the original test set and the corresponding adversarial example:

print("Example row from the original test set:")
print(X_test[[61]])
pred = RFmodel.predict(X_test[[61]])
print("Model prediction for the original test row:", pred[0])
print("Adversarial example:")
print(X_test_evasion_attack[[61]])
y_pred_evasion_attack = RFmodel.predict(X_test_evasion_attack[[61]])
print("Model prediction for the adversarial example:", y_pred_evasion_attack[0])

Example row from the original test set:
[[' 8314919850' 39372 ' 1721803' 443 557563.11111111 52821.76845083388]]
Model prediction for the original test row: 1
Adversarial example:
[[' 8314919850' 39372 ' 1721803' 443 557563.11111111 53083]]
Model prediction for the adversarial example: 0

The adversarial example "fools" the model: the attack property is preserved, yet the classifier's answer changes from 1 ("attack") to 0 ("not an attack").
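The claim constrains adversarial examples by a distance d(x*, x0) < d_max. For row 61 only the 'Packet Time' component changes, so the Euclidean distance reduces to the size of that change. The sketch computes it with `math.dist` (Python 3.8+) on the numeric columns printed above; the IP columns are identical in both vectors and are omitted. The source gives no value for d_max, so `D_MAX = 500` is a hypothetical choice that matches the search window: because the loop starts at ceil(x0) and stops before ceil(x0) + 500, any example it finds differs from the original by less than 500 in 'Packet Time'.

```python
import math

# Numeric columns of test row 61 and its adversarial version, as printed above
# (Source Port, Destination Port, Flow IAT Mean, Packet Time).
x0 = [39372, 443, 557563.11111111, 52821.76845083388]
x_adv = [39372, 443, 557563.11111111, 53083]

d = math.dist(x0, x_adv)  # Euclidean distance between the two vectors
D_MAX = 500               # hypothetical d_max, NOT given in the source

print(round(d, 2), d < D_MAX)
```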

Compute the model's predictions on the test set in which the adversarial examples found above replace the corresponding attack rows:

y_pred_evasion_attack = RFmodel.predict(X_test_evasion_attack)

Compute the confusion matrix of RFmodel on the test set with the adversarial examples:

matrix = confusion_matrix(y_test, y_pred_evasion_attack)
matrix

array([[1731,   61],
       [ 227, 2468]])

Print the quality metrics of RFmodel on the test set with the adversarial examples:

print_metrics(y_test, y_pred_evasion_attack)

Accuracy = 0.9358145754401604
Precision = 0.9758797943851325
Recall = 0.9157699443413729
F1 = 0.9448698315467076

The model's quality has decreased because the adversarial examples substituted into the test set "fool" the model: 49 attack sessions that were previously detected are now classified as benign, so the number of missed attacks rises from 178 to 227.

Next, the steps corresponding to the additional action of the claimed method are performed.

Data obtained from industrial control system traffic, which has the same distribution as the training data used by the intrusion detection system of the industrial control system, is additionally passed through a network device whose MTU of 400 bytes differs from the 1520-byte MTU of the industrial control system. In the example, the labeled data is written to the file 'Results_400.csv' and then read by the program. In the 'GlobalLabel' attribute, records corresponding to attacks are labeled "Attack" and records without an attack are labeled "Benign". A dataframe df_400 is created from this data:

df_400 = pd.read_csv('Results_400.csv')

Convert the nominal values of the 'GlobalLabel' attribute in the df_400 dataframe to numeric values ("Attack" becomes 1, "Benign" becomes 0), and apply the same 'Packet Time' and IP-address transformations as for df_1520:

df_400['GlobalLabel'] = df_400['GlobalLabel'].apply(lambda x: 0 if x == 'Benign' else 1)
df_400['Packet Time'] = df_400['Flow Packets/s'].apply(lambda x: 100000/x)
df_400['Source IP'] = df_400['Source IP'].apply(lambda x: x.replace('.',''))
df_400['Destination IP'] = df_400['Destination IP'].apply(lambda x: x.replace('.',''))

View the webattack_features columns of the df_400 dataframe:

df_400[webattack_features]

        Source IP  Source Port  Destination IP  Destination Port  Flow IAT Mean    Packet Time
0      8314919850        60514         1721803               443   1.634757e+05   16155.244716
1      8314919850        60516         1721803               443   5.279682e+05   50841.377774
2      8314919850        60628         1721803               443   7.404261e+05   71194.815457
3      8314919850        60630         1721803               443   1.372046e+06  131239.151936
4      8314919850        60786         1721803               443   1.007281e+06   95132.083668
...           ...          ...             ...               ...            ...            ...
9725   8314919850        54480         1721803               443   3.237820e+04    2698.183333
9726   8314919850        54650         1721803               443   8.489444e+02      80.426316
9727   8314919850        54652         1721803               443   5.656823e+05   53590.957786
9728   8314919850        54734         1721803               443   5.591972e+05   52976.573547
9729   8314919850        54816         1721803               443   1.038160e+04     865.133333

9730 rows × 6 columns

y_400 = df_400['GlobalLabel'].values
X_400 = df_400[webattack_features].values
print(X_400.shape, y_400.shape)

(9730, 6) (9730,)

Split the dataset into training (X_400_train, y_400_train) and test (X_400_test, y_400_test) subsets:

X_400_train, X_400_test, y_400_train, y_400_test = train_test_split(X_400, y_400, test_size=0.25, shuffle=True, random_state=42)

Build and train a random forest classifier with the same hyperparameters, and compute its predictions on the test set:

RFmodel_400 = RandomForestClassifier(max_depth=5, n_estimators=5, max_features=3)
RFmodel_400.fit(X_400_train, y_400_train)
y_400_pred = RFmodel_400.predict(X_400_test)

Compute the confusion matrix of RFmodel_400 on the original test set:

matrix = confusion_matrix(y_400_test, y_400_pred)
print(matrix)

[[1167   64]
 [  28 1174]]

Print the quality metrics of RFmodel_400 on the original test set:

print_metrics(y_400_test, y_400_pred)

Accuracy = 0.9621866009042335
Precision = 0.9483037156704361
Recall = 0.9767054908485857
F1 = 0.962295081967213

Search for adversarial examples.

X_400_test_evasion_attack = X_400_test.copy()

As before, all test-set sessions labeled "attack" that the model classifies correctly are selected. For each session, the value of the 'Packet Time' feature is increased step by step from its (rounded-up) initial value to initial + 500. If the classifier's answer changes, the modified example is considered adversarial. All adversarial examples are printed.

for i in range(0, X_400_test.shape[0]):
    # Consider only attack sessions that the model currently detects correctly
    if (y_400_test[i] == 1) and (RFmodel_400.predict(X_400_test[[i]])[0] == 1):
        X_400_test_attack = X_400_test[[i]]  # fancy indexing returns a copy of row i
        j = math.ceil(X_400_test[i, 5])  # column 5 is 'Packet Time'
        for Packet_Time in range(j, j + 500):
            X_400_test_attack[0, 5] = Packet_Time
            pred = RFmodel_400.predict(X_400_test_attack)
            if pred[0] < 1:
                # The model now answers "not an attack": record the adversarial example
                print(i, Packet_Time)
                X_400_test_evasion_attack[i, 5] = Packet_Time
                break

40 26658
117 26658
186 26658
217 26658
231 26658
280 26658
321 26658
353 26658
355 2981
475 26658
500 26658
525 53758
552 26658
571 2252
581 26658
637 26658
652 53758
689 26658
720 2252
736 26658
934 26658
935 2981
941 26658
1032 26658
1184 26658
1226 26658
1229 26658
1243 2252
1287 26658
1291 26658
1605 26658
1644 26658
1655 26658
1704 26658
1778 2981
1799 26658
1856 26658
1868 26658
1889 26658
1916 26658
1953 26658
1975 26658
1984 26658
1991 26658
2014 26658
2017 3554
2039 26658
2091 26658
2187 26658
2197 26658
2265 26658
2278 26658
2320 26658
2327 26658
2331 26658
2353 26658
2365 26658
2371 26658
2410 26658
2425 26658

The total number of effective adversarial examples found after performing the additional action of the claimed method is 60.

print("Example row from the original test set:")
print(X_400_test[[40]])
pred_400 = RFmodel_400.predict(X_400_test[[40]])
print("Model prediction for the original test row:", pred_400[0])
print("Adversarial example:")
print(X_400_test_evasion_attack[[40]])
y_400_pred_evasion_attack = RFmodel_400.predict(X_400_test_evasion_attack[[40]])
print("Model prediction for the adversarial example:", y_400_pred_evasion_attack[0])

Example row from the original test set:
[[' 8314919850' 42692 ' 1721803' 443 280166.66666667 26542.105274403366]]
Adversarial example:
[[' 8314919850' 42692 ' 1721803' 443 280166.66666667 26658]]

y_400_pred_evasion_attack = RFmodel_400.predict(X_400_test_evasion_attack)

Compute the confusion matrix of RFmodel_400 on the test set with the adversarial examples:

matrix = confusion_matrix(y_400_test, y_400_pred_evasion_attack)
print(matrix)

[[1167   64]
 [  88 1114]]

Print the quality metrics of RFmodel_400 on the test set with the adversarial examples:

print_metrics(y_400_test, y_400_pred_evasion_attack)

Accuracy = 0.9375256884504727
Precision = 0.9456706281833617
Recall = 0.9267886855241264
F1 = 0.9361344537815126

Conclusions.

Generating adversarial examples with the prototype method found 49 effective adversarial examples. After the additional action of the claimed method was performed, namely additionally passing the traffic through a network device with an MTU different from the MTU of the industrial control system, another 60 effective adversarial examples were found. In total, the claimed method found 109 effective adversarial examples.

The additional action of the claimed method, passing traffic through a network device with an MTU different from the MTU of the industrial control system, increases the variability of the feature values of the initial examples, expands the search space, and increases the likelihood of finding adversarial examples, including effective ones.
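The two runs use different datasets (17946 and 9730 rows) and separately trained models, so the raw counts 49 and 60 are not a controlled comparison. One possible normalization, offered here as an illustration rather than as part of the source's evaluation, divides each count by the size of its candidate pool, i.e. the attack sessions the model detected correctly before the search (the TP cells of the confusion matrices reported above, 2517 and 1174).

```python
# Counts and true positives taken from the results reported above.
found = {"MTU 1520": 49, "MTU 400": 60}
detected_before = {"MTU 1520": 2517, "MTU 400": 1174}

# Share of correctly detected attack sessions that the search turned into evasions
evasion_share = {k: found[k] / detected_before[k] for k in found}

for k, share in evasion_share.items():
    print(f"{k}: {found[k]} of {detected_before[k]} detected attacks ({share:.2%})")

print("total found:", sum(found.values()))
```

On this normalization the share is about 1.95% for the 1520-byte data and about 5.11% for the 400-byte data, which points the same way as the raw counts.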

Claims

A method for generating adversarial examples for an intrusion detection system of an industrial control system, comprising the following sequence of actions:

using a sample generator, listen to the traffic of the industrial control system, transmitted over a protocol without fragmentation, through a network device with a maximum transmission unit (MTU, the maximum size of the payload of a single packet), to obtain data that has the same distribution as the training data used by the intrusion detection system of the industrial control system; label the received data and take the labeled anomalous data as initial attack examples;

perform traffic analysis of an industrial control system, extract features including source IP address, source port number, destination IP address, destination port number, time interval between packets, packet transmission time and packet destination code;

create and train a machine learning classifier based on features obtained from traffic analysis;

transform the problem of training an intrusion detection system of an industrial control system into an optimization problem using the created classifier and solve the optimization problem to obtain adversarial examples, wherein the optimization problem is as follows:

$$x^* = \arg\min_{x} g(x) \quad \text{subject to} \quad d(x^*, x_0) < d_{\max},$$

where $x_0$ is the initial attack example;

$x^*$ is the generated adversarial example;

$g(x)$ is the probability that example $x$ is identified as anomalous (as an attack example);

$d(x^*, x_0)$ is the distance between the adversarial example and the initial attack example;

$d_{\max}$ is the maximum Euclidean distance allowed by the industrial control system, beyond which the adversarial example no longer has a harmful effect;

test the created adversarial example in a real industrial control system and, based on the result, accept or reject it as an effective adversarial example;

characterized in that the received data having the same distribution as the training data used by the intrusion detection system of the industrial control system is further passed through a network device with an MTU different from the MTU of the industrial control system.