RU2821220C1

RU2821220C1 - Method and system for eliminating vulnerabilities in program code

Info

Publication number: RU2821220C1
Application number: RU2023112128A
Authority: RU
Inventors: Кирилл Евгеньевич Вышегородцев; Александр Михайлович Кузьмин
Original assignee: Публичное Акционерное Общество "Сбербанк России" (Пао Сбербанк)
Filing date: 2023-05-11
Publication date: 2024-06-18

Abstract

FIELD: physics.

SUBSTANCE: the invention relates to computer engineering. A computer-implemented method of eliminating vulnerabilities in program code is carried out using at least one processor and comprises the steps of: obtaining data on vulnerabilities in the program code; obtaining data containing at least a source code block and the type of the detected vulnerability; transforming the source code containing the vulnerability into an abstract syntax tree (AST), in which internal vertices are associated with programming language operators and leaves with the corresponding operands; forming a traversal path over the vertices of the AST; generating an ordered sequence and processing it using a machine learning encoding model; processing the resulting matrix of hidden states using a generative machine learning model trained on matrices of hidden states of sequences for vulnerabilities, which yields a new ordered sequence corresponding to the source program code but with the vulnerability eliminated; and converting the ordered sequence obtained at the previous step into a source code block.

EFFECT: higher security of software due to elimination of vulnerabilities in program code.

8 cl, 2 dwg

Description

TECHNICAL FIELD

[0001] The claimed technical solution relates generally to the field of computer technology, and in particular to an automated method and system for eliminating vulnerabilities in program code using machine learning algorithms.

BACKGROUND OF THE ART

[0002] With the development of information technology (IT), IT solutions have come to have a significant impact on all spheres and sectors of life. Various companies and organizations are now actively implementing and using IT solutions in their operations.

[0003] Developing software for large financial organizations (for example, banks) is always labor-intensive and painstaking work. In addition, when developing a software product, all risks of vulnerabilities arising in the program code must be taken into account. These checks are performed by cybersecurity experts, who manually check the program code of the product under development for vulnerabilities, which greatly increases the verification time and does not exclude the human factor.

[0004] Patent US 8631384 B2 "Creating a test progression plan" (patent holder: IBM, published 01.12.2011) is known from the prior art. It describes an automated process for drawing up test plans for software products. The known solution automatically creates a software test execution plan by calculating, for each unit x of the testing period, the effort to execute test units ATTx and the effort to complete test units CCx. The calculation introduces three variables that characterize the testing strategy: efficiency, which represents the effectiveness of the test team, the defect density ratio, and the verification ratio. By choosing a testing strategy, the test manager sets the values of these three variables, which influence the progression plan. During test execution, the cumulative "attempt" curve of ATTx values and the cumulative "completion" curve of CCx values allow the test manager to compare the effort already made with the expected effort, both for test units that have been attempted and for test units that have been completed, that is, for which the defects found in the code have been fixed.

[0005] A disadvantage of the known solutions in this field is that they cannot eliminate vulnerabilities in program code automatically.

DISCLOSURE OF INVENTION

[0006] The claimed technical solution proposes a new approach to eliminating vulnerabilities in program code. It uses a machine learning algorithm that automates the checking of program code and significantly speeds up the elimination of vulnerabilities in it.

[0007] Thus, the technical problem of automated elimination of vulnerabilities in program code is solved.

[0008] The technical result achieved by solving this problem is increased software security due to the elimination of vulnerabilities in the program code.

[0009] This technical result is achieved by a computer-implemented method for eliminating vulnerabilities in program code, executed using at least one processor and comprising the steps of:

- obtaining data on vulnerabilities in the program code;

- obtaining, based on the previous step, data containing at least a block of source code and the type of the detected vulnerability;

- converting the source code containing the vulnerability into an abstract syntax tree (AST), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands;

- forming a traversal path over the vertices of the AST;

- forming an ordered sequence consisting of the types of each element of the path;

- processing the ordered sequence using a machine learning encoding model trained on ordered sequences of vulnerability data, thereby obtaining a representation of the ordered sequence in the form of a matrix of hidden states;

- processing the matrix of hidden states using a generative machine learning model trained on matrices of hidden states of sequences with vulnerabilities, sequences without vulnerabilities, and their source codes, thereby obtaining a new ordered sequence that corresponds to the original program code in the functions it performs but with the vulnerability eliminated;

- converting the ordered sequence obtained at the previous step into a source code block that is functionally equivalent to the original source code block.

[0010] In one particular embodiment of the method, the ordered sequence consisting of the types of each element of the path is multidimensional and is a graph.

[0011] In another particular embodiment of the method, the source code containing the vulnerability is converted into a control flow graph (CFG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0012] In another particular embodiment of the method, the source code containing the vulnerability is converted into a control dependence graph (CDG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0013] In another particular embodiment of the method, the source code containing the vulnerability is converted into a data dependence graph (DDG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0014] In another particular embodiment of the method, the source code containing the vulnerability is converted into a program dependence graph (PDG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0015] In another particular embodiment of the method, the source code containing the vulnerability is converted into a code property graph (CPG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0016] In addition, the claimed technical result is achieved by a system for eliminating vulnerabilities in program code, comprising:

- at least one processor;

- at least one memory connected to the processor and containing machine-readable instructions that, when executed by the at least one processor, cause the claimed method to be performed.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] FIG. 1 illustrates a block diagram of the claimed method.

[0018] FIG. 2 illustrates an example of the general form of a computing system that implements the claimed solution.

IMPLEMENTATION OF THE INVENTION

[0001] The concepts and terms necessary to understand this technical solution are described below.

[0002] A model in machine learning (ML) is a set of artificial intelligence methods whose characteristic feature is not the direct solution of a problem but learning in the process of applying solutions to many similar problems.

[0003] A software vulnerability is a flaw in a system that can be used to deliberately violate its integrity and cause it to malfunction. A vulnerability can result from programming errors, flaws in system design, weak passwords, viruses and other malware, or script and SQL injections. Vulnerabilities can be exploitable or non-exploitable.

[0004] An exploit is a computer program, a fragment of program code, or a sequence of commands that takes advantage of vulnerabilities in software and is used to attack a computing system.

[0005] An exploitable vulnerability is a software vulnerability for which an exploit can be created and applied.

[0006] AST: abstract syntax tree. A finite labeled directed tree in which internal vertices are associated with programming language operators and leaves with the corresponding operands.

[0007] The F1 measure is a joint assessment of precision and recall.

[0008] A ROC curve is a graphical characteristic of the quality of a binary classifier that shows how the proportion of true-positive classifications depends on the proportion of false-positive classifications as the threshold of the decision rule is varied.

[0009] An error matrix (confusion matrix) is a way of dividing the classified objects into four categories depending on the combination of the actual class and the classifier's response.
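The two definitions above can be tied together in a short sketch: the four categories of the error matrix are counted first, and the F1 measure is then computed from them as the harmonic mean of precision and recall. The function names and the toy labels are illustrative assumptions and do not come from the claimed solution.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count the four outcome categories of a binary classifier (1 = vulnerable)."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "TP": pairs[(1, 1)],  # vulnerable, predicted vulnerable
        "FP": pairs[(0, 1)],  # clean, predicted vulnerable
        "FN": pairs[(1, 0)],  # vulnerable, predicted clean
        "TN": pairs[(0, 0)],  # clean, predicted clean
    }

def f1_score(counts):
    """Harmonic mean of precision and recall, with 0.0 for degenerate cases."""
    tp, fp, fn = counts["TP"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
score = f1_score(counts)
```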

[0010] Connectors are software components that collect data from information sources (a task management system, a release collaboration system, a version control system, a project management system, an enterprise service management system, etc.) and bring the data to the required structure and format.

[0011] Storage is a system for storing large volumes of data collected and processed by connectors, as well as data generated by other system components.

[0012] This technical solution can be implemented on a computer, as an automated information system (AIS), or as a machine-readable medium containing instructions for performing the above method.

[0013] The technical solution can be implemented as a distributed computer system.

[0014] In this solution, a system means a computer system, a computer, a CNC (computer numerical control) system, a PLC (programmable logic controller), a computerized control system, or any other device capable of performing a given, well-defined sequence of computing operations (actions, instructions).

[0015] A command processing device means an electronic unit or an integrated circuit (microprocessor) that executes machine instructions (programs).

[0016] The command processing device reads and executes machine instructions (programs) from one or more data storage devices, such as random access memory (RAM) and/or read-only memory (ROM). ROM can be, but is not limited to, hard disk drives (HDD), flash memory, solid-state drives (SSD), optical storage media (CD, DVD, BD, MD, etc.), and so on.

[0017] A program is a sequence of instructions intended for execution by the control device of a computer or by a command processing device.

[0018] Preparing training data.

[0019] The model was trained on historical data on software vulnerabilities, labeled into two classes:

code with a vulnerability (class 1),

code without a vulnerability (class 0).
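To make the two classes concrete, the following sketch shows what a labeled pair might look like for one vulnerability type, SQL injection, which is mentioned among the definitions. The functions, the table, and the record layout are hypothetical and are not taken from the training data of the claimed solution; Python and `sqlite3` are used purely for illustration.

```python
import sqlite3

def find_user_vulnerable(conn, name):
    # Class 1: user input is concatenated into the SQL text (SQL injection).
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_fixed(conn, name):
    # Class 0: the same lookup, with the input passed as a bound parameter.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
leaked = find_user_vulnerable(conn, payload)  # the injected condition matches every row
safe = find_user_fixed(conn, payload)         # the payload is treated as a literal name

# Hypothetical layout of two labeled training records.
labeled_samples = [
    {"label": 1, "function": "find_user_vulnerable"},
    {"label": 0, "function": "find_user_fixed"},
]
```

Both functions perform the same lookup for well-formed input, which mirrors the requirement that the corrected code remain functionally equivalent to the original.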

[0020] The ML model for each type of vulnerability is trained on pre-labeled data. At the time the model was created, between 2178 and 55949 vulnerabilities were available, depending on the vulnerability type, detected within a given time range, for example 1-3 months.

[0021] Only unique vulnerabilities were used in the training sample; their number varies from 153 to 2147 depending on the vulnerability type. To assess the quality of the model, the data set was divided into two parts, a training sample and a control sample. The split was random, with 70% going to the training sample and 30% to the control sample.
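The data preparation described above (keeping only unique samples, then a random 70/30 split) can be sketched as follows. The helper names, the fixed seed, and the synthetic samples are assumptions made for the example; the source does not state how duplicates were detected or how the random split was implemented.

```python
import random

def deduplicate(samples):
    """Keep the first occurrence of each sample, preserving order."""
    return list(dict.fromkeys(samples))

def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it into training and control parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Synthetic (snippet, label) pairs: 153 unique samples plus 10 repeats.
raw = [(f"snippet_{i}", i % 2) for i in range(153)]
raw += raw[:10]

unique = deduplicate(raw)
train, control = split_dataset(unique)
```

With 153 unique samples, the smallest per-type count mentioned above, this gives 107 training samples and 46 control samples.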

[0022] The weighted F1 measure across all classifiers averages about 0.89.

[0023] As shown in FIG. 1, the computer-implemented method for eliminating vulnerabilities in program code (100) consists of several steps performed by at least one processor.

[0024] At step (101), data on vulnerabilities in the program code is obtained.

[0025] In one particular embodiment of the invention, the vulnerability data obtained at this step is the result of scanning the program code with a SAST (Static Application Security Testing) tool.

[0026] Next, at step (102), data containing at least a source code file and the type of the detected vulnerability is obtained on the basis of the previous step.
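A minimal sketch of the data produced at step (102) is given below. The `Finding` record and its fields are hypothetical: real SAST tools use their own report formats, and the source does not specify one. The sketch only shows how a reported location could be turned into a code block paired with a vulnerability type.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    # Hypothetical fields; real SAST report formats differ between tools.
    file_path: str
    vulnerability_type: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive

def extract_code_block(source_text, finding):
    """Return the source lines covered by the finding as a single string."""
    lines = source_text.splitlines()
    return "\n".join(lines[finding.start_line - 1:finding.end_line])

source_text = "import os\n\ndef run(cmd):\n    os.system(cmd)\n\nprint('done')\n"
finding = Finding("app.py", "command_injection", 3, 4)

# The pair that later steps would work with: code plus vulnerability type.
model_input = {
    "code": extract_code_block(source_text, finding),
    "type": finding.vulnerability_type,
}
```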

[0027] Next, at step (103), the source code containing the vulnerability is converted into an abstract syntax tree (AST), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0028] At this step, the source code contained in the file obtained at step (102) is parsed. The code may be written in any programming language (Java, C#, ASP, Visual Basic, C, C++, PHP, Apex, Ruby, JavaScript, VBScript, Perl, Swift, Python, Groovy, Scala, etc.). The result is a parse tree or abstract syntax tree that represents the dependencies between all elements of the source code and contains information about their positions in the source code (start line number, start column number, end line number, end column number), their classes, such as expressions, statements, declarations, etc., the types of their classes, their names, their parent and child elements, comments attached to them, and so on.
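For Python source, the kind of per-element information listed above can be read from the standard `ast` module, as in the sketch below. This is an illustration under the assumption that the analyzed language is Python; the source does not say which parser the claimed solution uses. Note that Python's `ast` reports 1-based line numbers and 0-based column offsets, and that some nodes (for example `Module`) carry no position.

```python
import ast

source = "def greet(name):\n    message = 'Hi ' + name\n    return message\n"
tree = ast.parse(source)

def describe(node):
    """Return the type, class and source position of one AST node."""
    if isinstance(node, ast.stmt):
        kind = "statement"
    elif isinstance(node, ast.expr):
        kind = "expression"
    else:
        kind = "other"
    return {
        "type": type(node).__name__,
        "class": kind,
        # Position attributes are missing on some nodes, hence the defaults.
        "start": (getattr(node, "lineno", None), getattr(node, "col_offset", None)),
        "end": (getattr(node, "end_lineno", None), getattr(node, "end_col_offset", None)),
    }

elements = [describe(node) for node in ast.walk(tree)]
assign = next(e for e in elements if e["type"] == "Assign")
```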

[0029] In one particular embodiment of the claimed technical solution, the source code containing the vulnerability is converted into a control flow graph (CFG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0030] In another particular embodiment of the claimed technical solution, the source code containing the vulnerability is converted into a control dependence graph (CDG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0031] In another particular embodiment of the claimed technical solution, the source code containing the vulnerability is converted into a data dependence graph (DDG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0032] In another particular embodiment of the claimed technical solution, the source code containing the vulnerability is converted into a program dependence graph (PDG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.

[0033] In another particular embodiment of the claimed technical solution, the source code containing the vulnerability is converted into a code property graph (CPG), in which internal vertices are mapped to programming language operators and leaves to the corresponding operands.
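As an illustration of one of the graph representations listed above, the sketch below derives data dependence edges for a sequence of top-level Python assignments. It is a deliberately simplified assumption: it ignores branches, loops, function calls with side effects, and augmented assignments, and it is not the graph construction used by the claimed solution, which the source does not describe.

```python
import ast

def data_dependence_edges(source):
    """Return (definition_line, use_line, variable) edges for straight-line code."""
    last_def = {}  # variable name -> line of its most recent assignment
    edges = []
    for stmt in ast.parse(source).body:
        # Each variable read here depends on the latest statement that wrote it.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id in last_def:
                    edges.append((last_def[node.id], stmt.lineno, node.id))
        # Record the variables written by this statement after handling its reads.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = stmt.lineno
    return sorted(set(edges))

code = "a = 1\nb = a + 2\na = b * 3\nc = a + b\n"
edges = data_dependence_edges(code)
```

Here the read of `a` on line 4 depends on the reassignment on line 3, not on line 1, which is the distinction a data dependence graph makes explicit.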

[0034] Next, at step (104), a traversal path over the vertices of the AST is formed.

[0035] Next, at step (105), an ordered sequence consisting of the types of each element is formed.

[0036] At this step, an ordered sequence of the class types of the elements (vertex types) is formed from the AST vertex traversal path formed at step (104), in which each element contains the information obtained at step (103).
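Steps (104) and (105) can be sketched for Python source as a depth-first traversal followed by a projection of each visited vertex onto its type name. The pre-order traversal is an assumption: the source does not state which traversal order is used.

```python
import ast

def preorder(node):
    """Yield AST nodes depth-first, each parent before its children."""
    yield node
    for child in ast.iter_child_nodes(node):
        yield from preorder(child)

def type_sequence(source):
    """Return the ordered sequence of vertex types along the traversal path."""
    return [type(node).__name__ for node in preorder(ast.parse(source))]

sequence = type_sequence("x = a + 1")
```

The resulting list of type names is the kind of ordered sequence that a later encoding step could consume.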

[0037] В одном из частных вариантов изобретения упорядоченная последовательность, представляющая собой типы каждого элемента пути, является многомерной и представляет собой граф (представление дерева абстрактного синтаксиса (AST) в виде графа).[0037] In one particular embodiment of the invention, the ordered sequence representing the types of each path element is multidimensional and is a graph (an abstract syntax tree (AST) representation as a graph).

[0038] Далее на этапе (106) осуществляют обработку упорядоченной последовательности с помощью кодирующей модели машинного обучения (МО), обученной на упорядоченных последовательностях данных об уязвимостях, в ходе которой получают представление упорядоченной последовательности в виде матрицы скрытых состояний.[0038] Next, at step (106), the ordered sequence is processed using a machine learning (ML) encoding model trained on the ordered sequences of vulnerability data, during which a representation of the ordered sequence in the form of a matrix of hidden states is obtained.

[0039] At this step, a dictionary of all possible class types (vertex types) of the AST obtained at step (103) is formed. For each sequence formed at step (105), the values of this dictionary are the counts of the corresponding types present in that sequence and/or of N-grams of such types. From the values of this dictionary, a numerical vector of the same dimension for all sequences is formed by at least one of the following methods:

• Bag-of-words,

• One-hot encoding (OHE),

• Encoding the dictionary with unique indices,

• Word2Vec,

• Vector representations of types ("embeddings") and/or combinations thereof.
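As an illustrative sketch only, the first three methods in the list above (bag-of-words, one-hot encoding, and encoding with unique indices) could look as follows in plain Python. The function names and the choice of a sorted vocabulary are assumptions made for the example and are not taken from the description.

```python
from typing import Dict, List


def build_vocabulary(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign a unique index to every vertex type seen in the sequences."""
    types = sorted({t for seq in sequences for t in seq})
    return {t: i for i, t in enumerate(types)}


def index_encode(seq: List[str], vocab: Dict[str, int]) -> List[int]:
    """Encode each type by its unique dictionary index."""
    return [vocab[t] for t in seq]


def one_hot_encode(seq: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Encode each type as a vector with a single 1 at its index."""
    rows = []
    for t in seq:
        row = [0] * len(vocab)
        row[vocab[t]] = 1
        rows.append(row)
    return rows


def bag_of_words(seq: List[str], vocab: Dict[str, int]) -> List[int]:
    """Count occurrences of each type; the vector length equals the vocabulary size."""
    counts = [0] * len(vocab)
    for t in seq:
        counts[vocab[t]] += 1
    return counts
```

The bag-of-words vector has the same dimension for every sequence regardless of its length, which is the property required in paragraph [0039]; the index and one-hot encodings preserve order and therefore grow with the sequence.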

[0040] Embeddings of types can be obtained using neural networks: fully connected, recurrent, convolutional, or transformer networks. In one particular embodiment of the invention, vector representations of types are obtained using pre-trained neural networks such as DistilBERT, ALBERT (A Lite BERT, Google), TinyBERT, T-NLG (Turing Natural Language Generation), USE (Universal Sentence Encoder), ELMo (Embeddings from Language Models), or networks derived from them.

[0041] Vectorization

Features were extracted from the generated dictionaries using TF-IDF (term frequency times inverse document frequency) vectorization. The vectorizer was fitted on the training sample without stop words (stop_words=None) and without the IDF component: given the size of the training sample and the presence of very common features in every element of it, the IDF component would not help to identify the distinctive features of each dictionary. Unigrams and bigrams (1- and 2-grams) were used; this parameter was selected empirically.
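The vectorization described above (term frequencies over 1- and 2-grams, no IDF weighting, no stop words) can be sketched without any library as shown below. The parameter name stop_words resembles scikit-learn's `TfidfVectorizer`, but the description does not name the library, so this is a library-free approximation; the L2 normalization and all function names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n_min: int = 1, n_max: int = 2) -> List[str]:
    """Return all n-grams of the token list for n in [n_min, n_max]."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_vocabulary(corpus: List[List[str]]) -> Dict[str, int]:
    """Collect every 1- and 2-gram in the training sample (no stop words removed)."""
    terms = sorted({g for tokens in corpus for g in ngrams(tokens)})
    return {term: i for i, term in enumerate(terms)}


def tf_vector(tokens: List[str], vocab: Dict[str, int]) -> List[float]:
    """Term-frequency vector without IDF, L2-normalized (normalization is an assumption)."""
    counts = Counter(g for g in ngrams(tokens) if g in vocab)
    vec = [0.0] * len(vocab)
    for term, count in counts.items():
        vec[vocab[term]] = float(count)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Because the IDF factor is omitted, a type that occurs in every training element keeps its full weight, which matches the rationale given in the paragraph above.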

[0042] Training a machine learning model

The matrix of hidden sequences is obtained using the encoder block of the generative machine learning model. This block may be an Embedding layer, either trained or randomly initialized. It may be followed by neural networks: fully connected, recurrent, convolutional, or transformer networks. In one particular embodiment of the invention, the encoder block comprises at least one neural network layer of one of the following types: recurrent neural networks (RNN), long short-term memory (LSTM), gated recurrent units (GRU), neural Turing machines (NTM), bidirectional RNN, LSTM and GRU (BiRNN, BiLSTM and BiGRU), deep residual networks (DRN), echo state networks (ESN), liquid state machines (LSM), or Kohonen networks (KN), also known as self-organizing (feature) maps (SOM, SOFM).
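A minimal sketch of such an encoder block, assuming a randomly initialized embedding layer followed by a single plain recurrent (Elman-style) layer, is given below. The weights are untrained and the class name `TinyRnnEncoder` and all dimensions are hypothetical; the sketch only shows how a sequence of type indices becomes a matrix with one hidden-state row per sequence element.

```python
import math
import random
from typing import List


class TinyRnnEncoder:
    """Embedding layer plus one recurrent layer; weights are random, not trained."""

    def __init__(self, vocab_size: int, embed_dim: int = 4,
                 hidden_dim: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)

        def matrix(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.embedding = matrix(vocab_size, embed_dim)   # one row per type index
        self.w_in = matrix(hidden_dim, embed_dim)        # input-to-hidden weights
        self.w_rec = matrix(hidden_dim, hidden_dim)      # hidden-to-hidden weights
        self.hidden_dim = hidden_dim

    def encode(self, token_ids: List[int]) -> List[List[float]]:
        """Return the matrix of hidden states, shape (len(token_ids), hidden_dim)."""
        h = [0.0] * self.hidden_dim
        states = []
        for token_id in token_ids:
            x = self.embedding[token_id]
            # h_t = tanh(W_in x_t + W_rec h_{t-1})
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[j], h))
                )
                for j in range(self.hidden_dim)
            ]
            states.append(h)
        return states
```

In practice the listed layer types (LSTM, GRU, bidirectional variants, and so on) would replace the plain recurrence, and the weights would be learned during the training described in the following paragraphs.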

[0043] Next, at step (107), the matrix of hidden sequences is processed by a generative machine learning (ML) model trained on matrices of hidden states of sequences with vulnerabilities, sequences without vulnerabilities, and their source code. This yields a new ordered sequence that corresponds to the original program code in the functions it performs, but with the vulnerability eliminated.

[0044] This processing is performed by the generator (decoder) block of the generative machine learning model. This block may be a decoder of the variational autoencoder (VAE) type, deep learning methods, a restricted Boltzmann machine (RBM), a deep belief network (DBN), or neural networks: fully connected, recurrent, convolutional, or transformer networks.

[0045] In one particular embodiment of the invention, the decoder block comprises at least one neural network layer of one of the following types: convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory (LSTM), gated recurrent units (GRU), neural Turing machines (NTM), bidirectional RNN, LSTM and GRU (BiRNN, BiLSTM and BiGRU), deep residual networks (DRN), echo state networks (ESN), liquid state machines (LSM), or Kohonen networks (KN), also known as self-organizing (feature) maps (SOM, SOFM).

[0046] In one particular embodiment of the invention, a generative adversarial network (GAN) is used as the machine learning model. This generative adversarial network may contain an encoder block with the layers listed above and a decoder block with the layers listed above. In one particular embodiment of the invention, the discriminator block (discriminative model) comprises neural networks (fully connected, recurrent, convolutional, or transformer networks); classifiers: DecisionTree (decision tree), LogRegression (logistic regression), Bayes (naive Bayes classifier), SVM (support vector machine), K-means (k-neighbors), RandomForest (random forest); or gradient boosting: XGBoost, LightGBM.

[0047] Training proceeds as follows. Operations are performed on two groups of program code blocks: the first group is code with a vulnerability, and the second group is code without the vulnerability. These blocks are converted into ordered sequences of AST elements in accordance with steps (101)-(106).

[0048] This yields a sequence A for the code with the vulnerability and a corresponding sequence B for the code without the vulnerability. Sequence A (code with the vulnerability) is fed into the encoder block of the generative machine learning model, producing a matrix of hidden states (107). This matrix of hidden states is fed into the generator (decoder) block of the generative machine learning model (108), producing a new (reconstructed) ordered sequence C. Sequence C is compared with the reference sequence B (code without the vulnerability). The resulting discrepancies are converted into a quantitative measure by the selected loss function, and this measure is used to train the neural network with the selected training algorithm.

[0049] In one particular embodiment of the invention, at least one of the following loss functions is used: KLD (Kullback-Leibler divergence loss between the true and predicted values), MAE (mean absolute error between labels and predictions), MAPE (mean absolute percentage error between the true and predicted values), MSE (mean squared error between labels and predictions), MSLE (mean squared logarithmic error between the true and predicted values), binary_crossentropy (binary cross-entropy loss), binary_focal_crossentropy (binary focal cross-entropy loss), categorical_crossentropy (categorical cross-entropy loss), categorical_hinge (categorical hinge loss between the true and predicted values), cosine_similarity (cosine similarity between labels and predictions), hinge (hinge loss between the true and predicted values), huber (Huber loss), kl_divergence and kullback_leibler_divergence (Kullback-Leibler divergence loss between the true and predicted values), logcosh (logarithm of the hyperbolic cosine of the prediction error), mean_absolute_error (mean absolute error between labels and predictions), mean_absolute_percentage_error (mean absolute percentage error between the true and predicted values), mean_squared_error (mean squared error between labels and predictions), mean_squared_logarithmic_error (mean squared logarithmic error between the true and predicted values), poisson (Poisson loss between the true and predicted values), sparse_categorical_crossentropy (sparse categorical cross-entropy loss), squared_hinge (squared hinge loss between the true and predicted values). These loss functions are also often denoted as follows: binary_cross_entropy, binary_cross_entropy_with_logits, poisson_nll_loss, cosine_embedding_loss, cross_entropy, ctc_loss, gaussian_nll_loss, hinge_embedding_loss, kl_div, l1_loss, mse_loss, margin_ranking_loss, multilabel_margin_loss, multilabel_soft_margin_loss, multi_margin_loss, nll_loss, huber_loss, smooth_l1_loss, soft_margin_loss, triplet_margin_loss, triplet_margin_with_distance_loss.
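For orientation, three of the listed loss functions can be written out in plain Python as below. This is a sketch under the assumption that the decoder outputs, for each position, a probability distribution over vertex types and that the reference sequence is given as type indices; the function signatures are illustrative and do not reproduce any particular library.

```python
import math
from typing import List


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """MSE: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """MAE: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def categorical_crossentropy(target_ids: List[int],
                             predicted: List[List[float]],
                             eps: float = 1e-12) -> float:
    """Average negative log-probability assigned to the reference type at each position.

    target_ids: indices of the reference (non-vulnerable) sequence.
    predicted:  one probability distribution per position of the generated sequence.
    """
    total = 0.0
    for target, probs in zip(target_ids, predicted):
        total += -math.log(max(probs[target], eps))  # eps guards against log(0)
    return total / len(target_ids)
```

A lower cross-entropy means the generated distribution places more probability on the vertex types of the reference sequence, which is the discrepancy measure that drives training.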

[0050] In one particular embodiment of the invention, at least one of the following training algorithms is used: Adadelta, Adagrad, Adam, AdamW, SparseAdam, Adamax, ASGD, LBFGS, NAdam, RAdam, RMSprop, Rprop, FTRL, SGD, FastSGD, SGD-Nesterov, SAGA, SAGA+.
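To make the role of the training algorithm concrete, the update rules of two of the listed optimizers, SGD and Adam, are sketched below on a toy one-parameter problem. The objective (x - 3)^2, the learning rates, and the helper `minimize` are assumptions chosen only for illustration; a real model would apply the same rules to every weight using gradients of the selected loss function.

```python
import math
from typing import Callable


def sgd_step(param: float, grad: float, lr: float = 0.1) -> float:
    """Plain stochastic gradient descent update."""
    return param - lr * grad


class Adam:
    """Adam update for a single scalar parameter (standard default coefficients)."""

    def __init__(self, lr: float = 0.1, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients
        self.v = 0.0  # running mean of squared gradients
        self.t = 0    # step counter for bias correction

    def step(self, param: float, grad: float) -> float:
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def minimize(step: Callable[[float, float], float],
             start: float = 5.0, iterations: int = 500) -> float:
    """Minimize the toy objective f(x) = (x - 3)^2 with the given update rule."""
    x = start
    for _ in range(iterations):
        grad = 2.0 * (x - 3.0)  # derivative of (x - 3)^2
        x = step(x, grad)
    return x
```

Calling `minimize(sgd_step)` or `minimize(Adam().step)` moves the parameter from the starting point toward the minimizer of the toy objective.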

[0051] Next, at step (108), the ordered sequence obtained at the previous step is converted into a source code block that is functionally equivalent to the original source code block.
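The conversion from a tree representation back to source text can be illustrated with Python's standard `ast.unparse` (available from Python 3.9). In the sketch below, a hand-written rewrite rule stands in for the generative model: it is an assumption made purely for the example and is not the claimed method, and replacing `eval` with `ast.literal_eval` narrows behaviour rather than preserving it exactly. The names `ReplaceEval` and `repair` are hypothetical.

```python
import ast


class ReplaceEval(ast.NodeTransformer):
    """Hypothetical stand-in for the model: rewrite eval(...) as ast.literal_eval(...)."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "eval":
            node.func = ast.Attribute(
                value=ast.Name(id="ast", ctx=ast.Load()),
                attr="literal_eval",
                ctx=ast.Load(),
            )
        return node


def repair(source: str) -> str:
    """Parse source into an AST, apply the rewrite, and convert the tree back to code."""
    tree = ast.parse(source)
    tree = ReplaceEval().visit(tree)
    ast.fix_missing_locations(tree)  # fill in line/column info for new nodes
    return ast.unparse(tree)         # tree -> source code block


if __name__ == "__main__":
    print(repair("value = eval(user_input)"))
```

In the described method the modified tree would instead be reconstructed from the ordered sequence produced by the decoder; only the final tree-to-source step is shown faithfully here.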

[0052] As a result, this approach makes it possible to formulate and prioritize tasks for eliminating defects in the software under development. This increases the speed of program code development, reduces the number of errors in the code, and directly affects its security. Eliminating vulnerabilities in turn speeds up software updates and time to market, and increases the resilience of the software against malicious actions aimed at:

• Theft of sensitive information,

• Causing reputational or financial damage to an organization or user,

• Destroying important data or preventing access to it,

• Distortion of information,

• Theft of funds.

[0053] FIG. 2 shows a general example of a computing system (300) that implements the claimed method or forms part of a computer system, for example a server, a personal computer, or part of a computing cluster, that processes the data necessary to implement the claimed technical solution.

[0054] In general, the system (300) comprises, connected by a common data bus, one or more processors (301), memory such as RAM (302) and ROM (303), input/output interfaces (304), input/output devices (305), and a network communication device (306).

[0055] The processor (301) (or several processors, a multi-core processor, etc.) may be selected from a range of devices in wide use today, for example from manufacturers such as Intel™, AMD™, Apple™, Samsung Exynos™, MediaTek™, Qualcomm Snapdragon™, etc. The processor, or one of the processors used in the system (300), should also be understood to include a graphics processor, for example from NVIDIA or Graphcore, which is likewise suitable for carrying out the method in whole or in part and can also be used to train and apply machine learning models in various information systems.

[0056] The RAM (302) is random access memory intended to store machine-readable instructions executed by the processor (301) to perform the necessary logical data processing operations. The RAM (302) typically contains executable instructions of the operating system and of the associated software components (applications, program modules, etc.). The available memory of a graphics card or graphics processor may also serve as the RAM (302).

[0057] The ROM (303) is one or more persistent data storage devices, for example a hard disk drive (HDD), a solid-state drive (SSD), flash memory (EEPROM, NAND, etc.), optical storage media (CD-R/RW, DVD-R/RW, Blu-ray Disc, MD), etc.

[0058] Various types of I/O interfaces (304) are used to support the operation of the components of the system (300) and of externally connected devices. The choice of interfaces depends on the specific design of the computing device and may include, without limitation: PCI, AGP, PS/2, IrDA, FireWire, LPT, COM, SATA, IDE, Lightning, USB (2.0, 3.0, 3.1, micro, mini, type C), TRS/audio jack (2.5, 3.5, 6.35), HDMI, DVI, VGA, DisplayPort, RJ45, RS232, etc.

[0059] Various I/O devices (305) are used for user interaction with the computing system (300), for example a keyboard, a display (monitor), a touch display, a touchpad, a joystick, a mouse, a light pen, a stylus, a touch panel, a trackball, speakers, a microphone, augmented reality devices, optical sensors, a tablet, indicator lights, a projector, a camera, biometric identification devices (retina scanner, fingerprint scanner, voice recognition module), etc.

[0060] The network communication device (306) provides data transmission over an internal or external computer network, for example an intranet, the Internet, a LAN, etc. The one or more devices (306) may include, without limitation: an Ethernet card, a GSM modem, a GPRS modem, an LTE modem, a 5G modem, a satellite communication module, an NFC module, a Bluetooth and/or BLE module, a Wi-Fi module, etc.

[0061] The application materials presented disclose preferred examples of implementing the technical solution and should not be construed as limiting other, particular examples of its implementation that do not go beyond the scope of the legal protection sought and that are obvious to those skilled in the relevant art.

Claims

1. A computer-implemented method for eliminating vulnerabilities in program code, performed using at least one processor and containing the steps of:

- receiving data on vulnerabilities in the program code;

- obtaining, from the data received at the previous step, data containing at least a block of source code and the type of vulnerability detected;

- transforming the source code containing the vulnerability into an abstract syntax tree (AST), in which internal vertices correspond to programming language operators and leaves to the corresponding operands;

- forming a traversal path over the vertices of the AST;

- forming an ordered sequence that consists of the type of each element of the path;

- processing the ordered sequence using a machine learning encoding model trained on ordered sequences of vulnerability data, whereby a representation of the ordered sequence is obtained in the form of a matrix of hidden states;

- processing the matrix of hidden sequences using a generative machine learning model trained on matrices of hidden states of sequences with vulnerabilities, sequences without vulnerabilities, and their source code, whereby a new ordered sequence is obtained that corresponds to the original program code in the functions it performs, but with the vulnerability eliminated;

- converting the ordered sequence obtained at the previous step into a source code block that is functionally equivalent to the original source code block.

2. The method according to claim 1, characterized in that the ordered sequence, which represents the types of each element of the path, is multidimensional and represents a graph.

3. The method according to claim 1, characterized in that the source code containing the vulnerability is converted into a control flow graph (CFG), in which internal vertices correspond to programming language operators and leaves to the corresponding operands.

4. The method according to claim 1, characterized in that the source code containing the vulnerability is converted into a control dependence graph (CDG), in which internal vertices correspond to programming language operators and leaves to the corresponding operands.

5. The method according to claim 1, characterized in that the source code containing the vulnerability is converted into a data dependence graph (DDG), in which internal vertices correspond to programming language operators and leaves to the corresponding operands.

6. The method according to claim 1, characterized in that the source code containing the vulnerability is converted into a program dependence graph (PDG), in which internal vertices correspond to programming language operators and leaves to the corresponding operands.

7. The method according to claim 1, characterized in that the source code containing the vulnerability is converted into a code property graph (CPG), in which internal vertices correspond to programming language operators and leaves to the corresponding operands.

8. A system for eliminating vulnerabilities in program code, containing at least one processor and memory storing machine-readable instructions which, when executed by the processor, implement the method according to any one of claims 1-7.