RU2649748C2 - System and method for interpreting and analyzing dynamic characteristics of the current state of tasks performed

Info

Publication number: RU2649748C2
Application number: RU2016134614A
Authority: RU
Inventors: Александр Сергеевич Антонов; Вадим Владимирович Воеводин; Владимир Валентинович Воеводин; Сергей Анатольевич Жуматий; Дмитрий Александрович Никитенко; Константин Сергеевич Стефанов; Алексей Михайлович Теплов; Павел Артемович Швец
Priority date: 2016-08-24
Filing date: 2016-08-24
Publication date: 2018-04-04
Also published as: RU2016134614A; RU2016134614A3

Abstract

FIELD: information technology.

SUBSTANCE: the invention relates to means for analyzing the dynamic characteristics of parallel programs and supercomputers. The system comprises a set of supercomputer computing nodes, each equipped with means of a monitoring system (SM) and of a task flow control system (CPS); an information processing server that includes a module for aggregating data from the monitoring system of each computing node, a module for analyzing data from the task flow control system of each computing node, and a data storage unit; and a tool for visualizing the processing results. The SM means include system monitoring sensors that provide information on the status and degree of use of available resources from each of the available monitoring systems; the CPS means are designed to obtain information about the status of tasks, their distribution among nodes, and the nature of resource use on the computing nodes. The method describes the operation of the system.

EFFECT: the technical result is increased supercomputer efficiency through analysis of the current state of the task being solved.

11 cl, 7 dwg, 1 tbl, 1 ex

Description

Technical field

The claimed group of inventions relates to the study of the behavior of programs (tasks) during execution on supercomputer systems and is intended to provide a comprehensive analysis of the dynamic characteristics of parallel programs and supercomputers. The analysis takes into account the characteristics of a task from the moment it is queued until it completes. This makes it possible to obtain complete information both about the task itself and about the entire set of tasks executed on the supercomputer.

As computing systems and the tasks solved on them grow in scale, writing efficient programs becomes more difficult. The properties of the supercomputer's hardware and software, the properties of the executed program itself, and the mutual influence of concurrently running programs must all be taken into account to achieve high efficiency. The main purpose of any supercomputer installation is to meet the needs of users whose tasks stem from real problems in various application areas. The problem of increasing the efficiency of each individual application is therefore extremely relevant. Solving it for an individual application benefits not only the scientific problem behind that application but also the operation of the system as a whole.

State of the art

A significant number of the software tools available today for pinpointing the location of an error in a user program, or for localizing a part of the program that runs inefficiently and needs optimization, use tracing. Typically, the set of recorded events, the granularity of their collection, and the data sources themselves are defined. Besides collecting information about the occurrence of the events themselves, it is common practice to use the trace file to analyze the resulting sequence of operations and/or events during application execution.

Tracing-based approaches are well developed and widespread. The following software tools are the most characteristic representatives of these approaches.

The well-known Scalasca system (http://www.scalasca.org) supports performance optimization of parallel applications based on measurements of 19 characteristics during program execution and their subsequent analysis. This analysis identifies potential bottlenecks in the application related to communication and process synchronization. The system also suggests directions for further in-depth study of problem areas. One of two analysis modes can be selected: a performance study at the level of function calls based on total execution times (profiling), or a study of task behavior based on event tracing. The system is available for download under the open-source New BSD license.

The Score-P system (http://www.vi-hps.org/projects/score-p) is a relatively new system for collecting and analyzing data on program execution that operates at the application level. It is available as open source and is based on instrumentation of program code.

The well-known ThreadSpotter technology (http://www.roguewave.com/products/threadspotter), aimed at performance optimization, was developed by Rogue Wave and is a natural continuation of research projects at Uppsala University, Sweden. In normal mode it collects heterogeneous information about program behavior in the form of a so-called “fingerprint”. Based on this information, ThreadSpotter can estimate the performance of caches of any size, with any line size and any eviction policy, without reference to the target system. ThreadSpotter also detects errors and performance drops in applications in the context of particular data access patterns. Such errors fall into four groups: cache pollution problems, latency problems, bandwidth problems, and thread interaction problems.

The prior art also includes task flow control systems (CPS) that can be used in implementing the claimed invention:

SLURM (Simple Linux Utility for Resource Management) is a highly scalable, fault-tolerant cluster manager and job scheduler for the computing nodes of large clusters. SLURM maintains a queue of pending jobs and manages overall resource utilization while computational tasks execute. SLURM also manages the available computing nodes. Finally, in addition to monitoring parallel jobs until they complete, SLURM distributes the load across the allocated nodes (https://computing.llnl.gov/linux/slurm/).

CLEO is a software package within the ParCon system, which addresses the efficient management of computing cluster resources and the analysis of the efficiency of clusters and parallel programs (http://parcon.parallel.ru/cleo.html). The system is designed for parallel applications and supports many parallel environments.

Each of these systems keeps its own record of all tasks known to it and of the events related to application execution.

Thus, if information about the task flow is required, the main source of data on the structure of the task flow is the task flow control system (CPS) itself, also called the resource manager.

The approach described in the claimed invention also uses data from the monitoring systems of each computing node. Known systems such as ClustrX Watch and Ganglia can be used as these monitoring systems.

ClustrX Watch is a distributed system for monitoring cluster parameters, designed to collect, record, and process data from a large number of sensors across all cluster subsystems and capable of operating in a fault-tolerant mode. ClustrX Watch was developed by T-Platforms OJSC, Moscow, and is part of the ClustrX cluster management system.

Some of the tools mentioned above can collect basic information about the state of the computing system. However, there are many specialized tools that implement system monitoring more fully.

One of the best-known tools for collecting system monitoring data is the publicly available Ganglia system (http://ganglia.sourceforge.net). It collects information about processor load, network load, memory usage, and many other resources, but it does not allow the collected data to be analyzed in relation to the specific applications being executed. Importantly, the system does not provide fine time resolution when the monitored systems are large.

Supercomputer complexes designed to solve problems from different fields of science are characterized by heterogeneous tasks: they differ in the amount of resources required, the ability to use computational accelerators, the availability of local disks, and much more.

Thus, using this information, system monitoring data reflecting the state of the computing system can be bound directly to the application that uses a known set of nodes or partitions of the computing system.

Combining the availability of CPS data, which describe the entire load structure of the computing system, with the rich capabilities of system monitoring makes it possible to develop comprehensive methods for analyzing the efficiency of supercomputer applications and systems, from the level of an individual application to the level of a partition or the system as a whole.

Disclosure of invention

The objective of the claimed group of inventions is to provide a qualitative analysis of any task from the entire flow of tasks executed by a supercomputer and a quantitative analysis of the tasks' average use of computing resources. A decrease in the efficiency of each individual supercomputer application, of each task, increases the time spent, which reduces the efficiency of the system as a whole. The claimed technical solution is therefore aimed at improving supercomputer efficiency by enabling a prompt response to the current state of the task being solved.

The technical result achieved by using the claimed invention is that the user is able to evaluate information about the characteristics and dynamic features of the execution of a particular task solved by the supercomputer, and about its current state.

This objective is achieved in that the claimed system for interpreting and analyzing the dynamic properties of tasks solved by a supercomputer includes:

a set of supercomputer computing nodes, each equipped with means of a monitoring system (SM) and of a task flow control system (CPS),

an information processing server including a module for aggregating data from the monitoring system of each computing node, a module for analyzing data from the task flow control system of each computing node, and a data storage unit,

a tool for visualizing the processing results, wherein

the SM means include system monitoring sensors that provide information on the status and degree of use of available resources from each of the available monitoring systems,

the CPS means are designed to obtain information about the status of tasks, their distribution among nodes, and the nature of resource use on the computing nodes.

The objective is also achieved by the claimed method for interpreting and analyzing the dynamic properties of tasks solved by a supercomputer, which includes the following steps:

- collecting data from the system monitoring sensors of the supercomputer's computing nodes for each individual task during its execution and placing the collected data in the aggregation module;

- collecting data from the task flow control system for each individual task and placing the collected data in the analysis module;

- processing the collected data on the information processing server and linking the data that belong to the same executed task;

- generating, on the information processing server and at the user's request, a report on a task that includes the results of processing the data collected for that task;

- visualizing the report.
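The linking step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a monitoring sample belongs to a task when it was taken on one of the task's allocated nodes during the task's run interval; the names `JobRecord`, `Sample`, and `link_samples_to_jobs` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical CPS record: which nodes a task used and when."""
    job_id: str
    nodes: frozenset
    start: float  # seconds since some common epoch
    end: float


@dataclass(frozen=True)
class Sample:
    """Hypothetical monitoring sample from one sensor on one node."""
    node: str
    sensor: str
    time: float
    value: float


def link_samples_to_jobs(jobs, samples):
    """Group samples by task.

    Assumption: a sample is attributed to a task if its node is in the
    task's node list and its timestamp lies within [start, end].
    """
    linked = {job.job_id: [] for job in jobs}
    for sample in samples:
        for job in jobs:
            if sample.node in job.nodes and job.start <= sample.time <= job.end:
                linked[job.job_id].append(sample)
    return linked


jobs = [
    JobRecord("a", frozenset({"n1", "n2"}), 0, 100),
    JobRecord("b", frozenset({"n3"}), 50, 150),
]
samples = [
    Sample("n1", "CPU_user", 10, 0.9),      # inside task "a"
    Sample("n3", "CPU_user", 10, 0.1),      # before task "b" started
    Sample("n3", "CPU_user", 60, 0.5),      # inside task "b"
    Sample("n2", "LoadAverage", 120, 1.0),  # after task "a" ended
]
linked = link_samples_to_jobs(jobs, samples)
```

The nested loop keeps the sketch short; a real installation with many nodes would more likely index tasks by node and time before matching.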

When the collected data are processed by the aggregation and analysis modules, the processing results are saved to the database of the information processing server, which stores task data, dynamic characteristics, and integral characteristics. The aggregation module aligns the incoming system monitoring data, brings the data to uniform time intervals, thins and filters the monitoring data, forms the dynamic characteristics, and saves them to the database of the information processing server. The analysis module checks the correctness of the data received from the CPS, processes the stored system monitoring data, calculates the integral characteristics, saves the processed data to the database of the information processing server, and generates templates for visualizing the processing results.

System monitoring data comprise a stream of information from individual system monitoring sensors, indicating the time and/or place at which the value was read and/or the identification of the source, from each of the available monitoring systems. CPS data include at least the following information about each launch: the time the task started executing and/or was queued, and/or the completion time, and/or the waiting time, and/or the computation time, and/or the list of allocated computing nodes, and/or the number of allocated computing cores, and/or the processor-hours consumed, and/or the launch command line, and/or the partition of the computing system, and/or the task execution status.

The report produced as the result of information processing is a set of textual and/or tabular and/or graphical information reflecting general information and information about the dynamic and integral characteristics of the analyzed task. The general information included in the report comprises the time the task started executing and/or was queued, and/or the completion time, and/or the waiting time, and/or the computation time, and/or the list of allocated computing nodes, and/or the number of allocated computing cores, and/or the processor-hours consumed, and/or the launch command line, and/or the partition of the computing system, and/or the task execution status. The integral characteristics included in the report are the minimum, maximum, and average (or median) values of the dynamic characteristics over the task's execution time, with an indication of where threshold values are exceeded. The dynamic characteristics included in the report are time series of values from the monitoring system, for example CPU_user, LoadAverage, the number of floating-point operations, the intensity of network exchange, the intensity of input/output use, the intensity of memory exchange, and cache usage characteristics.
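As an illustration of the integral characteristics described above, the following sketch computes the minimum, maximum, mean, and median of one dynamic characteristic and flags a threshold violation. The function name, the dictionary layout, and the rule "the maximum exceeds the threshold" are assumptions made for the example; the text does not specify how thresholds are defined or checked.

```python
from statistics import mean, median


def integral_characteristics(values, threshold=None):
    """Summarize one dynamic characteristic over a task's run time.

    values: measured values of the characteristic (e.g. CPU_user).
    threshold: optional upper limit; this sketch reports it as exceeded
    when the maximum observed value is above the limit (an assumption).
    """
    if not values:
        raise ValueError("at least one value is required")
    summary = {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }
    if threshold is not None:
        summary["threshold_exceeded"] = summary["max"] > threshold
    return summary


cpu_user = [0.2, 0.9, 0.4, 0.5]
report_row = integral_characteristics(cpu_user, threshold=0.8)
```

Here `report_row` would hold one row of a report table for the hypothetical `cpu_user` series; a lower-bound threshold (for example, suspiciously low CPU load) could be handled the same way by comparing against the minimum.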

The claimed group of inventions is illustrated by the following drawings.

FIG. 1 schematically shows the relationship between the nodes and software modules that make up the claimed system.

FIG. 2 shows an example of a report produced by the system.

FIGS. 3-7 show examples of graphs obtained as reports on individual tasks, together with the user's interpretation of the graphical information.

The reference numerals in the drawings denote:

1 - computing node;

2 - monitoring system;

3 - task flow control system;

4 - information processing and storage server;

5 - analysis module;

6 - aggregation module;

7 - database;

8 - report;

9 - visualization tool.

The claimed system is a technical solution that makes it possible to obtain, analyze, and interpret data characterizing the quality of task execution on a supercomputer.

The system includes both the hardware that implements it and a software package that serves user requests.

The hardware part of the system includes a plurality of computing nodes 1 that make up the supercomputer computing system. In addition, the claimed system includes an information processing and storage server 4 and a tool 9 for visualizing the processed information, which may be, for example, a PC, laptop, tablet, or phone with Internet access and a web browser.

The software package of the claimed invention is implemented as follows.

Each computing node 1 is equipped with CPS means 3, which provide data on the status of tasks, their distribution among nodes, and the nature of the use of various resources on the nodes, and with SM means 2, which provide data on how the use of various resources on the nodes changes over time. The processing and storage server runs the aggregation module 6, which processes data from the system monitoring sensors of each computing node; the analysis module 5, which processes the CPS data and the stored system monitoring data; and the database 7, which stores data about tasks together with the calculated dynamic and integral characteristics of the executed task. In addition, so that the user can obtain a task report 8, the visualization tools 9 are equipped with web browsers that support JavaScript, for example Chrome, Firefox, Safari, IE, and others.

The monitoring system on the computing nodes of a supercomputer installation means that every node of the computing system contains, in addition to its main computing facilities, system monitoring sensors that report many characteristics of the state of the node's hardware and software environment. For each characteristic, the data are a sequence of pairs (Vi, Ti), where Vi is the value of the characteristic and Ti is the time at which it was measured.

The claimed method is implemented as follows.

The monitoring system sensors continuously collect data from the nodes of the computing system on the status and degree of use of available resources (processor, memory, network, and others). The sensors record the time and place at which each value was taken and identify the source for each of the available monitoring systems.

Each sensor has a unique identifier and can obtain its value from the operating system, from environment variables, or from the available hardware interfaces through an agent program. Examples of such sensors are given in Table 1.

The monitoring system periodically reads values from all sensors. Each individual monitoring system has its own set of available sensors, which depends heavily on both the hardware and the configuration of the software environment.

In practice, the number of sensors that can be read from the processor is often severely limited, so the most important ones have to be selected. At the same time, an analysis based on the selected characteristics should give as comprehensive a picture as possible of the task flow under study.

The CPS provides the aggregation module with task data: queueing, start, and completion times, status, distribution among nodes, launch command line, and others.

Depending on the specifics of the study, the aggregation module brings the incoming system monitoring data to uniform time intervals (usually 5 minutes) and filters them, thereby deriving dynamic characteristics from the monitoring data, and saves them in a single table. For each interval and each observed parameter, the dynamic characteristics consist of three values: the average, the minimum, and the maximum over the interval.
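The interval reduction described above can be illustrated with a minimal sketch. It assumes that samples arrive as (value, timestamp) pairs with timestamps in seconds; the function name `aggregate` and the constant `INTERVAL_SEC` are hypothetical and do not come from the described system, and the filtering step is omitted.

```python
from collections import defaultdict

INTERVAL_SEC = 300  # 5-minute interval, the typical value named in the text


def aggregate(samples, interval=INTERVAL_SEC):
    """Group (value, timestamp) pairs into fixed-length intervals.

    Returns {interval_start: (average, minimum, maximum)}.
    """
    buckets = defaultdict(list)
    for value, ts in samples:
        # Align each timestamp to the start of its interval.
        start = int(ts // interval) * interval
        buckets[start].append(value)
    return {
        start: (sum(vals) / len(vals), min(vals), max(vals))
        for start, vals in sorted(buckets.items())
    }


# Hypothetical readings of one sensor on one node.
samples = [(10.0, 0), (30.0, 120), (20.0, 299), (50.0, 300), (70.0, 450)]
result = aggregate(samples)
print(result)
```

Each entry of `result` corresponds to one row of dynamic characteristics for a single sensor; a real aggregation module would also key the rows by node and sensor identifier.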

The dynamic characteristics of user application runs are available in full to system administrators, while ordinary users have access only to the runs of their own applications. Of greatest interest is the share of user processes in the total processor load, CPU user time (the time spent running user programs), which best reflects the processor load created by the application. Other sensors can be used for a more detailed study of application behavior; however, for general studies, including the determination of typical usage profiles of supercomputer systems, where a comprehensive assessment is the priority, it is sufficient to include CPU user in the list of key characteristics.

The most commonly used sensors also include those that record processor load; the number of floating-point operations; the number of processes ready to run (Load Average); the intensity of inter-node exchange; the intensity of input/output; and the number of cache misses. This list can be extended with new sensors, and it can also be improved in terms of how frequently the studied characteristics are sampled.

When the analysis module receives data from the CPS about a change in task status, it checks the correctness of the received data and saves them to the database (DB).

If the analysis module receives data indicating that a task has completed, it selects the dynamic characteristics for the time span and nodes on which the task ran and builds the integral characteristics (averages over the dynamic characteristics), which it then saves to the DB. Integral characteristics may also include membership in classes defined by the level of average resource usage (for example, exceeding a threshold) and the results of other similar processing.
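A minimal sketch of this step follows, assuming the dynamic characteristics of a finished task are already available as lists of per-interval averages. The function name, the metric names, the threshold values, and the `high_` tag prefix are all hypothetical illustrations, not part of the described system.

```python
from statistics import mean, median


def integral_characteristics(dynamic, thresholds):
    """Reduce per-interval averages to integral values and class tags.

    dynamic:    {metric_name: [interval averages over the task's run]}
    thresholds: {metric_name: limit above which a tag is assigned}
    """
    integral = {
        metric: {"mean": mean(values), "median": median(values)}
        for metric, values in dynamic.items()
    }
    # Assign a tag when the average usage exceeds the (assumed) threshold.
    tags = [
        f"high_{metric}"
        for metric, limit in thresholds.items()
        if metric in integral and integral[metric]["mean"] > limit
    ]
    return integral, tags


# Hypothetical data for one completed task.
dynamic = {"cpu_user": [90.0, 10.0, 80.0], "load_average": [1.0, 2.0, 3.0]}
thresholds = {"cpu_user": 50.0, "load_average": 4.0}
integral, tags = integral_characteristics(dynamic, thresholds)
print(integral, tags)
```

Keeping both the mean and the median mirrors the text's remark that integral characteristics may be averages or medians; which one is stored is a design choice left open here.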

The integral characteristics of applications are the averaged (or median) values of the corresponding dynamic characteristics for a given task, together with data on the resources allocated to and consumed by the task: partition, time, and number of nodes and cores.

Data from the different data streams are linked to a specific task through the time interval of the task's execution and the identifiers of the computing nodes on which the task ran. These data are always available for each task, both in the system monitoring data streams and from the task flow control system.
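The linkage by time interval and node identifier might look like the following sketch. The `Job` and `Record` types, their fields, and the function `records_for_job` are hypothetical; the sketch only assumes that the CPS reports start and end times and a node list, and that each monitoring row carries a node identifier and an interval start.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Task data as reported by the task flow control system (assumed fields)."""
    job_id: int
    start: int          # start time, seconds
    end: int            # completion time, seconds
    nodes: frozenset    # identifiers of the allocated nodes


@dataclass(frozen=True)
class Record:
    """One row of dynamic characteristics (assumed fields)."""
    node: str
    interval_start: int
    metric: str
    avg: float


def records_for_job(job, records):
    """Select monitoring rows taken on the job's nodes during its run."""
    return [
        r for r in records
        if r.node in job.nodes and job.start <= r.interval_start < job.end
    ]


job = Job(job_id=1, start=300, end=900, nodes=frozenset({"n1", "n2"}))
records = [
    Record("n1", 0, "cpu_user", 5.0),     # before the job started
    Record("n1", 300, "cpu_user", 95.0),  # matches
    Record("n3", 300, "cpu_user", 50.0),  # different node
    Record("n2", 600, "cpu_user", 90.0),  # matches
    Record("n2", 900, "cpu_user", 1.0),   # after the job ended
]
selected = records_for_job(job, records)
print([(r.node, r.interval_start) for r in selected])
```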

Data can be retrieved from the DB for analysis by means of the analysis module. Cassandra and MongoDB, for example, are used as the database for data storage.

When a request for the task list arrives from the user interface in the web browser (by following a link), the analysis module selects from the DB the list of tasks, their integral characteristics, and any labels (tags) marking their membership in particular classes. The request can be filtered and refined through the user interface (using prepared links or by editing the SQL query manually). The result of the selection is inserted into the "task list" visualization template, with color indication of integral characteristics that exceed certain thresholds, and is sent over HTTP to the client side, where it is rendered by the web browser. From the list, the user can follow a link to the report for an individual task.

When a request for a specific task arrives from the user interface in the web browser (by following a link), the analysis module selects from the DB the data on this task from the task list, its integral characteristics, and any labels (tags) of class membership assigned to this task. The result of the query is inserted into the visualization template and sent to the client side, where it is rendered by the web browser.

One of the functions of the analysis module is data processing. An analysis can be initiated from many different parts of the system. For example, an analysis request may come from the visualization means through the request processing center as a result of the user working with the system through a web browser. In this case, the main purpose of the request may be to analyze the dynamics of a parallel program's behavior. Data analysis can also be initiated by system processes, for example by a timer once a day to build a daily report on the operation of the supercomputer. Alternatively, the data may be requested by external integration and visualization systems.

The request specifies the characteristics of the completed task and the visualization template to be used to generate the report. The address of the report is stored in a separate file together with the task report, and the user can later open it in a browser.

The requests themselves are stored in text files called "request templates". Once the report has been generated, the user can view it in a browser. The report is based on the monitoring data of the selected application. Any number of graphs and charts can be included in the report to reflect various parameters of the application's operation, for example processor load.

In particular, the report may contain the following blocks:

1. general data about the task: id, owner, list of nodes, status, launch partition, submission, start, and completion times, launch command line, number of processor-hours, number of allocated cores, and others;

2. integral characteristics of the task, with color indication of values exceeding certain thresholds;

3. a list of class membership labels (tags), if the analysis module assigned the task to any classes when building the integral characteristics;

4. graphs showing the behavior of the dynamic characteristics, based on a selection from the DB by the nodes and execution time of the task.

Example of a specific embodiment

Below is an example of the graphs produced in reports on individual tasks, together with the user's interpretation of the graphical information.

Program run time: 9 hours 38 minutes.

Number of cores used: 112.

Partition with disks used: no.

The processor load graph (Fig. 3) shows periodicity, which indicates an iterative structure of the algorithm. A notable feature of this graph is the large difference between the maximum and minimum processor load. The maximum load is almost always close to 100%, while the minimum load stays near 0% almost all the time.

The graph of first-level cache misses (Fig. 4) correlates with the processor load graph. A notable feature of the graph is that, on each iteration, the spike in the number of cache misses occurs at the beginning of the iteration. This correlates with the spikes in minimum processor load.

The graph of the number of L2 cache misses (Fig. 5) follows the graph of first-level cache misses, although the number of misses is lower.

The Ethernet activity graph (Fig. 6) shows bursts of activity at equal periods. During these bursts, network usage reaches 50 MB/s. This is a fairly high level of activity, but such network load occurs only at equal and fairly long intervals. The average network load is therefore only 0.13 MB/s.

The graph of the InfiniBand data transfer rate (Fig. 7) shows a correlation with the cache miss graph. The overall high network load of 104.11 MB/s indicates that the communications between processes do not carry a very large volume of data. The processes exchange data at the beginning of each iteration.

This profile shows that the increase in the number of cache misses at the different levels depends on the transfer of new data from a file over the Ethernet network. The profile reflects the relationship between data transfer activity over the network, InfiniBand network activity, and the number of cache misses. This relationship reflects the iterative structure of the algorithm, in which data are exchanged with files on each iteration, which is why the number of cache misses rises. Waiting for new data lowers the processor load, as can be seen on the processor usage graph. A distinctive feature of this graph is the large spread between the maximum and minimum values of processor load and cache misses. This shows that the task is unbalanced: some processes are busy computing while others are idle. This is reflected in the average processor load of 55.7%: judging by the sensor readings, some processors are loaded at close to 100%, while the minimum load of other processors hovers around 0. This indicates that the computations are distributed unevenly.
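The reasoning above, in which a persistent wide gap between maximum and minimum processor load points to imbalance, can be sketched as a simple check over the per-interval (average, minimum, maximum) values. The function name `imbalance_share` and the spread limit of 80 percentage points are assumptions chosen for illustration; the source does not define a numeric criterion.

```python
def imbalance_share(intervals, spread_limit=80.0):
    """Fraction of intervals whose max-min CPU load spread exceeds the limit.

    intervals: list of (average, minimum, maximum) CPU load values in percent.
    spread_limit: hypothetical threshold, not taken from the source.
    """
    if not intervals:
        return 0.0
    wide = sum(1 for _avg, lo, hi in intervals if hi - lo > spread_limit)
    return wide / len(intervals)


# Hypothetical per-interval CPU load for one task.
cpu_load = [
    (55.0, 1.0, 99.0),
    (56.0, 0.5, 98.5),
    (60.0, 40.0, 75.0),
    (52.0, 2.0, 100.0),
]
share = imbalance_share(cpu_load)
print(share)
```

A share close to 1 would suggest that some processors were almost always idle while others were fully loaded, which is the pattern the example describes; interpreting the number still requires looking at the graphs themselves.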

The proposed approach to analysis makes it possible to obtain, efficiently and in a technologically simple way, a qualitative assessment of the properties of a real task flow. On this basis one can judge the utilization of supercomputer resources, identify problem areas of the architecture, and outline possible directions for its optimization.

Claims

1. A system for interpreting and analyzing the dynamic properties of tasks executed on a supercomputer, comprising:

a plurality of computing nodes of the supercomputer, each equipped with monitoring system (SM) tools and task flow control system (CPS) tools,

an information processing server comprising a module for aggregating data from the monitoring systems of each computing node, a module for analyzing data from the task flow control system of each computing node, and a data storage unit,

a means for visualizing the processing results,

wherein

the SM tools include system monitoring sensors that provide information on the status and degree of use of available resources from each of the available monitoring systems,

the CPS tools are configured to obtain information about the status of tasks, their distribution among nodes, and the nature of resource use on the computing nodes.

2. A method for interpreting and analyzing the dynamic properties of tasks executed on a supercomputer, comprising the following steps:

- collecting data from the system monitoring sensors of the supercomputer's computing nodes for each individual task during its execution and placing the collected data into an aggregation module;

- collecting data from the task flow control system for each individual task and placing the collected data into an analysis module;

- processing the collected data in the information processing server, linking the data to the same executed tasks;

- generating, by the information processing server, a report on a task at the user's request, the report including the results of processing the data collected for that task;

- visualizing the report.

3. The method according to claim 2, characterized in that the results of processing the data collected by the aggregation and analysis modules are stored in the database of the information processing server, which stores task data, dynamic characteristics, and integral characteristics.

4. The method according to claim 2, characterized in that the aggregation module aligns the incoming system monitoring data, converts the data to uniform time intervals, thins and filters the monitoring data, generates dynamic characteristics, and stores the obtained dynamic characteristics in the database of the information processing server.

5. The method according to claim 2, characterized in that the analysis module verifies the correctness of the data arriving from the task flow control system, processes the stored system monitoring data, calculates the integral characteristics, stores the processed data in the database of the information processing server, and generates visualization templates for the processing results.

6. The method according to claim 2, characterized in that the system monitoring data comprise a stream of information from individual system monitoring sensors indicating the time and/or the place at which the value was taken and/or the identification of the source from each of the available monitoring systems.

7. The method according to claim 2, characterized in that the CPS data include, for each launch, at least the following: the task start time and/or the time it was placed in the execution queue, and/or the completion time, and/or the waiting time, and/or the run time, and/or the list of allocated computing nodes, and/or the number of allocated computing cores, and/or the number of processor-hours spent, and/or the launch command line, and/or the partition of the computing system, and/or the task completion status.

8. The method according to claim 2, characterized in that the report produced by the information processing is a set of textual and/or tabular and/or graphical information that reflects general information and information about the dynamic and integral characteristics of the analyzed task.

9. The method according to claim 8, characterized in that the general information in the report includes the task start time and/or the time it was placed in the execution queue, and/or the completion time, and/or the waiting time, and/or the run time, and/or the list of allocated computing nodes, and/or the number of allocated computing cores, and/or the number of processor-hours spent, and/or the launch command line, and/or the partition of the computing system, and/or the task execution status.

10. The method according to claim 8, characterized in that the integral characteristics in the report include the minimum, maximum, and average (or median) values of the dynamic characteristics over the execution of the task, with an indication of exceeded threshold values.

11. The method according to claim 8, characterized in that the dynamic characteristics in the report include time series reflecting the values of dynamic quantities from the monitoring system, for example CPU_user, LoadAverage, the number of floating-point operations, network exchange intensity, I/O intensity, memory exchange intensity, and cache usage characteristics.