RU2815189C1 - Method for time synchronization of operation of massively parallel computing system with distributed memory

Info

Publication number: RU2815189C1
Application number: RU2022134925A
Authority: RU
Inventors: Алексей Александрович Толмачев; Дмитрий Сергеевич Викторов
Filing date: 2022-12-27
Publication date: 2024-03-12

Abstract

FIELD: computer engineering.

SUBSTANCE: invention relates to computer engineering for processing digital data. Technical result is achieved by: determining the number of the fragment having the minimum cardinality of its operator set, including the time tier operator in that fragment and calculating the fragmentation complexity for the current fragment; calculating the total complexity of task fragmentation; clustering fragments; evaluating the current state of the resource of processors with cache memory and of the communication network; establishing a one-to-one correspondence between clusters of fragments and a subset of the currently free processors with cache memory of the computing nodes; evaluating the specific type of processors with cache memory; generating a set of fragments for superscalar processors and very long instruction word (VLIW) processors; evaluating the specific type of communication network topology; estimating the maximum time spent on message exchange; developing text specifications of the threads of the parallel program with time parameterization, taking into account the time of transmitting/receiving data into the buffers of the transmitting and receiving sides, and adding built-in time synchronization means to the code for each processor with cache memory of the computing nodes; evaluating the correctness of the results of developing parallel programs with time parameterization; compiling the generated parallel code with time parameterization.

EFFECT: high efficiency of digital information processing (reduced time for solving a problem) of a computer system.

1 cl, 2 dwg

Description

The invention relates to digital data processing by electronic devices, namely to methods for time synchronization of the operation of a computing system, intended for massively parallel computing systems with distributed memory (Massively Parallel Processing, MPP), and can be used to reduce the time that MPP computing systems take to process digital data.

A known method for building a program consists in identifying marked loops in the assembly source code of the program and classifying them into several predefined types; aligning the start addresses of the marked loops, where a loop of the given type requires it, by adding assembly instructions; and, while keeping the original assembly source code in memory, building the modified assembly code for the target device by compiling and linking [1].

Also known is a method for creating a parallel program with time parameterization for multiprocessor computing systems with uniform memory access. To implement the method, the control unit, the instruction fetch unit and the arithmetic logic unit perform the following operations: obtain an ordered selection of lexemes of the original sequential program and form a descriptor for each lexeme; obtain an ordered selection of lexeme descriptors from the set of descriptors formed at the preceding stage; form new program specification structures detailed down to operations/functions; form new program specification structures detailed down to fragments; check the equivalence of the textual specification of the task program and its representation by the new specification structures; calculate priority values for the operators of the new specification; form the sets of operators that are candidates to start execution at a given moment in time; assign operators to a free processor for execution; develop textual specifications of the threads of the parallel program with time parameterization; evaluate the correctness of the results of developing parallel programs with time parameterization; compile the generated parallel code with time parameterization [2].

The disadvantage of these methods is that they take into account neither the specific architecture of massively parallel computing systems with distributed memory nor the time parameter.

One possible way to increase the efficiency of information processing is to distribute program data among computing nodes. However, distributing program data among computing nodes raises node synchronization problems, which reduce the efficiency of digital information processing. This must be taken into account when choosing the moments at which fragments/operators start executing during parallel execution of the original program.

The purpose of the invention is to increase the efficiency of digital information processing (to reduce the time needed to solve a problem) through time synchronization of the operation of a massively parallel computing system with distributed memory.

This purpose is achieved by a method for time synchronization of the operation of a massively parallel computing system with distributed memory, which consists in performing the following procedures:

1. Ordered selection of the lexemes of the original sequential program: determining which class of lexemes of the high-level programming language each lexeme belongs to; checking whether the lexeme already belongs to the set of lexemes formed so far; adding the lexeme to the set of lexemes of the corresponding type being formed (if it is not yet in that set); and forming a descriptor for the lexeme that specifies the numeric encoding of the required attributes.
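The lexeme pass of step 1 can be illustrated by the following minimal sketch. The lexeme classes, their numeric codes and the descriptor layout (class code, index in the class set) are assumptions made for illustration only; the description does not fix them, and the function name `build_descriptors` is hypothetical.

```python
import re

# Hypothetical lexeme classes and numeric codes (not taken from the patent).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[-+*/=()]))"
)
CLASS_CODES = {"ident": 1, "number": 2, "op": 3}


def build_descriptors(source):
    """Scan lexemes in order; return (per-class lexeme sets, descriptors).

    Each descriptor is an assumed pair (class code, index of the lexeme in
    the duplicate-free set of its class).
    """
    source = source.strip()
    sets = {name: [] for name in CLASS_CODES}  # ordered and duplicate-free
    descriptors = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unrecognized lexeme at position {pos}")
        kind = match.lastgroup          # name of the class that matched
        text = match.group(kind)
        if text not in sets[kind]:      # membership check before inclusion
            sets[kind].append(text)
        descriptors.append((CLASS_CODES[kind], sets[kind].index(text)))
        pos = match.end()
    return sets, descriptors
```

For the input `x = x + 12`, the repeated identifier `x` is stored once and both of its occurrences receive the same descriptor.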

2. Ordered selection of lexeme descriptors from the set of descriptors formed at the preceding stage: determining the high-level programming language construct that corresponds to the descriptor under consideration, and forming a postfix specification for that construct by generating reverse Polish notation according to Dijkstra's algorithm, which uses a stack mechanism with priorities to reorder the symbols of operands and operations.
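Dijkstra's stack-based conversion to reverse Polish notation (commonly called the shunting-yard algorithm) can be sketched as follows. The sketch assumes only the four left-associative binary arithmetic operators and parentheses; the priority table and the function name `to_postfix` are illustrative and are not taken from the description.

```python
# Assumed priorities for left-associative binary operators.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(tokens):
    """Convert a list of infix tokens to reverse Polish notation."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal priority before pushing.
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operand goes straight to the output
    while stack:
        top = stack.pop()
        if top == "(":
            raise ValueError("unbalanced parentheses")
        output.append(top)
    return output
```

For example, `to_postfix("a + b * ( c - d )".split())` yields the sequence `a b c d - * +`.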

3. Formation of new program specification structures that describe the original program with detail down to operations/functions:

- selecting, from the set of elements of the postfix specification of the program, the subset of operators for the program's operations/functions;

- numbering the operators consecutively;

- numbering the inputs and outputs of each operator consecutively;

- introducing a numeric encoding of operator types based on the postfix representation of each operation/function;

- forming, for each operator of the resulting numeric specification, the set of numbers of its operands and recording its cardinality;

- forming, for each operator, the set of its external operators (those that use the results of the operator's execution) and recording its cardinality;

- forming the corresponding labels for each operator from the postfix representation of the operations/functions.
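The operator-level structures of step 3 can be illustrated by the sketch below, which numbers the operators of a postfix expression and records, for each one, its operands and its external operators. The table layout, the restriction to binary operators and the name `build_operator_table` are assumptions for illustration; labels and input/output numbering are omitted.

```python
def build_operator_table(postfix, binary_ops=("+", "-", "*", "/")):
    """Number the operators of a postfix expression consecutively.

    Returns {operator number: {"type", "operands", "external"}}, where an
    operand is either a variable name (str) or the number (int) of the
    operator that produces it, and "external" is the set of operators
    that consume this operator's result.
    """
    table, stack = {}, []
    next_no = 1
    for tok in postfix:
        if tok in binary_ops:
            right = stack.pop()
            left = stack.pop()
            table[next_no] = {
                "type": tok,
                "operands": [left, right],
                "external": set(),
            }
            for operand in (left, right):
                if isinstance(operand, int):  # produced by another operator
                    table[operand]["external"].add(next_no)
            stack.append(next_no)
            next_no += 1
        else:
            stack.append(tok)
    return table
```

The cardinalities mentioned in the list are then simply `len(entry["operands"])` and `len(entry["external"])` for each table entry.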

4. Formation of new program specification structures that describe the original program with detail down to fragments:

- selecting, from the set of operators of the main structure, the subsets of operators that have the same fragment number;

- determining, for each subset, the numbers and types of its input and output operators, and recording their control links and the corresponding control transfer labels;

- forming, in numeric format, the main, connected and time structures that specify the types of the fragments and the scheme of their control links.

The results are the following new program specification structures formed at this level:

- main structure of fragments: fragment number; fragment labels; fragment types; pointers to the fragment number; cardinality of the conjugate set of the fragment; pointer to the start of the sequence of fragment numbers that form the external set of the fragment; cardinality of the external set of the fragment; labels of the fragments targeted by an unconditional jump and by a conditional jump on the value "true"; labels of the fragments targeted by a conditional jump on the value "false" of the program.

- connected structure of fragments: row number of the link structure; pointer to the continuation of the sequence of fragment numbers that form the conjugate set of the fragment under consideration; the conjugate set of the fragment under consideration; pointer to the continuation of the sequence of fragment numbers that form the external set of the fragment under consideration; the external set of the fragment.

- time structure of fragments: number of vertices; graph vertex number; the moment in time at which execution begins for the parallel algorithm instruction interpreted by the vertex of the corresponding time-parallel scheme.

5. Evaluation of the correctness of the new specification structures formed for the original program, in order to check that the textual specification of the task program and its representation by the new specification structures are equivalent. This comprises: checking that the numbers of conjugate and external links are equal in the main and connected structures of the new specification; checking that the numbers of operators of each type in the main and connected structures match those in the original program; checking that the numbers of inputs and outputs and their bit widths in the main and connected structures match those in the original program; checking that the semantics of the operator inputs and outputs in the main and connected structures is correct with respect to the semantics (units of measurement) of the operands and results of the program's instructions/functions; and checking that the control schemes defined by the main and connected structures and by the program text are equivalent.

6. Extension of the set of operators and the set of operator links of the main and connected structures of the new specification of the original program by introducing additional variables that represent the results of intermediate computations of the right-hand sides of assignment operators, together with the operators that write the data for these variables to memory. This provides the transition to the constructive main and connected structures of the new specification, which reflect the full set of operators to be executed in solving the task and the links between them.

7. Execution of a loop over the time tiers of the new task specification in order to form the set of fragments for the task (start of fragmentation).

8. Formation, for the set of operators of each tier, of the set of operators ranked in descending order of the cardinalities of their conjugate sets of operators.

9. Formation of the first fragment, which includes:

- the current value of the number of fragments;

- the first fragment comprises the set of operators that have the maximum cardinality of the conjugate set;

- the complexity of each fragment equals the sum of the cardinalities of the conjugate and external sets of the operators included in the fragment.
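Steps 8 and 9 can be sketched as follows, under the assumption that the conjugate and external sets of each operator are available as Python sets of operator numbers keyed by operator number. The function names `rank_tier` and `first_fragment` are hypothetical, and placing every operator that ties for the maximum cardinality into the first fragment is one possible reading of the text.

```python
def rank_tier(tier, conjugate):
    """Step 8: order a tier's operators by descending conjugate-set size."""
    return sorted(tier, key=lambda op: len(conjugate[op]), reverse=True)


def first_fragment(tier, conjugate, external):
    """Step 9: operators with the maximum conjugate-set cardinality.

    Returns (members, complexity), where complexity is the sum of the
    cardinalities of the conjugate and external sets of the members.
    """
    ranked = rank_tier(tier, conjugate)
    top = len(conjugate[ranked[0]])
    members = [op for op in ranked if len(conjugate[op]) == top]
    complexity = sum(len(conjugate[op]) + len(external[op]) for op in members)
    return members, complexity
```

Because Python's sort is stable, operators with equal cardinalities keep their original relative order within the tier.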

10. Formation, for the current operator belonging to a given time tier, of the set of intersections of conjugate sets, and checking whether the conjugate set equals the empty set, in order to establish whether the current and output operators use common input data.

11. Checking whether the current number of fragments has reached the specified number, and including the current operators in the next fragment.

12. Calculation of the fragmentation complexity for the current fragment by adding the fragmentation complexity of the fragment calculated at the preceding stages to the cardinalities of the conjugate and external sets of the operators included in the fragment.

13. Checking whether all operators of the time tiers have been considered.

14. Formation, for the operators of the time tiers, of the sets of intersections of each operator's conjugate set with the subsets of operators of each fragment in the current set of formed fragments.

15. Determination of the number of the fragment with the maximum intersection value, inclusion of the time tier operator in that fragment, and calculation of the fragmentation complexity (according to step 12) for the current fragment.

16. Evaluation of the cardinalities of the sets of operators included in each fragment in order to keep the number of operators per fragment uniform and the resource load balanced.

17. Determination of the number of the fragment with the minimum cardinality of its operator set, inclusion of the time tier operator in that fragment, and calculation of the fragmentation complexity (according to step 12) for the current fragment.

18. Calculation of the total fragmentation complexity of the task; end of fragmentation.
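Steps 14 to 18 can be read as a greedy assignment loop, sketched below. The description does not state how steps 15 and 17 are combined; the sketch assumes that an operator goes to the fragment with the largest intersection and, when no fragment intersects its conjugate set, to the fragment that currently holds the fewest operators. The function name `assign_operators` and the data layout are hypothetical.

```python
def assign_operators(operators, conjugate, external, fragments):
    """Greedily place operators into existing fragments (modified in place).

    fragments: non-empty list of sets of operator numbers.
    Returns (per-fragment complexity list, total fragmentation complexity).
    """
    def cost(op):
        # Step 12: contribution of one operator to fragment complexity.
        return len(conjugate[op]) + len(external[op])

    complexity = [sum(cost(op) for op in frag) for frag in fragments]
    indices = range(len(fragments))
    for op in operators:
        # Step 14: intersection of the conjugate set with every fragment.
        overlaps = [len(conjugate[op] & frag) for frag in fragments]
        # Step 15: fragment with the maximum intersection.
        best = max(indices, key=lambda i: overlaps[i])
        if overlaps[best] == 0:
            # Steps 16-17 (assumed fallback): smallest fragment, for balance.
            best = min(indices, key=lambda i: len(fragments[i]))
        fragments[best].add(op)
        complexity[best] += cost(op)
    # Step 18: total complexity over all fragments.
    return complexity, sum(complexity)
```

The fallback keeps fragment sizes even when an operator shares no data with any fragment, which matches the load-balancing intent stated in step 16.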

19. Clustering of the fragments in order to minimize the total message exchange time.

20. Evaluation of the current state of the processor and communication network resources, which consists in determining the composition of the resource that is free at the current moment (taking into account the possible execution of previously started tasks):

- the number and the set of numbers of processors that are free at the current moment, as indicated by their busy flag;

- the number and the set of numbers of free communication links, as indicated by their busy flag.

21. Establishment of a one-to-one correspondence between the clusters of fragments and a subset of the processors of the computing nodes that are free at the current moment, which assigns a dedicated processor to each cluster of fragments and minimizes the total length of interprocessor links.
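The mapping of step 21 can be illustrated by an exhaustive search over injective assignments, which is practical only for a handful of clusters. The description does not specify the search procedure; the traffic matrix, the distance matrix and the name `best_assignment` are assumptions made for this sketch.

```python
from itertools import permutations


def best_assignment(traffic, distance, free_processors):
    """Map each cluster to a distinct free processor, minimizing link length.

    traffic[i][j]  : assumed weight of message exchange between clusters i, j
    distance[p][q] : assumed link length between processors p and q
    Returns ({cluster: processor}, total weighted link length).
    """
    n = len(traffic)
    if n > len(free_processors):
        raise ValueError("not enough free processors for the clusters")
    best_cost, best_map = None, None
    for perm in permutations(free_processors, n):
        cost = sum(
            traffic[i][j] * distance[perm[i]][perm[j]]
            for i in range(n)
            for j in range(i + 1, n)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, dict(enumerate(perm))
    return best_map, best_cost
```

With three clusters chained 0-1-2 and free processors 0, 2 and 3 on a line, the search places the middle cluster on processor 2, giving a total link length of 3.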

22. Evaluation of the specific processor type (superscalar or very long instruction word, VLIW).

23. Formation of the set of fragments for superscalar processors and VLIW processors.

24. Evaluation of the specific topology (fully connected, ring, hypercube, shared bus, torus mesh, etc.) of the communication network of the computing system.

25. Estimation of the maximum time spent on message exchange, i.e. on executing the parallel process with time parameterization in the computing system, which is determined by the processor time spent and by interprocessor message exchange:

- a loop over the fragment numbers of the fragment main and connected structures of the new specification;

- determination of the number of the processor that executes the fragment;

- selection, for the fragment, of its external fragment set and of the set of numbers of the processors, external to the processor with the given number, that execute that set of fragments;

- selection, from the main and connected structures of the new specification of the computing system topology, of the external set of processor numbers for the main processor with the corresponding number;

- formation of the difference of the external sets of processor numbers. If it is greater than or equal to the empty set, the main fragment and its external fragments have been correctly assigned to the required topology resource, and the time for the exchange is determined as the sum of the total time spent on message exchange in solving the task and the product of the service data transfer time and the set of numbers of the external processors that execute the external fragment set. Once all fragments of the task have been considered, the estimation of communication time costs is complete; otherwise the loop proceeds to the next fragment;

- formation of the difference of the external sets of processor numbers. If it is less than the empty set, the main processor with the corresponding number does not have enough adjacent processors to execute all fragments of the set, and additional processors must be selected from the available resource to execute the remaining unassigned fragments of the set of external fragments (while keeping the increase in additional exchange time for the fragment as small as possible):

- a loop over the numbers of the processors that form the external set of processors for the corresponding main processor;

- selection, from the main and connected structures of the new specification of the computing system topology, of the external set of processor numbers for the processor with the corresponding number;

- a loop over the elements of the external set of the processor with the corresponding number;

- checking whether the processor with the corresponding number is busy;

- if the processor is busy, moving on to the next element of the external set of the processor; otherwise, assigning the current unassigned fragment to the processor and switching it to the busy state;

- formation of the current value of the time spent on the exchange, defined as the sum of the total time spent on message exchange in solving the task and the product of the service data transfer time and the sum of the complexities of data exchange between the fragments and their current external fragments;

- формирование параллельных фрагментов с временной параметризацией процесса решения задачи вычислительной системой с определением реальных временных затрат на параллельное выполнение задачи определяются возможностями совмещения выполнения во времени различных процессорных операций и операций обмена сообщениями и учитывают: количества вычислительных узлов и процессоров; значения длительностей выполнения различных типов операций, в т.ч. операций обращения к индивидуальной памяти процессора; количества портов приема/передачи данных произвольного процессора; топологии вычислительной системы; методы параллельной обработки данных;- formation of parallel fragments with time parameterization of the process of solving a problem by a computer system with determination of the real time costs for parallel execution of the task are determined by the possibilities of combining the execution of various processor operations and message exchange operations in time and take into account: the number of computing nodes and processors; values of durations for performing various types of operations, incl. operations accessing individual processor memory; number of data reception/transmission ports of an arbitrary processor; computer system topology; parallel data processing methods;

- оценка показателей эффективности сформированных фрагментов с временной параметризацией процесса решения задачи вычислительной системой: последовательное и параллельное время решения задачи, прирост во времени, показатель эффективности распараллеливания, коэффициент загрузки оборудования.- assessment of efficiency indicators of generated fragments with time parameterization of the process of solving a problem by a computer system: sequential and parallel time for solving a problem, time gain, parallelization efficiency indicator, equipment load factor.
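As an illustration only, the exchange-time rule and the efficiency indicators listed above can be sketched in Python. The function and parameter names are hypothetical. The description names the indicators but does not give their formulas, so the definitions below (time gain as a difference, efficiency as speedup per processor, load factor as busy time over available time) are the conventional ones and are an assumption.

```python
def exchange_time(total_messaging_time, service_transfer_time, exchange_complexities):
    """Total messaging time plus service-data transfer time times the summed complexities."""
    return total_messaging_time + service_transfer_time * sum(exchange_complexities)


def efficiency_indicators(t_seq, t_par, busy_times):
    """Indicators for one run; busy_times holds the busy time of each processor.

    The formulas are conventional definitions assumed for this sketch.
    """
    n = len(busy_times)
    speedup = t_seq / t_par
    return {
        "time_gain": t_seq - t_par,                    # time saved by parallel execution
        "speedup": speedup,                            # sequential time over parallel time
        "parallel_efficiency": speedup / n,            # speedup per processor
        "load_factor": sum(busy_times) / (n * t_par),  # busy fraction of available time
    }


# Hypothetical numbers, chosen only to exercise the functions.
print(exchange_time(10.0, 0.5, [2, 3, 1]))  # 13.0
print(efficiency_indicators(100.0, 80.0, [70.0, 60.0]))
```

With these assumed inputs, two processors that are busy for 70 and 60 time units of an 80-unit parallel run give a load factor of 0.8125.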

26. Development of the text specifications of the threads of a parallel program with time parameterization includes development of the following objects:

- text specifications of the timed threads of the parallel program with built-in means (sleep operators) for time synchronization of the parallel processes of the computing nodes;

- text specifications of the timed threads of the parallel program with built-in means (send and receive operators) that account for the time to transfer data into the buffers of the sending and receiving sides and the time to receive the transferred data into memory;

- structures of the time specification of the parallel program, in the form of individual program thread texts for the processors of the computing system with built-in means (sleep, send, and receive operators);

- estimates of the total number of accesses to each data item required to execute the original program.
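The role of the sleep, send, and receive operators can be illustrated with a minimal Python sketch. It is not the code generated by the method: two threads stand in for the processors of two computing nodes, a `queue.Queue` stands in for the communication network, and the tick length and durations are assumed values.

```python
import queue
import threading
import time

TICK = 0.01         # assumed wall-clock length of one discrete time step, in seconds
COMPUTE_TICKS = 2   # assumed duration of the computation on node 1
TRANSFER_TICKS = 1  # assumed time to move the data into the receiver's buffer


def run_two_node_demo():
    channel = queue.Queue()  # stands in for the communication network
    received = []

    def node1():
        time.sleep(COMPUTE_TICKS * TICK)  # models the time the computation takes
        channel.put(6 * 7)                # send: hand the result to the network

    def node2():
        # sleep: wait until the data is scheduled to be in the receive buffer
        time.sleep((COMPUTE_TICKS + TRANSFER_TICKS) * TICK)
        received.append(channel.get(timeout=1.0))  # receive: read from the buffer

    threads = [threading.Thread(target=node1), threading.Thread(target=node2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received[0]


print(run_two_node_demo())  # 42
```

In this sketch the receiving thread sleeps for the planned compute and transfer time instead of polling, which is one way to read the time synchronization that the sleep operators provide.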

Input data for the stage of developing the text specifications of the timed threads of the parallel program: the new structures (main, connected, time) of the parallel process with time parameterization that satisfies the specified requirements (execution time of the parallel program on a given resource; processor load factor as a function of the topology of the computing system and the type and number of processors); the assignment of operators to threads or processors; and the characteristics of the computing system architecture: topology type, number of computing nodes, number and type of processors, execution times of the various types of operations and of accesses to a processor's individual memory, number of ports for simultaneous parallel input/output, and parallel data processing methods.

The main stages of developing the text specifications of the timed threads of the parallel program include:

- a loop over the operator numbers of the main structure;

- determination of the number of the processor that implements the operator;

- formation of the text specification of the operator/operation, which includes:

1) determining the type of processor instruction using the main structure;

2) selecting the names of the conjugate operators from the connected structure and writing the text specification of the operator;

3) selecting the names of the conjugate operators from the connected structure, replacing the operator names with their actual addresses, and writing the text specification of the operator;

4) time parameterization of the operators of the processor program threads;

5) representing the time-parameterized text specifications of the program threads for each processor of a computing node as a set of lines of the following form: instruction numbers of the processor program threads; operation class attribute; operation name; addresses of the first and second operands; fragment names; and the value of the current discrete time corresponding to the start of execution of the task operators that contain the specific data item used in the operation.
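The line format described in item 5 can be sketched as a small Python record. The field order follows the description, but the concrete text layout, the class labels, the addresses, and the durations are hypothetical. The sketch also assumes that operations of one thread run back to back, so each start time is the sum of the preceding durations.

```python
from dataclasses import dataclass


@dataclass
class SpecLine:
    number: int      # instruction number within the processor's thread
    op_class: str    # operation class attribute
    op_name: str     # operation name
    operand1: str    # address of the first operand
    operand2: str    # address of the second operand
    fragment: str    # name of the fragment the operation belongs to
    start_time: int  # discrete time at which the operation starts

    def render(self):
        # Hypothetical text layout; the description does not fix one.
        return (f"{self.number} {self.op_class} {self.op_name} "
                f"{self.operand1} {self.operand2} {self.fragment} t={self.start_time}")


def build_thread_spec(ops, durations):
    """Assign consecutive start times to the operations of one thread."""
    lines, t = [], 0
    for number, (op_class, op_name, a, b, fragment) in enumerate(ops, start=1):
        lines.append(SpecLine(number, op_class, op_name, a, b, fragment, t))
        t += durations[op_name]  # the next operation starts when this one ends
    return lines


ops = [
    ("ALU", "add", "0x10", "0x14", "F1"),
    ("COMM", "send", "0x18", "-", "F1"),
    ("SYNC", "sleep", "-", "-", "F1"),
]
durations = {"add": 1, "send": 3, "sleep": 2}  # assumed durations in discrete ticks
for line in build_thread_spec(ops, durations):
    print(line.render())
```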

27. Evaluation of the correctness of the results of developing parallel programs with time parameterization, i.e. evaluation of the correctness of data types, of the types of operations/functions on data, of the data and control dependencies between operations, of the units of measurement of physical quantities, and of the start times and durations of computational operations/functions and of the control-transfer and synchronization operators of the timed parallel processes.
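One part of this check, the correctness of start times and durations, can be illustrated as follows. This is a sketch under assumptions, not the evaluation procedure of the method: it assumes each operation is described by a processor number, a name, a start time, and a duration, that operations on one processor must not overlap, and that a receive may start no earlier than the end of the matching send plus a fixed transfer time. All names are hypothetical.

```python
def check_schedule(ops, transfer_time):
    """Return a list of timing errors found in a time-parameterized schedule."""
    errors = []

    # 1) Operations assigned to the same processor must not overlap in time.
    by_proc = {}
    for op in ops:
        by_proc.setdefault(op["proc"], []).append(op)
    for proc, items in by_proc.items():
        items = sorted(items, key=lambda o: o["start"])
        for prev, cur in zip(items, items[1:]):
            if prev["start"] + prev["duration"] > cur["start"]:
                errors.append(f"overlap on processor {proc}: {prev['name']} / {cur['name']}")

    # 2) A receive must not start before its send has finished and the data has arrived.
    sends = {op["msg"]: op for op in ops if op["name"] == "send"}
    for op in ops:
        if op["name"] != "receive":
            continue
        send = sends.get(op.get("msg"))
        if send is None:
            errors.append(f"receive without send: {op.get('msg')}")
        elif op["start"] < send["start"] + send["duration"] + transfer_time:
            errors.append(f"receive too early: {op['msg']}")
    return errors


schedule = [
    {"proc": 1, "name": "add", "start": 0, "duration": 2},
    {"proc": 1, "name": "send", "start": 2, "duration": 1, "msg": "m1"},
    {"proc": 2, "name": "sleep", "start": 0, "duration": 4},
    {"proc": 2, "name": "receive", "start": 4, "duration": 1, "msg": "m1"},
]
print(check_schedule(schedule, transfer_time=1))  # []
```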

28. Compilation of the generated parallel code with time parameterization.

Thus, to increase the efficiency of digital information processing (reduce the problem-solving time), the timed threads of the program should be developed so as to even out the load on the computing nodes during parallel solution of the problem, thereby providing the necessary time synchronization of the operation of the computing nodes.

The new features that constitute essential distinctions are:

1. Accounting for the architecture of massively parallel computing systems with distributed memory.

2. Accounting for the task fragmentation options.

3. Accounting for the start-time parameter of the fragments/operators of the parallel algorithm.

These features constitute essential distinctions, since they have not been found in known methods.

Applying the new features together with the known ones will increase the efficiency of digital information processing by evening out the load on the computing nodes during parallel solution of the problem.

The method for time synchronization of the operation of a massively parallel computing system with distributed memory is implemented as follows.

Fig. 1 shows a diagram of the main components of a massively parallel computing system with distributed memory. The system consists of a communication network 3; computing nodes 1, 2, which include input/output interfaces 11, 21, local memories 12, 22, processors with cache memory 13, 23, network adapters 14, 24, and system buses 15, 25; and a host computer 4, which includes an instruction fetch unit 41, a fragmentation unit 42, a control device 43, an arithmetic logic unit 44, and a data memory 45. The computing system contains two computing nodes 1, 2 and the host computer 4, configured to optimize task fragmentation and to optimize message exchange between computing nodes 1, 2, which are intended for creating parallel code with time parameterization by means of communication network 3.

Consider the step-by-step execution of the proposed method for time synchronization of the operation of a massively parallel computing system with distributed memory in the system described above (Fig. 1). The sequential program code is loaded into data memory 45. From data memory 45, control device 43 obtains an ordered selection of tokens of the original sequential program and forms token descriptors (step 1). Instruction fetch unit 41 obtains an ordered selection of token descriptors and forms a postfix specification for this construct (step 2). It forms new program specification structures detailed down to operations/functions (step 3) and new program specification structures detailed down to fragments (step 4). Control device 43 checks the equivalence of the text specification of the task program and its representation by the new specification structures (step 5) and expands the operator sets and the connected-structure sets of the new specification of the original program (step 6).

Fragmentation unit 42 loops over the time tiers of the new task specification (step 7). For the set of operators of each tier, it forms sets of operators ranked in descending order of the cardinality of their conjugate operator sets (step 8) and forms the first fragment (step 9). For the current operator belonging to a given time tier, it forms the sets of intersections of the conjugate sets and checks whether the conjugate set equals the empty set (step 10). It checks whether the current number of fragments has reached the specified number and includes the current operators in the next fragment (step 11). Arithmetic logic unit 44 calculates the fragmentation complexity for the current fragment (step 12).

Control device 43 checks whether all operators of the time tiers have been considered (step 13). For the operators of the time tiers, it forms the sets of intersections of each operator's conjugate set with the operator subsets of each fragment of the current set of formed fragments (step 14). Fragmentation unit 42 determines the number of the fragment with the maximum intersection value and includes the time-tier operator in that fragment, and arithmetic logic unit 44 calculates the fragmentation complexity for the current fragment (step 15). Control device 43 evaluates the cardinality of the sets of operators included in each fragment (step 16). Fragmentation unit 42 determines the number of the fragment with the minimum cardinality of operator sets and includes the time-tier operator in that fragment, and arithmetic logic unit 44 calculates the fragmentation complexity for the current fragment (step 17). It then calculates the total complexity of task fragmentation (step 18).

Control device 43 clusters the fragments (step 19), evaluates the current state of the resources of processors with cache memory 13 and of communication network 3 (step 20), and establishes a one-to-one correspondence between the clusters of fragments and the subset of currently free processors with cache memory 13, 23 of computing nodes 1, 2 (step 21). Control device 43 evaluates the specific type of processors with cache memory 13, 23 (step 22), forms the set of fragments for superscalar processors and very long instruction word (VLIW) processors (step 23), evaluates the specific topology type of communication network 3 (step 24), and estimates the maximum time spent on message exchange (step 25).

Instruction fetch unit 41 develops the text specifications of the threads of the parallel program with time parameterization, which account for the time to transfer data into the buffers of the sending and receiving sides, and adds built-in time synchronization means (sleep, send, and receive operators) to the code for each processor with cache memory 13, 23 of computing nodes 1, 2 (step 26). Control device 43 evaluates the correctness of the results of developing the parallel programs with time parameterization (step 27) and, via communication network 3, commands processors with cache memory 13, 23 to compile the generated parallel code with time parameterization (step 28). An example of the generated parallel code with time parameterization is shown in Fig. 2.
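Separately from Fig. 2, the fragment selection in steps 14 to 17 can be sketched in Python as a greedy placement. The sketch rests on an assumption that the description does not state explicitly: the maximum-intersection rule (step 15) is applied when the operator's conjugate set shares at least one operator with some fragment, and the minimum-cardinality rule (step 17) is the fallback otherwise. Ties are broken by the lowest fragment number, and all names are hypothetical.

```python
def choose_fragment(conjugate, fragments):
    """Pick the index of the fragment that should receive the operator."""
    overlaps = [len(conjugate & fragment) for fragment in fragments]
    best = max(overlaps)
    if best > 0:
        return overlaps.index(best)       # step 15: largest intersection wins
    sizes = [len(fragment) for fragment in fragments]
    return sizes.index(min(sizes))        # step 17 (assumed fallback): smallest fragment


def place_tier(tier_ops, conjugates, fragments):
    """Place every operator of one time tier into an existing fragment, in place."""
    for op in tier_ops:
        fragments[choose_fragment(conjugates[op], fragments)].add(op)
    return fragments


# Hypothetical operators: "d" is linked to "c", "e" has no links, "f" is linked to "a" and "b".
fragments = [{"a", "b"}, {"c"}]
conjugates = {"d": {"c"}, "e": set(), "f": {"a", "b"}}
print(place_tier(["d", "e", "f"], conjugates, fragments))
```

Placing linked operators in the same fragment keeps their data exchange local, while the fallback keeps fragment sizes balanced, which matches the stated aim of evening out the load on the computing nodes.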

Thus, the proposed method reduces the problem-solving time on two computing nodes by up to 24% during operation of a massively parallel computing system with distributed memory; that is, it increases the efficiency of digital information processing through time synchronization of the operation of the massively parallel computing system with distributed memory.

Information sources

1. Yakovlev S.V., Safonov I.V., Bykova T.V. Method for constructing a program. Patent for invention No. 2406112, Bulletin No. 34, 2010 (analog).

2. Viktorov D.S., Brezhnev D.Yu., Tolmachev A.A., Kalachnikov A.S., Yakunina G.R. Method for automatically creating a parallel program with time parameterization for multiprocessor computing systems with uniform memory access. Patent for invention No. 2786347, Bulletin No. 35, 2022 (prototype).

Claims

A method for time synchronization of the operation of a massively parallel computing system with distributed memory, in which, to implement the method, a communication network, computing nodes, which include input/output interfaces, local memories, processors with cache memory, network adapters, and system buses, and a host computer, which includes an instruction fetch unit, a fragmentation unit, a control device, an arithmetic logic unit, and a data memory, perform the following operations: obtain an ordered selection of tokens of the original sequential program and form descriptors for the tokens; obtain an ordered selection of token descriptors from the set of descriptors formed at the previous stage; form new program specification structures detailed down to operations/functions; form new program specification structures detailed down to fragments; check the equivalence of the text specification of the task program and its representation by the new specification structures; expand the sets of operators and the sets of connections between operators of the main and connected structures of the new specification of the original program; loop over the time tiers of the new task specification; for the set of operators of each tier, form sets of operators ranked in descending order of the cardinality of their conjugate operator sets; form the first fragment; for the current operator belonging to a given time tier, form the sets of intersections of the conjugate sets and check whether the conjugate set equals the empty set; check whether the current number of fragments has reached the specified number and include the current operators in the next fragment; calculate the fragmentation complexity for the current fragment; check whether all operators of the time tiers have been considered; for the operators of the time tiers, form the set of intersections of the operator's conjugate set with the subset of operators of each fragment of the current set of formed fragments; determine the number of the fragment with the maximum intersection value, include the time-tier operator in that fragment, and calculate the fragmentation complexity for the current fragment; evaluate the cardinality of the sets of operators included in each fragment; determine the number of the fragment with the minimum cardinality of operator sets, include the time-tier operator in that fragment, and calculate the fragmentation complexity for the current fragment; calculate the total complexity of task fragmentation; cluster the fragments; evaluate the current state of the resources of the processors with cache memory and of the communication network; establish a one-to-one correspondence between the clusters of fragments and the subset of currently free processors with cache memory of the computing nodes; evaluate the specific type of the processors with cache memory; form the set of fragments for superscalar processors and very long instruction word (VLIW) processors; evaluate the specific type of communication network topology; estimate the maximum time spent on message exchange; develop the text specifications of the threads of the parallel program with time parameterization, which account for the time to transfer data into the buffers of the sending and receiving sides, and add built-in time synchronization means to the code for each processor with cache memory of the computing nodes; evaluate the correctness of the results of developing the parallel programs with time parameterization; and compile the generated parallel code with time parameterization.