RU2820032C1

RU2820032C1 - Method of distributing data on monofunctional units of processors of computer system with data flow control

Info

Publication number: RU2820032C1
Application number: RU2024105066A
Authority: RU
Inventors: Алексей Александрович Толмачев; Дмитрий Сергеевич Викторов; Алексей Андреевич Хапёрский; Андрей Михайлович Дергунов
Filing date: 2024-02-27
Publication date: 2024-05-28

Abstract

FIELD: computer engineering.

SUBSTANCE: the result is achieved by executing a subset of commands on the monofunctional units of the execution devices and obtaining results that serve as operand values for subsequent commands; transferring, by the execution devices, the command execution results via the program fragment number fetch unit to the operand switches for duplication of operand values, writing them to data memory cells, and delivering them via the command fetch units and command switches to the subsequent commands that use these results as the value of their first or second operand, each result containing a field with the obtained result value, fields with the numbers of two commands that use this value as an operand, and two indicator fields that determine whether the value is used as the first and/or the second operand; executing a subset of commands on the monofunctional units of the execution devices, obtaining the result of the parallel program executed with time parameterization, and transferring the result via the program fragment number fetch unit for writing to the data memory.

EFFECT: reduced time of data processing.

1 cl, 1 dwg

Description

The invention relates to digital data processing by electronic devices, specifically to methods of distributing data across the monofunctional units of processors intended for computing systems with data flow control (DFC), and can be used to reduce the digital data processing time of DFC computing systems.

A method of automatically parallelizing programs is known, in which a control flow graph, a dominator tree, a loop tree, and a data flow graph are first obtained for the algorithmic part of the program; intermediate representations of procedures are substituted at their call sites; interprocedural data flow analysis is performed; data flow analysis, preferably by value numbering, is performed to detect equivalent operations; loop variables are analyzed for invariance and induction; array access operations are analyzed and array access indices are constructed as canonical sums of products; loops are fused; invariant conditions are hoisted; the traversal order of the loop iteration space is changed; and parallel loops are analyzed [1].

A method of building a program is also known, in which marked loops are identified in the assembly source code of the program and classified into several predefined types; the start addresses of the marked loops are aligned, where required for a loop of the given type, by adding assembly instructions; and, with the original assembly source code retained in memory, the modified assembly code is built for the target device by compiling and linking [2].

The disadvantages of these methods are that they take into account neither the specific architecture of the computing system nor the operating-time parameter of the monofunctional units of the computing system's processors.

One possible way to increase data processing efficiency is to organize the distribution of data across the monofunctional units of the computing system's processors. However, distributing program data across monofunctional processor units gives rise to time synchronization problems, which reduce the efficiency of digital data processing. This must be taken into account when choosing the start times of fragments and operators during parallel execution of the original program.

The purpose of the invention is to increase the efficiency of digital data processing (reduce data processing time) by optimizing the uniformity of the load on the monofunctional units of the computing system's processors.

This purpose is achieved by a method of distributing data across the monofunctional units of the processors of a computing system with data flow control, which consists in performing the following procedures:

1. Loading the sequential program code;

2. Dividing the sequential program code into numbered fragments; distributing the fragments among processors with regard to time synchronization of fragment execution; extracting from the fragments numbered sequences of control operations, which determine the subsequent direction of the computation, and of arithmetic operations; determining the links between operations; and transferring and recording the operations and links;

3. Loading the available operand values according to the assignment of each fragment to a processor;

4. Transmitting signals to duplicate the values of the corresponding operands for use in commands at different stages of data processing;

5. Duplicating the operand values, writing them into the first or second operand fields of the corresponding command memory cells, and transferring the operand values from the command memory to the command fetch units;

6. Determining the subset of commands ready for execution, i.e. those that have the values of all their operands, and passing them on for the formation of ready-to-execute commands;

7. Forming the subset of ready commands, each containing an operation field, a first operand value field, a second operand value field, a memory address for temporary storage of the operation result, and flag fields indicating the presence of the corresponding operand values in the command and the readiness of the command for execution; and distributing them to the monofunctional units of the execution devices for parallel execution with regard to the time parameter, i.e. the start time and duration of each operation: the longer an operation takes, the earlier it must start;

8. Executing the subset of commands and obtaining results that serve as operand values for subsequent commands;

9. Transferring the command execution results for duplication of operand values, for writing to data memory cells, and for delivery to the subsequent commands that use these results as the value of their first or second operand; each result contains a field with the obtained result value, fields with the numbers of two commands that use this value as an operand, and two indicator fields that determine whether the value is used as the first and/or the second operand;

10. Executing the subset of commands, obtaining the result of the parallel program executed with time parameterization, and transferring the result for recording.
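The operand delivery and readiness check of steps 5, 6 and 9 can be sketched in Python. This is only an illustration under stated assumptions: the class names, field names, and the indicator values "first", "second" and "both" are hypothetical and are not taken from the patent, which describes hardware units rather than software.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CommandCell:
    """One command memory cell (field names are assumptions)."""
    op: str
    result_addr: int
    operand1: Optional[float] = None
    operand2: Optional[float] = None

    @property
    def ready(self) -> bool:
        # Step 6: a command is ready once both operand values are present.
        return self.operand1 is not None and self.operand2 is not None


@dataclass
class ResultToken:
    """Step 9 result: a value, two destination command numbers, two indicators."""
    value: float
    dest: Tuple[Optional[int], Optional[int]]
    use: Tuple[str, str]  # each is "first", "second" or "both" (assumed encoding)


def deliver(token: ResultToken, memory: dict) -> None:
    """Step 5: copy the value into the operand fields of the destination cells."""
    for number, use in zip(token.dest, token.use):
        if number is None:
            continue  # unused destination slot
        cell = memory[number]
        if use in ("first", "both"):
            cell.operand1 = token.value
        if use in ("second", "both"):
            cell.operand2 = token.value


def ready_commands(memory: dict) -> list:
    """Step 6: numbers of the commands that have all operand values."""
    return sorted(n for n, cell in memory.items() if cell.ready)


memory = {1: CommandCell("add", 100), 2: CommandCell("mul", 101)}
deliver(ResultToken(3.0, (1, 2), ("first", "both")), memory)
after_first = ready_commands(memory)   # only command 2 has both operands
deliver(ResultToken(4.0, (1, None), ("second", "second")), memory)
after_second = ready_commands(memory)  # command 1 is now ready as well
```

Here one result value feeds two commands at once, which is the duplication that the operand switches perform in the described system.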

Thus, to increase the efficiency of digital data processing (reduce data processing time), the time threads of the program should be developed with regard to the requirement of optimizing the uniformity of the load on the monofunctional processor units during parallel solution of the problem, thereby ensuring the necessary time synchronization of the monofunctional units of the computing system's processors.

The new features that constitute essential differences are:

1. Taking into account the architecture of a computing system with data flow control.

2. Taking into account the start time of the parallel algorithm's operators in the monofunctional units of the processors of a computing system with data flow control.

These features constitute essential differences, since they have not been found in known methods.

Applying the new features in combination with the known ones will increase the efficiency of digital data processing by optimizing the uniformity of the load on the monofunctional units of the computing system's processors during parallel solution of the problem.

The method of distributing data across the monofunctional units of the processors of a computing system with data flow control is implemented as follows.

Fig. 1 shows a diagram of the main components of a computing system with data flow control that implements the proposed method. The system consists of command link memory 8, program fragment number fetch unit 12, synchronizer 13, data memory 14, and two processors, which include operand switches 1, 2, command memories 3, 4, command fetch units 5, 6, command switches 7, 9, and execution devices 10, 11.

Operand switches 1, 2 distribute the result values of executed operations and write them into the first and second operand fields of the corresponding command memory cells to form ready-to-execute commands. Command memories 3, 4 store the program's command cells while the problem is being solved. Command fetch units 5, 6 determine the subset of program commands that are ready for execution. Command switches 7, 9 distribute the ready commands to the corresponding monofunctional units of the execution devices for parallel execution. Execution devices 10, 11 each comprise a set of monofunctional units of various types that make it possible to solve the problem. Monofunctional units 101, 111 are operational units, each of which performs only one type of operation (division, multiplication, subtraction, addition, addition modulo 2, etc.). Command link memory 8 stores an array of counts for the input commands and linked commands of the program under consideration. Program fragment number fetch unit 12 divides the program into fragments, distributes the fragments of the program being executed among the processors, and extracts numbered sequences of control and arithmetic operations from the fragments. Synchronizer 13 synchronizes the program fragment number fetch unit and the processors while the results of program execution are computed and combined. Data memory 14 stores all the data used in solving problems.
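The role of the command switches and monofunctional units can be illustrated with a short Python sketch. It is a software analogy under assumptions, not the patented hardware: the operation names, the tuple layout of a ready command, and the function name `command_switch` are all hypothetical, and the loop runs sequentially whereas the described units operate in parallel.

```python
import operator

# One callable per monofunctional unit: each performs exactly one operation type.
# The operation names are illustrative assumptions.
MONOFUNCTIONAL_UNITS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
    "xor": operator.xor,  # addition modulo 2, applied bitwise to integers
}


def command_switch(ready_commands):
    """Route each ready command to the unit that implements its operation.

    ready_commands: iterable of (number, op, first_operand, second_operand).
    Returns a dict mapping command number to the computed result.
    """
    results = {}
    for number, op, first, second in ready_commands:
        unit = MONOFUNCTIONAL_UNITS[op]  # select the unit by operation type
        results[number] = unit(first, second)
    return results


results = command_switch([(1, "add", 2, 3), (2, "mul", 4, 5), (3, "xor", 6, 3)])
```

Keeping one operation per unit means the switch needs only the operation field of a ready command to choose its destination.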

Consider the step-by-step execution of the proposed method in the system described above (Fig. 1). The sequential program code is loaded from data memory 14 into program fragment number fetch unit 12 (step 1). Unit 12 divides the sequential program code into numbered fragments; distributes the fragments among the processors with regard to time synchronization of fragment execution by means of the synchronizer; extracts from the fragments numbered sequences of control operations, which determine the subsequent direction of the computation, and of arithmetic operations; determines the links between operations and passes them to command fetch units 5, 6; and records the operations and links in command link memory 8 (step 2). Unit 12 sends signals to data memory 14 so that the available operand values are loaded into the corresponding operand switches 1, 2 according to the assignment of each fragment to a processor (step 3). Unit 12 sends signals to operand switches 1, 2 to duplicate the values of the corresponding operands for use in commands at different stages of data processing (step 4). Operand switches 1, 2 duplicate the operand values and write them into the first or second operand fields of the corresponding cells of command memories 3, 4, and the operand values are transferred from command memories 3, 4 to command fetch units 5, 6 (step 5). Command fetch units 5, 6 determine the subset of commands ready for execution, i.e. those that have the values of all their operands, and pass them to command switches 7, 9 for the formation of ready-to-execute commands (step 6).

Command switches 7, 9 form the subset of ready commands, each containing an operation field, a first operand value field, a second operand value field, a memory address for temporary storage of the operation result, and flag fields indicating the presence of the corresponding operand values in the command and the readiness of the command for execution, and distribute them to monofunctional units 101, 111 of execution devices 10, 11 for parallel execution with regard to the time parameter, i.e. the start time and duration of each operation: the longer an operation takes, the earlier it must start (step 7). The monofunctional units of the execution devices execute the subset of commands and obtain results that serve as operand values for subsequent commands (step 8). The execution devices transfer the command execution results via the program fragment number fetch unit to the operand switches for duplication of operand values, write them to data memory cells, and deliver them via the command fetch units and command switches to the subsequent commands that use these results as the value of their first or second operand; each result contains a field with the obtained result value, fields with the numbers of two commands that use this value as an operand, and two indicator fields that determine whether the value is used as the first and/or the second operand (step 9). The monofunctional units of the execution devices execute the subset of commands and obtain the result of the parallel program executed with time parameterization, and the result is transferred via the program fragment number fetch unit for writing to the data memory (step 10).
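The timing rule of step 7 (the longer an operation takes, the earlier it must start) admits more than one reading. The Python sketch below shows one possible interpretation: ready commands are dispatched in order of decreasing duration, and each monofunctional unit handles one command at a time. The durations in clock cycles, the single unit per operation type, and all names are assumptions made for illustration and do not come from the patent.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    number: int  # command number
    op: str      # operation type, which also identifies the unit
    start: int   # assumed start time in clock cycles
    finish: int  # start + duration


# Hypothetical operation durations in clock cycles.
DURATION = {"add": 1, "sub": 1, "mul": 3, "div": 8}


def schedule_ready(commands):
    """Assign start times to ready commands, longest operations first.

    commands: list of (number, op). One unit per operation type is assumed,
    so commands of the same type queue behind one another.
    """
    unit_free = {}  # time at which each unit becomes free
    plan = []
    # sorted() is stable, so equal durations keep their original order.
    for number, op in sorted(commands, key=lambda c: -DURATION[c[1]]):
        start = unit_free.get(op, 0)
        finish = start + DURATION[op]
        unit_free[op] = finish
        plan.append(Slot(number, op, start, finish))
    return plan


plan = schedule_ready([(1, "add"), (2, "div"), (3, "add"), (4, "mul"), (5, "mul")])
makespan = max(slot.finish for slot in plan)
```

In this example the division is dispatched first and starts at time 0, while the short additions are placed last.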

Thus, the proposed method makes it possible to reduce data processing time on the monofunctional units of the computing system by up to 19% when data are distributed across the monofunctional units of the processors of a computing system with data flow control, that is, to increase the efficiency of digital data processing by optimizing the uniformity of the load on the monofunctional units of the computing system during parallel solution of the problem.

Information sources

1. Drozdov A.Yu., Novikov S.V. Method of automatically parallelizing programs. Patent for invention No. 2411569, Bulletin No. 4, 2011 (analogue).

2. Yakovlev S.V., Safonov I.V., Bykova T.V. Method of building a program. Patent for invention No. 2406112, Bulletin No. 34, 2010 (prototype).

Claims

A method of distributing data across the monofunctional units of the processors of a computing system with data flow control, which consists in the devices of the computing system performing the following operations:

loading the sequential program code from the data memory into the program fragment number fetch unit;

dividing, by the program fragment number fetch unit, the sequential program code into numbered fragments; distributing the fragments among processors with regard to time synchronization of fragment execution by means of a synchronizer; extracting from the fragments numbered sequences of control operations, which determine the subsequent direction of the computation, and of arithmetic operations; determining the links between operations and passing them to the command fetch units; and recording the operations and links in the command link memory;

transmitting, by the program fragment number fetch unit, signals to the data memory for loading the available operand values into the corresponding operand switches according to the assignment of each fragment to a processor;

transmitting, by the program fragment number fetch unit, signals to the operand switches to duplicate the values of the corresponding operands for use in commands at different stages of data processing;

duplicating, by the operand switches, the operand values, writing them into the first or second operand fields of the corresponding command memory cells, and transferring the operand values from the command memory to the command fetch units;

determining, by the command fetch units, the subset of commands ready for execution, i.e. those that have the values of all their operands, and passing them to the command switches for the formation of ready-to-execute commands;

forming, by the command switches, the subset of ready commands, each containing an operation field, a first operand value field, a second operand value field, a memory address for temporary storage of the operation result, and flag fields indicating the presence of the corresponding operand values in the command and the readiness of the command for execution, and distributing them to the monofunctional units of the execution devices for parallel execution with regard to the time parameter, i.e. the start time and duration of each operation: the longer an operation takes, the earlier it must start;

executing, by the monofunctional units of the execution devices, the subset of commands and obtaining results that serve as operand values for subsequent commands;

transferring, by the execution devices, the command execution results via the program fragment number fetch unit to the operand switches for duplication of operand values, writing them to data memory cells, and delivering them via the command fetch units and command switches to the subsequent commands that use these results as the value of their first or second operand, each result containing a field with the obtained result value, fields with the numbers of two commands that use this value as an operand, and two indicator fields that determine whether the value is used as the first and/or the second operand;

executing, by the monofunctional units of the execution devices, the subset of commands, obtaining the result of the parallel program executed with time parameterization, and transferring the result via the program fragment number fetch unit for writing to the data memory.