RU2815599C1 - Method of executing programs by processor using table of addresses of speculation on data

Info

Publication number: RU2815599C1
Application number: RU2023101191A
Authority: RU
Inventors: Ольга Александровна Четверина
Original assignee: Акционерное общество "МЦСТ"
Filing date: 2023-01-20
Publication date: 2024-03-19

Abstract

FIELD: physics.

SUBSTANCE: the invention relates to computer engineering. The method of executing programs by a processor using a data speculation address table involves a computing system comprising a processor with a VLIW architecture that implements a DAM or ALAT mechanism supporting speculative reads (which enter an address into the data speculation address table), check reads and ordinary reads, and a compiler that translates program code for the processor. When executing the program code, the processor performs, for several read operations at a common address, only one common speculative read operation and one common check read operation, and the data from the remaining reads is saved for the compensating code without using speculative operations.

EFFECT: faster execution of the final machine code on architectures with a dynamic mechanism for breaking dependencies.

1 cl, 3 dwg

Description

The invention relates to the field of computer engineering and is used to optimize program code.

One way to speed up code by exploiting the capabilities of the architecture of a computing system is to schedule several operations for simultaneous execution in one clock cycle, that is, to increase the parallelism of the code.

An optimizing compiler speeds up code largely by scheduling operations for parallel execution on the functional units of the computing system. However, a lack of information about the independence of reads and writes significantly limits the ability to change the order of execution. In that case the compiler cannot swap a read with a write, remove repeated reads and writes at the same address, or simultaneously execute dependent operations that stand before the write and after the read. With dynamic scheduling, the difference between read and write addresses can be detected while the code is already executing, which allows operations to be scheduled dynamically with reasonable efficiency. With static scheduling, the main available ways of establishing address independence are the various methods of pointer analysis and the use of additional rules for working with pointers (restrict, strict aliasing), where the user declares that the source code complies with these rules by means of special options or pragmas. Although pointer analysis is an extensive and well-developed set of methods, its capabilities are very limited when pointers are passed between procedures, when global variables are present, when other constructs typical of C and C++ programs are used, and when the code uses irregularly computed addresses into objects of one type.
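The reordering restriction described above can be illustrated with a minimal Python sketch. The list-based memory model and the function names `original_order` and `hoisted_read` are assumptions made for illustration only; they do not come from the patent.

```python
def original_order(mem, write_addr, read_addr):
    """Write first, then read: the order given in the source program."""
    mem = list(mem)              # work on a copy
    mem[write_addr] = 99         # write
    return mem[read_addr]        # read that follows the write


def hoisted_read(mem, write_addr, read_addr):
    """The read is moved above the write, as a scheduler would like to do."""
    mem = list(mem)              # work on a copy
    value = mem[read_addr]       # read executed early
    mem[write_addr] = 99         # write
    return value


memory = [10, 20, 30]

# Different addresses: both orders return the same value, so reordering is safe.
same_when_independent = original_order(memory, 0, 1) == hoisted_read(memory, 0, 1)

# Same address: the hoisted read returns the stale value, so reordering is unsafe.
same_when_aliased = original_order(memory, 1, 1) == hoisted_read(memory, 1, 1)
```

Without proof that the two addresses differ, a static scheduler has to assume the second case and keep the original order.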

To provide higher operation-level parallelism in processors with static scheduling, a hardware-level mechanism for dynamically breaking dependencies has been developed. It allows some operations to be executed speculatively, that is, in advance, and, if it later turns out that a write was performed to the address being read, allows those operations to be re-executed with the correct value. Such a mechanism is implemented in Elbrus processors with a VLIW architecture (DAM, or Disambiguation of Accessing Memory) and, with minor differences, in the Itanium architecture (Advanced Load Address Table, or ALAT).

A method of using DAM and ALAT for speculative code execution by dynamically breaking dependencies is known from the prior art (the prototype). It consists of the following: in place of the original read, a speculative read operation (speculative load) is built before the potentially conflicting write, and a check read operation (check load) is built at the position of the original read. The address of the original read is entered into a special table at the hardware level, and the consumers of the original read receive the data read by the speculative read as input. If a conflict later occurs and the data at the address of the speculative read is changed, this is detected by the check read operation, which checks the entry in the table. As a result, control transfers to code that re-executes part of the operations (recovery code, or compensating code) using the data read by the check read operation, and then returns to the check point. This method is described in "Guide to Effective Programming on the Elbrus Platform", Neiman-zade M. I., Korolev S. D., © 2020, MCST JSC, pp. 90-91.
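The prototype scheme can be sketched as a toy Python simulation. This is an illustrative model under simplifying assumptions, not a description of the actual DAM or ALAT hardware: the table is modeled as a set of addresses, a write to a tracked address simply removes its entry, and the class `SpeculationTable` and function `run` are hypothetical names.

```python
class SpeculationTable:
    """Toy model of a DAM/ALAT-style table: a set of tracked addresses."""

    def __init__(self):
        self.entries = set()

    def speculative_load(self, mem, addr):
        self.entries.add(addr)        # record the address in the table
        return mem[addr]

    def store(self, mem, addr, value):
        mem[addr] = value
        self.entries.discard(addr)    # a conflicting write invalidates the entry

    def check_load(self, mem, addr):
        """Return (conflict, value); the value is read from memory again."""
        conflict = addr not in self.entries
        self.entries.discard(addr)    # the entry is no longer needed
        return conflict, mem[addr]


def run(mem, write_addr, read_addr):
    table = SpeculationTable()
    value = table.speculative_load(mem, read_addr)  # read hoisted above the write
    result = value + 1                              # consumer executed in advance
    table.store(mem, write_addr, 99)                # potentially conflicting write
    conflict, checked = table.check_load(mem, read_addr)
    if conflict:
        result = checked + 1                        # compensating code: redo the consumer
    return result, conflict
```

For example, `run([10, 20, 30], 0, 1)` takes the fast path and keeps the speculative result, while `run([10, 20, 30], 1, 1)` detects the conflict and recomputes the consumer from the checked value.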

The disadvantage of this method is that each read of the source code is replaced by two read operations, a speculative read and a check read, which slows down execution even when there is no address conflict. In addition, the number of available speculative reads is limited, since they have to be tracked in a table of limited size (32 entries for both architectures described above). If the table is too small, unnecessary data dependencies may remain, which also reduces the speed of the code.

The purpose of the method is to increase the speed of code execution by reducing the number of operations created when the speculative read mechanism is used, and by making the speculative execution mechanism applicable to a larger number of reads in the source code.

This goal is achieved in that, when executing program code, the processor performs only one common speculative read operation and one common check read operation for several read operations at a common address, and saves the data from the remaining reads for the compensating code without using speculative operations.

Brief description of the drawings and diagrams illustrating the proposed method and its application in the process of scheduling machine code:

Fig. 1 - Diagram of the code compilation process with optimization.

Fig. 2 - Diagram of speculative code execution, indicating the point of difference from the prototype.

Fig. 3 - Comparative diagram of the operation of the prototype and of the proposed method.

After initial processing of the source code, the compiler builds an intermediate representation and also obtains information about the available computing units, on the basis of which it can decide whether to use the speculative scheduling mechanism (Fig. 1). When scheduling the operations of the hypernodes formed during compilation, if there are potential conflicts between write and read operations, the compiler must check whether there are several read operations at the same address (Fig. 2). If such reads are found, the method of executing programs by a processor using a data speculation address table is applied to them. The traditional method (the prototype) is used for the remaining reads.
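The check for several reads at the same address can be sketched as a simple grouping step. This is a minimal illustration that assumes two reads share an address exactly when their address expressions are textually identical; a real compiler would have to prove address equality on its intermediate representation. The function `partition_reads` and the read labels are hypothetical.

```python
from collections import defaultdict


def partition_reads(reads):
    """Split reads into groups sharing an address and reads with a unique address.

    reads: list of (read_name, address_expression) pairs in program order.
    """
    by_addr = defaultdict(list)
    for name, addr in reads:
        by_addr[addr].append(name)
    # Groups of two or more reads are candidates for the proposed method.
    groups = {addr: names for addr, names in by_addr.items() if len(names) > 1}
    # Reads with a unique address fall back to the prototype scheme.
    singles = [names[0] for names in by_addr.values() if len(names) == 1]
    return groups, singles


groups, singles = partition_reads(
    [("r1", "p"), ("r2", "q"), ("r3", "p"), ("r4", "p")]
)
```

Here the three reads through `p` form one group, and the single read through `q` is handled as in the prototype.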

The method of executing programs by a processor using a data speculation address table is carried out as follows:

1) One speculative read operation at the common address is built before the write operations that potentially conflict with the reads.

2) In the main (non-compensating) code, the result of this speculative read is used by all corresponding consumers instead of the results of the original reads.

3) A check read operation is built at the point of the last original read at the common address. Ordinary (non-speculative) reads are built at all other points of reads at the common address, and their results are used in the common compensating code (Fig. 3).

4) Compensating code is built for all speculatively executed operations. In it, the consumer (use) that used the result of the last read takes the result of the check read. The other consumers take the results of the corresponding ordinary reads.
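The four steps above can be sketched as a small planning function. This is a hedged illustration, not the compiler's actual implementation: the function `transform_group`, the dictionary layout and the operation names `speculative_load`, `check_load` and `load` are assumptions chosen to mirror the terms used in the description.

```python
def transform_group(read_points):
    """Plan the operations for original reads at one common address.

    read_points: hypothetical labels of the original reads, in program order.
    """
    if len(read_points) < 2:
        raise ValueError("expected at least two reads at the common address")
    plan = {
        # Step 1: a single speculative read placed above the conflicting writes.
        "hoisted": [("speculative_load", "spec")],
        # Step 2: every consumer in the main code takes the speculative result.
        "main_source": {point: "spec" for point in read_points},
        "in_place": [],
        "recovery_source": {},
    }
    last = len(read_points) - 1
    for i, point in enumerate(read_points):
        # Step 3: check read at the last original read, ordinary reads elsewhere.
        kind = "check_load" if i == last else "load"
        plan["in_place"].append((kind, point))
        # Step 4: the compensating code uses the value read at this point.
        plan["recovery_source"][point] = kind
    return plan


plan = transform_group(["r1", "r2", "r3"])
total_reads = len(plan["hoisted"]) + len(plan["in_place"])
```

For three original reads the plan contains four read operations in total: one speculative read, two ordinary reads and one check read.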

When the described method is applied to n original reads at a common address, 1 speculative read operation (speculative load), 1 check read operation (check load) and (n-1) ordinary read operations (load) are built, that is, n+1 read operations in total. Only one entry is created in the table (DAM or ALAT) for all n original reads.

When the prototype is used, n speculative read operations and n check reads are built for n original reads at a common address, that is, 2*n read operations. A separate entry is created in the table (DAM or ALAT) for each original read, that is, n entries are created for n reads.

In total, for each group of n reads at a common address, the described method reduces the total number of required read operations from 2*n to (n+1) compared with the prototype, and reduces table (DAM or ALAT) usage from n entries to one. Reducing the total number of read operations speeds up code execution because fewer operations have to be executed. Reducing table usage makes it possible to schedule more reads of the source code for speculative execution than with the prototype, that is, to speed up the code by increasing its degree of speculation.
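The counts above can be checked with a short calculation. The helper names `prototype_cost`, `proposed_cost` and `reads_covered` are hypothetical; the table size of 32 entries is the figure given in the description, and the coverage estimate assumes for simplicity that every group contains exactly n reads.

```python
TABLE_SIZE = 32  # table size stated in the description for both architectures


def prototype_cost(n):
    """Prototype: n speculative reads plus n check reads, one table entry per read."""
    return {"reads": 2 * n, "table_entries": n}


def proposed_cost(n):
    """Proposed method: 1 speculative + 1 check + (n - 1) ordinary reads, one entry."""
    return {"reads": n + 1, "table_entries": 1}


def reads_covered(n, table_size=TABLE_SIZE):
    """Original reads that fit in the table if every group has n reads."""
    return {
        "prototype": table_size,     # one entry per original read
        "proposed": table_size * n,  # one entry per group of n reads
    }
```

For n = 4 the prototype needs 8 read operations and 4 table entries, while the proposed method needs 5 read operations and 1 entry. For n = 1 the two schemes coincide, so the gain appears only for groups of two or more reads.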

The developed method minimizes the use of speculative read operations and check read operations, which increases the execution speed of the final machine code on architectures with a dynamic dependency-breaking mechanism such as DAM or ALAT.

Claims

A method of executing programs by a processor using a data speculation address table, involving a computing system comprising a processor with a VLIW architecture that implements a DAM or ALAT mechanism supporting speculative reads with entry into the data speculation address table, check reads and ordinary reads, and a compiler that translates program code for the processor, characterized in that, when executing the program code, the processor performs, for several read operations at a common address, only one common speculative read operation and one common check read operation, and saves the data from the remaining reads for the compensating code without using speculative operations.