RU2820021C1

RU2820021C1 - Method for pipeline processing of instructions for computer with vliw processor and optimizing compiler and computer for implementing method

Info

Publication number: RU2820021C1
Application number: RU2024103298A
Authority: RU
Inventors: Фёдор Анатольевич Груздов; Мурад Искендер-оглы Нейман-заде; Виктор Евгеньевич Шампаров
Original assignee: Акционерное общество "МЦСТ"
Filing date: 2024-02-09
Publication date: 2024-05-28

Abstract

FIELD: computer engineering.

SUBSTANCE: a preparation pipeline performs preparatory processing of a sequence of wide commands, each of which includes S personalized commands, and an i-th execution pipeline is able to execute a sequence of i-th personalized commands, where i takes values from 1 to S, wherein the sequence of wide commands includes those S initial branches that have the highest inherent probabilities and those S subsequent branches that have the highest joint probabilities with any of the S initial branches included in the wide command.

EFFECT: shorter time spent by a computer equipped with a VLIW processor to execute a branched program, which increases speed and efficiency of the computer.

4 cl, 4 dwg, 2 tbl

Description

Technical field

[1] The invention relates to computer engineering, in particular to computers equipped with microprocessors that execute several commands in parallel, and more specifically to computers with microprocessors built on the VLIW (Very Long Instruction Word) architecture (hereinafter, VLIW processors).

Background of the invention

[2] This description discloses a computer configuration that creates the best conditions for the operation of a VLIW processor and allows its advantages to be realized to the fullest extent. Accordingly, the background of the invention, illustrated using a VLIW processor, applies equally to a computer equipped with a VLIW processor.

[3] A VLIW processor implements the well-known pipelined processing of commands (hereinafter also the pipeline process), in which each command passes sequentially through several processing stages, such as fetch, decode, execute, memory access and write-back of the result. In what follows, the set of hardware that forms part of the processor and is directly involved in carrying out the pipeline process is referred to as the “pipeline”.

[4] A classic pipeline processes several commands simultaneously: at any given moment each command is at one of the processing stages, and each stage can process only one command at a time. A VLIW processor, however, is equipped with several functional blocks that allow it, starting from the execute stage, to process several commands simultaneously at each stage. In other words, the pipeline of a typical VLIW processor includes one preparation pipeline, which performs the fetch and decode stages, and several execution pipelines, which perform the execute, memory access and write-back stages.

[5] However, a single preparation pipeline cannot deliver several commands to the execute stage at the same time, since it can process only one command at each of its stages. In a computer equipped with a VLIW processor, this problem is solved by an operation performed when the program is compiled, namely by combining several commands (hereinafter, personalized commands) into one command (hereinafter, a wide command). Accordingly, at the fetch and decode stages a wide command is processed as a single command, and at the execute stage it is split into several personalized commands, each of which is sent for execution to the functional block assigned to it. One of the English names of this wide command is, in essence, what the acronym VLIW spells out. The ability to use a single preparation pipeline to process several personalized commands simultaneously gives the VLIW processor its advantage over superscalar processors: fewer processor components, lower power consumption and less heat dissipation. The described configuration of a VLIW processor is known to a person skilled in the art, for example, from patent publication US2012151192A1 of 14 June 2012.
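The packing and splitting described in paragraph [5] can be sketched as follows. This is an illustration only, not the compiler or hardware of the cited publication: the function names `pack_wide_commands` and `dispatch`, the use of strings as commands, and the value S = 2 are all assumptions made for the example.

```python
from typing import Dict, List, Optional

S = 2  # assumed number of execution pipelines (hypothetical value)


def pack_wide_commands(streams: List[List[str]],
                       width: int = S) -> List[List[Optional[str]]]:
    """Bundle up to `width` command streams into wide commands.

    Slot i of every wide command holds the next command of stream i;
    a stream that has run out leaves its slot empty (None).
    """
    if len(streams) > width:
        raise ValueError("more streams than execution pipelines")
    padded = streams + [[] for _ in range(width - len(streams))]
    length = max((len(s) for s in padded), default=0)
    return [[s[k] if k < len(s) else None for s in padded]
            for k in range(length)]


def dispatch(wide: List[Optional[str]]) -> Dict[int, str]:
    """Split one wide command at the execute stage.

    Slot i goes to execution pipeline i (numbered from 1); empty slots are skipped.
    """
    return {i + 1: cmd for i, cmd in enumerate(wide) if cmd is not None}


wide_commands = pack_wide_commands([["add", "mul"], ["load"]])
```

Here `wide_commands` holds two wide commands; the fetch and decode stages would handle each as a single unit, and `dispatch` models the point at which the slots are routed to their functional blocks.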

[6] It should be noted that the strengths of the VLIW processor are most evident when a program branches, that is, when one target branch must be selected from several alternative branches of personalized commands. In this case, by including personalized commands from each alternative branch in a single wide command and processing a sequence of such wide commands, the VLIW processor processes all alternative branches before the target branch is selected from among them. Thanks to this property, when a control transfer command is executed, the VLIW processor as a rule already holds the target branch of personalized commands, fetched and decoded, regardless of which alternative branch is selected as the target branch. Having the target branch ready for transfer to the execution pipelines significantly increases the speed of the VLIW processor on a branching section of the program.

[7] A program may, however, contain several successive branchings, in which each control transfer command (branch instruction) may produce a set of alternative branches larger than the number of execution pipelines. In this situation the wide commands can no longer hold all the alternative branches of personalized commands, because the number of personalized commands in a wide command is limited by the number of execution pipelines. When control is transferred to a target branch whose personalized commands were not included in the wide commands, the preparation pipeline must be reloaded, and the VLIW processor slows down as a result. The extra time that reloading the preparation pipeline adds to the execution of a program with such extensive branching causes excess power consumption and increased heat dissipation, which calls for additional cooling measures.

[8] In view of this, the choice of the technique used to determine the probability of control transfer to each alternative branch becomes especially important for a VLIW processor. A methodologically sound estimate of this probability is needed so that only the most probable alternative branches are included in the sequence of wide commands and, when the control transfer command is executed, one of them turns out to be the target branch. In known computers equipped with VLIW processors, the generally accepted approach is to estimate the probability of control transfer to a particular alternative branch from the statistics of the results of the control transfer command that produces this branch, namely as the ratio of the number of control transfers made to this branch to the total number of control transfers made. Nevertheless, the number of misses when loading alternative branches into the preparation pipeline remains fairly large, which noticeably degrades the performance of the VLIW processor.
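The conventional estimate described in paragraph [8] is a simple frequency ratio. A minimal sketch, assuming the recorded results of one control transfer command are available as a list of branch labels (the function name and the labels are hypothetical):

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def branch_probabilities(outcomes: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Estimate the probability of each alternative branch as
    (transfers made to that branch) / (all transfers made)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {branch: n / total for branch, n in counts.items()}


# Hypothetical log of four executions of one control transfer command.
probabilities = branch_probabilities(["A", "A", "B", "C"])
```

Each estimate looks at one control transfer command in isolation; the outcome of an earlier control transfer command plays no part in it.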

[9] Note that modern superscalar processors are able to retain the commands of a considerable number of alternative branches after they have been processed by the preparation pipelines. Accordingly, if the target branch was not included in the sequence of wide commands supplied to the VLIW processor, the time the VLIW processor needs to pass the target branch to the execution pipelines exceeds that of such superscalar processors by at least the time the preparation pipeline takes to process the target branch. Given this high cost of an error, characteristic of a VLIW processor, when the target branch is not included in the sequence of wide commands, it is highly desirable to improve the technique for determining the probability of control transfer to alternative branches so that the most probable alternative branches included in the sequence of wide commands contain the target branch in the overwhelming majority of cases.

[10] The technical problem addressed by the invention is to find a method that increases the performance of a computer equipped with a VLIW processor when executing branched programs, thereby reducing power consumption and heat dissipation, and to find a computer configuration for implementing such a method.

Summary of the invention

[11] Two aspects of the invention are proposed to solve this technical problem. The first aspect is a method for pipeline processing of commands, intended for a computer whose processor contains a preparation pipeline and S execution pipelines. The preparation pipeline is capable of performing preparatory processing of a sequence of wide commands, each of which includes S personalized commands, and the i-th execution pipeline is capable of executing a sequence of i-th personalized commands, where i takes values from 1 to S. According to the proposed method, for a branching command stream containing a first control transfer command, whose result is the selection of one initial branch from Q initial branches, a command merging all initial branches, and a second control transfer command, whose result is the selection of one subsequent branch from R subsequent branches, where Q>S and R>S, statistics of the results of the first control transfer command are collected and, for each result of the first control transfer command, statistics of the results of the second control transfer command are collected. From the collected statistics, an inherent probability is determined for each initial branch and a joint probability for each pair of one initial branch and one subsequent branch. The sequence of wide commands then includes those S initial branches that have the highest inherent probabilities and those S subsequent branches that have the highest joint probabilities with any of the S initial branches included in the wide command.
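The selection rule of paragraph [11] can be sketched as follows. The text does not spell out how “highest joint probabilities with any of the S initial branches” is to be ranked; this sketch assumes that each subsequent branch is scored by its largest joint probability with any selected initial branch. The function name, the input format (a list of observed pairs) and the tie-breaking by label are also assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def select_branches(history: List[Tuple[str, str]],
                    S: int) -> Tuple[List[str], List[str]]:
    """Choose S initial and S subsequent branches from observed
    (initial, subsequent) results of the two control transfer commands."""
    total = len(history)
    if total == 0:
        return [], []
    # Inherent probability of each initial branch.
    inherent = {q: n / total
                for q, n in Counter(q for q, _ in history).items()}
    # Joint probability of each (initial, subsequent) pair.
    joint = {pair: n / total for pair, n in Counter(history).items()}

    initial = sorted(inherent, key=lambda q: (-inherent[q], q))[:S]

    # Assumed reading: score a subsequent branch by its best joint
    # probability with any of the selected initial branches.
    score: Dict[str, float] = {}
    for (q, r), p in joint.items():
        if q in initial:
            score[r] = max(score.get(r, 0.0), p)
    subsequent = sorted(score, key=lambda r: (-score[r], r))[:S]
    return initial, subsequent


# Hypothetical profile with Q = 3, R = 3 and S = 2.
history = ([("A", "X")] * 5 + [("A", "Y")] + [("B", "Z")] * 3 + [("C", "Y")])
chosen_initial, chosen_subsequent = select_branches(history, S=2)
```

With this profile the sketch keeps initial branches A and B and subsequent branches X and Z; branch Y is left out because its joint probability with A or B is low.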

[12] The second aspect of the invention is a computer comprising a processor capable of processing a branching command stream, a control transfer recorder, and a machine-readable medium that can be read by the processor and on which a compiler is recorded. The processor contains a preparation pipeline and S execution pipelines. The preparation pipeline is capable of performing preparatory processing of a sequence of wide commands, each of which includes S personalized commands. In turn, the i-th of the S execution pipelines is capable of executing the sequence of i-th personalized commands from among the S sequences of personalized commands, where i takes values from 1 to S. The branching command stream contains a first control transfer command, whose result is the selection of one initial branch from Q initial branches, a command merging all initial branches, and a second control transfer command, whose result is the selection of one subsequent branch from R subsequent branches, where Q>S and R>S. The control transfer recorder is capable of collecting statistics of the results of the first control transfer command and, for each result of the first control transfer command, statistics of the results of the second control transfer command. From the collected statistics, the compiler is able to determine an inherent probability for each initial branch and a joint probability for each pair of one initial branch and one subsequent branch. The compiler includes in the sequence of wide commands those S initial branches that have the highest inherent probabilities and those S subsequent branches that have the highest joint probabilities with any of the S initial branches included in the wide command.
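The statistics kept by the control transfer recorder of paragraph [12] can be modelled in software as one counter for the first control transfer command and, per result of the first, one counter for the second. This is a sketch of the data that would be collected, not a description of the recorder itself; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class BranchRecorder:
    """Software model of the statistics a control transfer recorder collects."""

    def __init__(self) -> None:
        # Results of the first control transfer command.
        self.first = Counter()
        # Results of the second command, kept separately per first result.
        self.second_given_first = defaultdict(Counter)

    def record(self, initial: str, subsequent: str) -> None:
        self.first[initial] += 1
        self.second_given_first[initial][subsequent] += 1

    def inherent_probability(self, initial: str) -> float:
        total = sum(self.first.values())
        return self.first[initial] / total if total else 0.0

    def joint_probability(self, initial: str, subsequent: str) -> float:
        total = sum(self.first.values())
        if not total:
            return 0.0
        return self.second_given_first[initial][subsequent] / total


recorder = BranchRecorder()
for q, r in [("A", "X"), ("A", "X"), ("A", "Y"), ("B", "X")]:
    recorder.record(q, r)
```

A compiler pass could then read `inherent_probability` and `joint_probability` for every branch and pair when deciding which branches to place in the wide commands.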

[13] The technical result of the invention is a reduction in the time a computer equipped with a VLIW processor spends executing a branched program, which increases the speed and performance of the computer and reduces energy consumption and heat dissipation, i.e. it solves the technical problem posed. It should be noted that, in the context of this description, “program execution” means the execution of those personalized commands of the program that lead from the first personalized command to the last. Since a program may contain several such paths, “program execution” does not imply that all personalized commands in the program are necessarily executed.

[14] The causal link between the features of the invention and the technical result is as follows. The alternative branches, namely the subsequent branches, to be loaded into the preparation pipeline are selected on the basis of their joint probability with each of the initial branches loaded into the preparation pipeline. This raises the probability that the target branch is among the loaded alternative branches and therefore reduces the number of preparation pipeline reloads, which cause long stalls.
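The effect described in paragraph [14] can be illustrated numerically. The probability that both target branches are already loaded equals the sum of the joint probabilities over all loaded (initial, subsequent) pairs. The numbers below are invented for illustration and do not come from the patent; the function name `coverage` is likewise an assumption.

```python
from typing import Dict, Set, Tuple


def coverage(joint: Dict[Tuple[str, str], float],
             initial: Set[str], subsequent: Set[str]) -> float:
    """Probability that both the initial and the subsequent target branch
    are among the preloaded branches."""
    return sum(p for (q, r), p in joint.items()
               if q in initial and r in subsequent)


# Hypothetical joint probabilities for Q = 3 initial and R = 3 subsequent branches.
joint = {("A", "X"): 0.40, ("A", "Y"): 0.10, ("B", "Y"): 0.05,
         ("B", "Z"): 0.25, ("C", "Y"): 0.20}
initial = {"A", "B"}  # the S = 2 most probable initial branches

# X (0.40) and Y (0.35) are the most frequent subsequent branches overall.
by_frequency = coverage(joint, initial, {"X", "Y"})
# X and Z have the highest joint probabilities with A or B.
by_joint = coverage(joint, initial, {"X", "Z"})
```

In this made-up profile, choosing subsequent branches by overall frequency covers about 0.55 of executions, while choosing them by joint probability with the loaded initial branches covers about 0.65, because much of branch Y's frequency comes from initial branch C, which is not loaded.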

[15] Since the higher speed of the VLIW processor allows the proposed computer, which implements the proposed method, to execute a program in less time, the proposed computer, and in particular its VLIW processor, consumes less electricity and dissipates less heat than known computers when executing the same program.

[16] In a particular case of the first aspect of the invention, the sequence of i-th personalized commands includes the one initial branch, from the set of initial branches, that has the i-th highest inherent probability, and the one subsequent branch, from the remaining set of subsequent branches, that has the i-th highest joint probability with any initial branch from the 1st to the i-th.

[17] In a particular case of the second aspect of the invention, the compiler includes in the sequence of i-th personalized commands the one initial branch, from the set of initial branches, that has the i-th highest inherent probability, and the one subsequent branch, from the remaining set of subsequent branches, that has the i-th highest joint probability with any initial branch from the 1st to the i-th.

[18] The technical result of the particular cases of the first and second aspects is most pronounced when executing programs in which the alternative branches are similar to one another and contain equal numbers of commands. The compiler algorithm is simplified, and processor performance increases because personalized commands of the same type are included in the wide commands.
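One possible reading of the particular cases in paragraphs [16] and [17] is a greedy, slot-by-slot assignment: slot i receives the initial branch with the i-th highest inherent probability and, from the subsequent branches not yet placed, the one with the highest joint probability with any initial branch in slots 1 to i. The sketch below follows that assumed reading; the function name and the input dictionaries are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def assign_slots(inherent: Dict[str, float],
                 joint: Dict[Tuple[str, str], float],
                 S: int) -> List[Tuple[int, str, Optional[str]]]:
    """Return (slot, initial branch, subsequent branch) for slots 1..S."""
    ranked = sorted(inherent, key=lambda q: (-inherent[q], q))[:S]
    remaining = {r for _, r in joint}
    slots = []
    for i, q in enumerate(ranked, start=1):
        allowed = set(ranked[:i])  # initial branches of slots 1..i
        candidates: Dict[str, float] = {}
        for (qq, r), p in joint.items():
            if qq in allowed and r in remaining:
                candidates[r] = max(candidates.get(r, 0.0), p)
        best = (min(candidates, key=lambda r: (-candidates[r], r))
                if candidates else None)
        if best is not None:
            remaining.discard(best)
        slots.append((i, q, best))
    return slots


# Hypothetical probabilities with Q = 3, R = 3 and S = 2.
inherent = {"A": 0.6, "B": 0.3, "C": 0.1}
joint = {("A", "X"): 0.5, ("A", "Y"): 0.1, ("B", "Z"): 0.3, ("C", "Y"): 0.1}
slots = assign_slots(inherent, joint, S=2)
```

For these numbers the first sequence of personalized commands receives branches A and X, and the second receives B and Z.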

Brief description of the drawings

[19] The implementation of the invention is explained with reference to the figures:

Fig. 1 - block diagram of a computer according to the second aspect of the invention;

Fig. 2 - diagram of a program executed by the computer, illustrating how wide commands are formed according to known solutions (Comparative Example);

Fig. 3 - diagram of a program executed by the computer, illustrating how wide commands are formed according to the invention (Example 1);

Fig. 4 - diagram of a program executed by the computer, illustrating how wide commands are formed according to the invention (Example 2).

[20] It should be noted that the shapes and sizes of the individual elements shown in the figures are schematic and are chosen to illustrate as clearly as possible the relative arrangement of the elements of the proposed computer and their causal link to the technical result. In addition, to avoid overcomplicating the figures, some elements and some relationships between elements that are obvious to a person skilled in the art may be omitted. The figures are also supplemented with letter and word designations in English that are generally accepted in the art and help a person skilled in the art to read the figures more quickly.

Detailed description of the invention

[21] The implementation of the invention is illustrated by its best embodiments, which do not limit the scope of protection.

[22] Fig. 1 is a block diagram of a computer 100, proposed as the second aspect of the invention and implementing the proposed method according to the first aspect of the invention. The computer 100 comprises a VLIW processor 1, a machine-readable medium 2 that can be read by the VLIW processor 1 and on which a compiler is recorded, and a control transfer recorder 3 (BR - branch recorder).

[23] The VLIW processor 1 comprises an instruction counter 10 (PC1 - program counter), an instruction memory 20 (IM1 - instruction memory), a register file 30 (RF1 - register file), a first arithmetic-logical functional block 40 (ALC1 - arithmetic-logical channel), a second arithmetic-logical functional block 50 (ALC2), and a data memory 60 (DM - data memory). The first and second arithmetic-logical functional blocks 40 and 50 are hereinafter also referred to briefly as "the first and second functional blocks 40 and 50".

[24] These elements of the VLIW processor 1 are the main pipeline components involved in the corresponding stages of the pipeline process, which are shown on the left side of Fig. 1: instruction fetch (IF), decode (D), execute (EX), memory access (M) and write back of the result (WB).

[25] Note that the presence in the VLIW processor 1 of one each of the elements 10, 20 and 30 listed above is equivalent to the presence in the VLIW processor 1 of a single preparatory pipeline, which at any given time can process only one instruction at each of the stages IF and D, that instruction being a wide instruction. At the same time, the presence in the VLIW processor 1 of two functional blocks 40 and 50 is equivalent to the presence in the VLIW processor 1 of two execution pipelines, which at any given time can process two instructions at once at each of the stages EX, M and WB, those instructions being the personalized instructions extracted from the wide instruction at stage D.

[26] Although Fig. 1 shows only two execution pipelines, the VLIW processor 1 may contain essentially any number of execution pipelines, and as a rule not all of these execution pipelines contain the arithmetic-logical functional blocks mentioned above. Instead of arithmetic-logical functional blocks, some execution pipelines of the VLIW processor 1 may contain predicate-logical functional blocks (PLC - predicate logical channel), not shown in Fig. 1, whose purpose and operation are known to a person skilled in the art. For example, the VLIW processor 1 may contain six arithmetic-logical functional blocks and three predicate-logical functional blocks, and can therefore execute a wide instruction comprising nine personalized instructions.

[27] However, the predicate-logical functional blocks and the personalized instructions they execute, which the compiler includes in wide instructions, perform only auxiliary functions in program execution and are not related to the content of the program. Accordingly, in the context of this description, the number S of execution pipelines counts only the execution pipelines that contain arithmetic-logical functional blocks, and the number S of personalized instructions counts only the personalized instructions intended for execution by arithmetic-logical functional blocks. For the VLIW processor 1, S=2, i.e. each wide instruction contains a first and a second personalized instruction, which are sent for execution to the first and second arithmetic-logical functional blocks 40 and 50, respectively.

[28] It should also be noted that the part of the pipeline process implemented by the preparatory pipeline of the VLIW processor 1 may contain many more stages than the stages IF and D indicated above. The same is true of the first and second execution pipelines of the VLIW processor 1, which may perform other stages of the pipeline process in addition to the stages EX, M and WB. The principles of increasing the number of pipeline stages, which are usually based on dividing the stages indicated above into a number of smaller stages in order to shorten the clock cycle, are known to a person skilled in the art.

[29] Further, the VLIW processor 1 includes a control unit (not shown) that generates control signals and transmits them to the elements listed above. In addition, the VLIW processor 1 contains many elements that perform trivial functions in the pipeline process and are obvious to a person skilled in the art, such as registers, multiplexers, data buses and the like. Some of these elements are shown in Fig. 1 and are described in the course of the description.

[30] The instruction memory 20 is a section of cache memory that stores an array of wide instructions to be executed in the near future. In response to a signal arriving from the instruction counter 10 via bus 11, the wide instruction that is to be executed next is fetched from this array. Fetching a wide instruction at stage IF means that the instruction memory 20 outputs an N-bit signal that specifies the addresses as11, as12 of the registers holding the source operands of the first personalized instruction, the addresses as21, as22 of the registers holding the source operands of the second personalized instruction, the addresses ad1, ad2 of the registers for the result operands of the first and second personalized instructions, respectively, and the operation codes opc1, opc2 of the operations performed by the first and second personalized instructions, respectively.

[31] The register addresses as11, as12, as21, as22, ad1, ad2 are sent from the instruction memory 20 via bus 21 to the register file 30, which is a set of registers capable of storing numerical data of integer type, floating-point type, etc. The operation codes opc1, opc2, in turn, are sent from the instruction memory 20 to the control unit mentioned above. The register file 30 sends the source operands src11 and src12, read from the registers at addresses as11, as12, to the first functional block 40 via bus 31, and sends the source operands src21 and src22, read from the registers at addresses as21, as22, to the second functional block 50 via bus 32. The arrival of the corresponding source operands at the first and second functional blocks 40 and 50, together with the arrival of the operation codes opc1 and opc2 at the control unit, completes stage D, and with it the work of the preparatory pipeline, which consists in the preparatory processing of the wide instruction and the extraction of the first and second personalized instructions from it.
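The IF and D stages described in paragraphs [30] and [31] can be sketched in Python as follows. This is only an illustration under stated assumptions: the description does not give the layout of the N-bit signal, so the field widths `REG_BITS` and `OPC_BITS`, the field order, and the helper names `pack_wide`, `decode_wide` and `read_operands` are all hypothetical.

```python
from dataclasses import dataclass

REG_BITS = 5   # assumed register-address width (32 registers); not from the description
OPC_BITS = 6   # assumed operation-code width; not from the description
SLOT_BITS = OPC_BITS + 3 * REG_BITS
REG_MASK = (1 << REG_BITS) - 1


@dataclass(frozen=True)
class PersonalizedInstruction:
    opc: int  # operation code (opc1 / opc2)
    as1: int  # address of the first source register (as11 / as21)
    as2: int  # address of the second source register (as12 / as22)
    ad: int   # address of the result register (ad1 / ad2)


def pack_wide(slots):
    """Pack personalized instructions into one integer, first slot in the high bits."""
    word = 0
    for s in slots:
        slot = s.opc
        for field in (s.as1, s.as2, s.ad):
            slot = (slot << REG_BITS) | field
        word = (word << SLOT_BITS) | slot
    return word


def decode_wide(word, s=2):
    """Stage D: split a wide instruction into its S personalized instructions."""
    slots = []
    for i in range(s):
        slot = (word >> ((s - 1 - i) * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        slots.append(PersonalizedInstruction(
            opc=slot >> (3 * REG_BITS),
            as1=(slot >> (2 * REG_BITS)) & REG_MASK,
            as2=(slot >> REG_BITS) & REG_MASK,
            ad=slot & REG_MASK,
        ))
    return slots


def read_operands(register_file, instr):
    """Register-file read: return the source operands of one personalized instruction."""
    return register_file[instr.as1], register_file[instr.as2]
```

With S=2, `decode_wide` yields the first and second personalized instructions, and `read_operands` plays the role of the register file 30 supplying src11, src12 and src21, src22 to the two functional blocks.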

[32] The first functional block 40 includes an arithmetic-logical unit 45 (ALU). The source operands src11 and src12 arrive at the ALU 45, which performs on them the operation corresponding to the code opc1, after which the result ALUres1 of executing the first personalized instruction is sent via bus 41 to the data memory 60 or to the register file 30, where it is written to the register ad1 as the result operand of the first personalized instruction.

[33] Similarly, the source operands src21 and src22 arrive at the ALU 55, which performs on them the operation corresponding to the code opc2. The result ALUres2 of executing the second personalized instruction is then sent via bus 51 to the data memory 60 or to the register file 30 to be written to the register ad2. This completes stage EX of the pipeline process, in which the first and second personalized instructions are executed simultaneously by the first and second execution pipelines.

[34] Further, the pipeline processing of some personalized instructions, such as ld (load, i.e. reading data from memory) or st (store, i.e. saving data to memory), includes an access to the data memory 60, performed using buses 42 and 52, which connect the data memory 60 to buses 41 and 51, respectively. The data memory 60 is a section of cache memory that stores an array of data likely to be requested in the near future. The transfer of data via buses 42 and 52 to or from the data memory 60 is the essence of the action performed by the first and second execution pipelines at stage M.

[35] At stage WB, the data read from the data memory 60, or the data resulting from a mathematical operation performed by an ALU, are transferred via buses 41 and 51 to be written to the registers ad1, ad2 of the register file 30. Stage WB completes the entire cycle of pipeline processing of the personalized instructions.

[36] It should be noted that, in the VLIW processor 1, the first and second execution pipelines are understood as the sets of elements required to perform the stages EX, M and WB of the pipeline process for the first and second personalized instructions. In particular, the first execution pipeline includes at least the first functional block 40, buses 41 and 42, and the devices within the control unit that control the first functional block 40 and the switching components that implement access to the data memory 60 and writing to the register file 30. Similarly, the second execution pipeline includes at least the second functional block 50, buses 51 and 52, and the corresponding control devices within the control unit and switching components.

[37] Note that the terms "first" and "second", as applied to the first and second execution pipelines of the VLIW processor 1, have no meaning other than to distinguish the pipelines from each other, i.e. either of the two execution pipelines may be taken as the first or the second. The same is true of the first and second personalized instructions.

[38] The machine-readable medium 2 included in the computer 100 is a machine-readable storage medium that can be read by the VLIW processor 1. The machine-readable medium 2 may be random access memory, a hard disk or any other storage device of the computer connected to the VLIW processor 1, for example by buses 22 and 23. The machine-readable medium 2 holds a recorded compiler, which is an auxiliary program intended to translate source program code written in a programming language into machine code understood by the VLIW processor 1. In the following description, the compiler recorded on the machine-readable medium 2 is also referred to as "compiler 2".

[39] It should be noted that, in addition to performing a simple transformation of program code, the compiler 2 can transform the original algorithm of the program so that the transformed algorithm is optimized as far as possible for the configuration of the VLIW processor 1 described above, which, in the terminology of the professional literature, is the microarchitecture of the VLIW processor 1. For example, the compiler 2 can recognize instructions in the original algorithm that do not depend on each other through their source operands, and include these instructions in a single wide instruction as personalized instructions. Moreover, the compiler 2 can trace alternative branches of instructions in the original algorithm and include these alternative branches in the sequence of wide instructions as sequences of personalized instructions.

[40] Further, the compiler 2 can supplement the transformed algorithm with new auxiliary instructions, such as an instruction for calculating the probability of a control transfer, and use the statistical data collected by the control transfer recorder 3 as the source operands of such instructions. In other words, the compiler 2 can determine the probability that each of the alternative branches is selected, and when the number of alternative branches exceeds the number of personalized instructions in a wide instruction, the compiler 2 can include in the wide instruction the personalized instructions of only those alternative branches to which, as it has determined, control is most likely to be transferred.

[41] The control transfer recorder 3 is an event recorder, known to a person skilled in the art, that registers a specified event when it occurs. More precisely, the control transfer recorder 3 can store in memory the fact that the event has occurred, and can also register several events preceding it. Naturally, the events registered by the control transfer recorder 3 are the outcomes of control transfer instructions, namely the target instructions to which control is transferred. Since the target instructions are the first instructions of their alternative branches, transferring control to a target instruction is equivalent to selecting the alternative branch that contains it.
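The behavior attributed to the control transfer recorder 3 in paragraph [41] can be sketched in Python. This is a software model under stated assumptions, not the hardware of the description: the class name `BranchRecorder`, the `history_len` parameter and the two counters are hypothetical, chosen only to show how an event can be registered together with the events that preceded it.

```python
from collections import Counter, deque


class BranchRecorder:
    """Hypothetical software model of an event recorder for control transfers."""

    def __init__(self, history_len=1):
        # The last `history_len` selected branches (the "preceding events").
        self.history = deque(maxlen=history_len)
        # How many times control was transferred to each branch.
        self.counts = Counter()
        # How many times a branch was selected after a given preceding history.
        self.context_counts = Counter()

    def record(self, target):
        """Register one control transfer to the branch named `target`."""
        self.counts[target] += 1
        self.context_counts[(tuple(self.history), target)] += 1
        self.history.append(target)
```

Calling `record` once per executed control transfer instruction accumulates both the plain per-branch counts and the counts conditioned on the preceding branch.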

[42] In the VLIW processor 1, the control transfer recorder 3 receives information about the execution of control transfer instructions from the first and second functional blocks 40 and 50 via buses 43 and 53, and receives information about the instructions to which control is transferred from the register file 30 via bus 33. The control transfer recorder 3 passes the collected statistical data, or information about their availability, to the compiler 2 via bus 35.

[43] The technical problem solved by the invention is described in more detail below with reference to Fig. 2, and the operation of the computer 100 and the proposed method it implements are described in detail with reference to Figs. 3 and 4. Figs. 2-4 show the original algorithm of one and the same program 200; Fig. 2 illustrates the result of compiling wide instructions according to a known method implemented by a known computer (Comparative Example), while Figs. 3 and 4 show the result of compiling wide instructions according to the proposed method (Examples 1 and 2).

[44] The program 200 is a branching instruction stream that contains a first control transfer instruction X, which forms a set of Q initial branches A, B and C, a merge instruction Y, which merges all the initial branches, and a second control transfer instruction Z, which forms a set of R subsequent branches L, M and N. Note that each of the initial branches A, B, C and each of the subsequent branches L, M, N may contain a plurality of sequentially executed instructions, and a plurality of sequentially executed instructions may also be located between the merge instruction Y and the second control transfer instruction Z. Depending on the programming language used, each of the first and second control transfer instructions X and Z may be a single instruction forming three branches, or a sequence of two instructions, each forming two branches.

[45] As for the merge instruction Y, it differs from the first and second control transfer instructions X and Z as follows: X and Z select, depending on the incoming condition, one of three operations for one and the same set of source operands, whereas the merge instruction Y selects, depending on the incoming condition, one of three sets of source operands for one and the same operation. Depending on the programming language used, the merge instruction Y may be combined with the second control transfer instruction Z or may be a separate instruction.
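The contrast drawn in paragraph [45] between a control transfer instruction and a merge instruction can be illustrated with a short Python sketch. The function names `control_transfer` and `merge`, and the use of dictionaries keyed by branch name, are assumptions made for illustration only.

```python
import operator


def control_transfer(condition, operations, operands):
    """Like X or Z: pick one of several operations for the same source operands."""
    return operations[condition](*operands)


def merge(condition, operand_sets, operation):
    """Like Y: pick one of several operand sets for the same operation."""
    return operation(*operand_sets[condition])


# Hypothetical operations standing in for the branches A, B and C.
branch_ops = {"A": operator.add, "B": operator.sub, "C": operator.mul}

# Hypothetical operand sets produced by the branches A, B and C.
branch_results = {"A": (1, 2), "B": (3, 4), "C": (5, 6)}
```

For instance, `control_transfer("B", branch_ops, (6, 3))` applies the subtraction to the shared operands, while `merge("C", branch_results, operator.add)` applies the shared addition to the operand set that arrived from branch C.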

[46] The program 200 is intended for execution by the VLIW processor 1, which is essentially the same in the computer 100 and in the known computer. Note that in the program 200 Q=3 and R=3, while in the VLIW processor 1 S=2, i.e. at each of the two branching points of the program 200 the VLIW processor 1 can process only two alternative branches.

[47] The control transfer recorder 3, like the known control transfer recorder of the known computer, collects statistics on the outcomes of the first control transfer instruction X during preliminary runs of the program, i.e. during so-called sampling, or during the initial runs of the program under real conditions. In other words, over a specified period of time the control transfer recorder 3 determines the number of control transfers performed to each of the initial branches A, B and C. Based on these data, both the known compiler of the known computer and the compiler 2 calculate the inherent probabilities P(A), P(B) and P(C) of a control transfer to the initial branches A, B and C as the ratio of the number of control transfers performed to the corresponding initial branch to the total number of control transfers performed to all the initial branches.
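The ratio described in paragraph [47] can be written as a short Python sketch. The function name `inherent_probabilities` and the transfer counts in the usage note are hypothetical; the actual values belong to Tables 1 and 2 and are not reproduced here.

```python
from fractions import Fraction


def inherent_probabilities(transfer_counts):
    """Return P(branch) = transfers to the branch / transfers to all branches.

    `transfer_counts` maps a branch name to the number of control transfers
    recorded for it; exact fractions are used to avoid rounding effects.
    """
    total = sum(transfer_counts.values())
    if total == 0:
        raise ValueError("no control transfers were recorded")
    return {branch: Fraction(n, total) for branch, n in transfer_counts.items()}
```

As an illustration with made-up counts, `inherent_probabilities({"A": 50, "B": 30, "C": 20})` gives P(A) = 1/2, P(B) = 3/10 and P(C) = 1/5, which sum to 1.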

[48] Given that the number of personalized commands in a wide command is S=2 and the number of initial branches is Q=3, the known compiler and compiler 2 select for inclusion in the sequence of wide commands the initial branches with the highest intrinsic probabilities, i.e. initial branches A and B. In Figs. 2-4 and in Tables 1 and 2 below, the initial branches selected for inclusion in the sequence of wide commands are shaded. Up to this point, computer 100 and the proposed method do not differ from known solutions.

[49] The known computer, however, selects the two subsequent branches to be included in the sequence of wide commands, out of the total of R=3 subsequent branches, on the basis of the intrinsic probabilities P(L), P(M), P(N) of all subsequent branches L, M, N, which it obtains in the same way as the intrinsic probabilities P(A), P(B), P(C) of the initial branches A, B and C. More precisely, the known compiler selects from the three subsequent branches L, M, N the two with the highest intrinsic probabilities, which in this case are subsequent branches L and M, as shown in Fig. 2.
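The known selection described in [48] and [49] amounts to taking the S most probable branches at each branch point independently. A minimal sketch, using the probability values stated in [57]; the helper name `top_s` is hypothetical.

```python
def top_s(probabilities, s):
    """Return the s branch names with the largest probabilities, best first."""
    return sorted(probabilities, key=probabilities.get, reverse=True)[:s]


S = 2  # number of personalized commands in a wide command
p_initial = {"A": 0.6, "B": 0.25, "C": 0.15}   # values from [57]
p_subsequent = {"L": 0.5, "M": 0.3, "N": 0.2}  # values from [57]

known_initial = top_s(p_initial, S)        # ['A', 'B']
known_subsequent = top_s(p_subsequent, S)  # ['L', 'M']
print(known_initial, known_subsequent)
```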

[50] Note that, for example, the intrinsic probability P(L) obtained by the known computer incorporates all the intrinsic probabilities of the initial branches, as well as all correlations between the control transfer to each of the initial branches A, B, C and the control transfer to subsequent branch L. In other words, the following relation holds for subsequent branch L:

$$P(L) = P(A \cap L) + P(B \cap L) + P(C \cap L), \qquad (1)$$

where, for example, P(A∩L) is the joint probability of a control transfer to initial branch A and a control transfer to subsequent branch L, and P(B∩L) and P(C∩L) are interpreted analogously.

However, the known computer cannot take into account the contribution of each of the joint probabilities P(A∩L), P(B∩L), P(C∩L) to the intrinsic probability P(L), which often leads to an inefficient selection of subsequent branches for inclusion in the sequence of wide commands.

[51] According to the proposed method, computer 100 does not determine the intrinsic probabilities P(L), P(M), P(N) of the subsequent branches L, M, N; instead, it determines separately each of the joint probabilities of the initial branches A, B, C and the subsequent branches L, M, N. For example, for subsequent branch L, computer 100 determines P(A∩L), P(B∩L), P(C∩L) individually, proceeding from the relations

$$P(A \cap L) = P(A)\,P(L \mid A), \qquad (2)$$

$$P(B \cap L) = P(B)\,P(L \mid B), \qquad (3)$$

$$P(C \cap L) = P(C)\,P(L \mid C), \qquad (4)$$

where, for example, P(L|A) is the conditional probability of a control transfer to subsequent branch L given that a control transfer to initial branch A has been made, and P(L|B) and P(L|C) are interpreted analogously.

It should be noted that the proposed method adopts a model in which the control transfer to initial branch A and the control transfer to subsequent branch L following an earlier control transfer to initial branch A are independent events; that is, if the intrinsic probability P(A) changes somewhat, the conditional probability P(L|A) remains unchanged.

[52] To implement the proposed method, control transfer recorder 3 is capable of collecting statistics on the results of the second control transfer command Z for each result of the first control transfer command X. More precisely, control transfer recorder 3 separately counts the control transfers to subsequent branch L made after an earlier control transfer to initial branch A, separately counts those made after an earlier control transfer to initial branch B, and separately counts those made after an earlier control transfer to initial branch C. From these control transfer statistics, compiler 2 calculates each of the conditional probabilities P(L|A), P(L|B), P(L|C) of control transfer to subsequent branch L as the ratio of the number of control transfers to subsequent branch L made after an earlier control transfer to the corresponding initial branch to the total number of control transfers made to that initial branch.
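A minimal software sketch of the per-result statistics described in [52] is given below. The class `ControlTransferRecorder`, its methods and the sample counts are hypothetical illustrations and do not describe the hardware of recorder 3. The sketch estimates P(L|A) in the standard way, as the number of A-then-L transfers divided by the number of transfers to A, so that the estimates for one initial branch sum to 1.

```python
from collections import defaultdict


class ControlTransferRecorder:
    """Hypothetical stand-in that counts (initial, subsequent) transfer pairs."""

    def __init__(self):
        self.pair_counts = defaultdict(int)
        self.initial_counts = defaultdict(int)

    def record(self, initial, subsequent):
        """Register one execution: command X chose `initial`, command Z chose `subsequent`."""
        self.pair_counts[(initial, subsequent)] += 1
        self.initial_counts[initial] += 1

    def conditional(self, subsequent, initial):
        """Estimate P(subsequent | initial) from the recorded counts."""
        n_initial = self.initial_counts.get(initial, 0)
        if n_initial == 0:
            raise ValueError(f"no transfers to initial branch {initial!r} recorded")
        return self.pair_counts.get((initial, subsequent), 0) / n_initial


# Hypothetical sample: 20 transfers to A, followed by L, M or N.
recorder = ControlTransferRecorder()
for initial, subsequent, times in [("A", "L", 11), ("A", "M", 3), ("A", "N", 6)]:
    for _ in range(times):
        recorder.record(initial, subsequent)

p_l_given_a = recorder.conditional("L", "A")  # 11 / 20
print(p_l_given_a)
```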

[53] Recorder 3 collects similar statistics for subsequent branches M and N, and compiler 2 similarly calculates the conditional probabilities P(M|A), P(M|B), P(M|C) and P(N|A), P(N|B), P(N|C). Using the intrinsic probabilities P(A), P(B), P(C) of the initial branches A, B and C, determined as described above, compiler 2 calculates, in accordance with relations (2)-(4) above, the joint probabilities P(A∩L), P(B∩L), P(C∩L) for subsequent branch L, the joint probabilities P(A∩M), P(B∩M), P(C∩M) for subsequent branch M, and the joint probabilities P(A∩N), P(B∩N), P(C∩N) for subsequent branch N.
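The calculation in [53] multiplies each intrinsic probability by the corresponding conditional probability, P(X∩Y)=P(X)·P(Y|X). The sketch below assumes hypothetical conditional probabilities; they are not the values of Tables 1 or 2 and were chosen only so that the resulting totals for L, M and N agree with P(L)=0.5, P(M)=0.3, P(N)=0.2 from [57]. The function name `joint_table` is likewise hypothetical.

```python
p_initial = {"A": 0.6, "B": 0.25, "C": 0.15}  # values from [57]

# Hypothetical conditional probabilities P(subsequent | initial); each row sums to 1.
p_conditional = {
    "A": {"L": 0.50, "M": 0.20, "N": 0.30},
    "B": {"L": 0.68, "M": 0.24, "N": 0.08},
    "C": {"L": 0.20, "M": 0.80, "N": 0.00},
}


def joint_table(p_initial, p_conditional):
    """Return {(initial, subsequent): P(initial) * P(subsequent | initial)}."""
    return {
        (i, j): p_initial[i] * p_j
        for i, row in p_conditional.items()
        for j, p_j in row.items()
    }


joint = joint_table(p_initial, p_conditional)

# Check values: summing over subsequent branches recovers P(initial),
# summing over initial branches gives P(subsequent), and everything sums to 1.
row_sums = {i: sum(p for (a, _), p in joint.items() if a == i) for i in p_initial}
col_sums = {j: sum(p for (_, b), p in joint.items() if b == j) for j in ("L", "M", "N")}
print(row_sums, col_sums, sum(joint.values()))
```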

[54] Next, compiler 2 includes in the first sequence of personalized commands of the wide command intended for execution by VLIW processor 1 the first initial branch from the set of initial branches A, B, C, namely the one with the largest (i=1) intrinsic probability, and the one subsequent branch from the set of subsequent branches L, M, N that has the largest joint probability with this first initial branch. In addition, compiler 2 includes in the second sequence of personalized commands of the wide command intended for execution by VLIW processor 1 the second initial branch from the set of initial branches A, B, C, namely the one with the second largest (i=2) intrinsic probability, and the one subsequent branch from the remaining set of subsequent branches L, M, N that has the second largest joint probability with either of the first and second initial branches.
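One possible reading of the selection in [54] is a greedy loop: at step i, take the i-th most probable initial branch, then take from the subsequent branches not yet chosen the one whose joint probability with any of the first i initial branches is largest. The sketch below follows that reading; the function name `select_branches` and the joint probability values are hypothetical, not the values of Tables 1 or 2.

```python
def select_branches(p_initial, joint, subsequent, s):
    """Return s (initial, subsequent) pairs, one per sequence of personalized commands."""
    ranked_initial = sorted(p_initial, key=p_initial.get, reverse=True)[:s]
    remaining = list(subsequent)
    pairs = []
    for i in range(s):
        considered = ranked_initial[: i + 1]  # initial branches 1..i
        # Score each remaining subsequent branch by its best joint probability
        # with any of the initial branches considered so far.
        best = max(remaining, key=lambda j: max(joint[(a, j)] for a in considered))
        remaining.remove(best)
        pairs.append((ranked_initial[i], best))
    return pairs


p_initial = {"A": 0.6, "B": 0.25, "C": 0.15}  # values from [57]
# Hypothetical joint probabilities P(initial ∩ subsequent).
joint = {
    ("A", "L"): 0.30, ("A", "M"): 0.12, ("A", "N"): 0.18,
    ("B", "L"): 0.17, ("B", "M"): 0.06, ("B", "N"): 0.02,
    ("C", "L"): 0.03, ("C", "M"): 0.12, ("C", "N"): 0.00,
}

pairs = select_branches(p_initial, joint, ["L", "M", "N"], s=2)
print(pairs)  # [('A', 'L'), ('B', 'N')]
```

With these hypothetical values, subsequent branch M has the higher intrinsic probability (0.3 against 0.2 for N), yet most of it comes from initial branch C, which is not included, so the loop prefers N.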

[55] As shown above with reference to Fig. 2, in the known method (Comparative Example) the initial and subsequent branches to be included in the first and second sequences of personalized commands of the wide command were selected on the basis of the intrinsic probabilities of the initial and subsequent branches. As follows from Fig. 2, with the known method the first sequence of personalized commands includes initial branch A and subsequent branch L, and the second sequence of personalized commands includes initial branch B and subsequent branch M.

[56] Tables 1 and 2 contain two different sets of sampling results, used for Examples 1 and 2 respectively, in which the initial and subsequent branches are selected according to the proposed method for the same program 200 that was analyzed in the Comparative Example. In Tables 1 and 2, columns 1, 3 and 5 show events, and columns 2, 4 and 6 show the probabilities of these events. Rows 1-5 show probabilities obtained by directly counting the corresponding events. Rows 6-8 show probabilities obtained by applying relations (2)-(4) to the corresponding probabilities from rows 1-5. Column 7 and row 9 contain check values.

[57] The initial data of program 200 in Examples 1 and 2 remain the same: S=2, Q=3, R=3, P(A)=0.6, P(B)=0.25, P(C)=0.15, P(L)=0.5, P(M)=0.3, P(N)=0.2. Examples 1 and 2 differ, however, in the values of the conditional probabilities P(L|A), P(L|B), P(L|C), P(M|A), P(M|B), P(M|C), P(N|A), P(N|B), P(N|C) determined by control transfer recorder 3 and compiler 2, and hence in the values of the joint probabilities P(A∩L), P(B∩L), P(C∩L), P(A∩M), P(B∩M), P(C∩M), P(A∩N), P(B∩N), P(C∩N) calculated from them.

[58] Example 1

As follows from Table 1, among the remaining subsequent branches, i.e. L, M and N, the largest (i=1) joint probability belongs to subsequent branch L: P(A∩L)=0.327. Among the subsequent branches then remaining, i.e. M and N, the second largest (i=2) joint probability belongs to subsequent branch N: P(A∩N)=0.192. Thus subsequent branch L (i=1) is still to be included in the first sequence of personalized commands, whereas subsequent branch N (i=2), rather than subsequent branch M as in the Comparative Example, is to be included in the second sequence of personalized commands. Note that the highest joint probability of subsequent branch M with the initial branches A and B included in the wide command is P(A∩M)=0.0838, against P(A∩N)=0.192, which indicates that control is more likely to pass from initial branches A and B to subsequent branch N. In Table 1 and in Fig. 3, which corresponds to Example 1, the selected subsequent branches L and N are shaded.

[59] Table 1

[60] Example 2

As follows from Table 2, among the remaining subsequent branches, i.e. L, M and N, the largest (i=1) joint probability belongs to subsequent branch L: P(A∩L)=0.438. Among the subsequent branches then remaining, i.e. M and N, the second largest (i=2) joint probability belongs to subsequent branch N: P(B∩N)=0.1375. Thus subsequent branch L (i=1) is still to be included in the first sequence of personalized commands, whereas subsequent branch N (i=2), rather than subsequent branch M as in the Comparative Example, is to be included in the second sequence of personalized commands. Note that here, too, the highest joint probability of subsequent branch M with the initial branches A and B included in the wide command is P(A∩M)=0.102, against P(B∩N)=0.1375, which indicates that control is more likely to pass from initial branches A and B to subsequent branch N. In Table 2 and in Fig. 4, which corresponds to Example 2, the selected subsequent branches L and N are shaded.

[61] Table 2
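The comparisons made in Examples 1 and 2 can be replayed using only the joint probabilities quoted in the text of those examples. In the sketch below, each subsequent branch is paired with the initial branch and the value that the text quotes as its largest joint probability with the included initial branches A and B; the dictionary layout and the function name `pick_two` are hypothetical. Sorting by the quoted value is a shortcut that happens to match the two-step selection here, because in both examples the largest value belongs to branch L paired with initial branch A.

```python
# subsequent branch -> (initial branch, quoted joint probability)
quoted = {
    "Example 1": {"L": ("A", 0.327), "M": ("A", 0.0838), "N": ("A", 0.192)},
    "Example 2": {"L": ("A", 0.438), "M": ("A", 0.102), "N": ("B", 0.1375)},
}


def pick_two(best_joint):
    """Return the two subsequent branches with the largest quoted joint probabilities."""
    ranked = sorted(best_joint, key=lambda j: best_joint[j][1], reverse=True)
    return ranked[:2]


selection = {name: pick_two(values) for name, values in quoted.items()}
print(selection)  # {'Example 1': ['L', 'N'], 'Example 2': ['L', 'N']}

# Conditional probabilities implied by relation P(A∩Y) = P(A)·P(Y|A) for Example 1.
p_a = 0.6
implied_given_a = {j: p / p_a for j, (_, p) in quoted["Example 1"].items()}
print(implied_given_a)
```

The three values quoted for initial branch A in Example 1 add up to 0.6028, close to P(A)=0.6; the small difference is presumably due to rounding in the table.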

[62] A comparison of Examples 1 and 2 with the Comparative Example demonstrates that when an unlikely initial branch, here initial branch C, is not included in the sequence of wide commands, selecting the subsequent branches on the basis of their intrinsic probabilities no longer maximizes the probability that the target branch is among them. Computer 100 and the proposed method that it implements minimize the number of errors in selecting the target subsequent branch and thereby give the computer higher speed and performance when executing a branched program, which reduces power consumption and the amount of heat generated.
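One way to put a number on the effect described in [62] is to sum the joint probabilities of all pairs formed by the included initial branches and the included subsequent branches, i.e. the probability that both the executed initial branch and the executed subsequent branch were included. This coverage measure, the function name `coverage` and the joint probability values below are assumptions made for illustration; the values are hypothetical and are not those of Tables 1 or 2.

```python
from itertools import product


def coverage(joint, initial, subsequent):
    """Probability that both executed branches are among the included ones."""
    return sum(joint[(i, j)] for i, j in product(initial, subsequent))


# Hypothetical joint probabilities consistent with P(A)=0.6, P(B)=0.25, P(C)=0.15
# and P(L)=0.5, P(M)=0.3, P(N)=0.2 from [57].
joint = {
    ("A", "L"): 0.30, ("A", "M"): 0.12, ("A", "N"): 0.18,
    ("B", "L"): 0.17, ("B", "M"): 0.06, ("B", "N"): 0.02,
    ("C", "L"): 0.03, ("C", "M"): 0.12, ("C", "N"): 0.00,
}

known = coverage(joint, ["A", "B"], ["L", "M"])     # selection by intrinsic probability
proposed = coverage(joint, ["A", "B"], ["L", "N"])  # selection by joint probability
print(known, proposed)
```

With these hypothetical values the selection by joint probability covers slightly more of the probability mass than the selection by intrinsic probability, because much of P(M) comes from the excluded initial branch C.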

[63] It should be noted that the proposed method achieves the stated technical results for a computer that contains a VLIW processor with any number S of functional units and that executes a program in which the first control transfer command forms any number Q of initial branches and the second control transfer command forms any number R of subsequent branches, provided that Q>S and R>S. The first control transfer command may be, for example, the first control transfer command in the program, a control transfer command for which compiler 2 has established the absence of correlation with the preceding control transfer command, or a control transfer command selected arbitrarily by compiler 2.

[64] Note also that the embodiments described above imply that each wide command includes one personalized command from each selected initial branch, followed by one personalized command from each subsequent branch. This is not mandatory, however; the compiler may form wide commands in any manner. What remains essential is that the sequence of wide commands includes only those initial branches that have the highest intrinsic probabilities and only those subsequent branches that have the highest joint probabilities with any of the selected initial branches, with both the initial and the subsequent branches selected in a number equal to the number of execution pipelines.

Claims

1. A method for pipelined processing of commands, intended for a computer whose processor contains a preparatory pipeline and S execution pipelines, wherein

the preparatory pipeline is capable of performing preparatory processing of a sequence of wide commands, each of which includes S personalized commands, and the i-th execution pipeline is capable of executing a sequence of i-th personalized commands, where i takes values from 1 to S, wherein

according to the method, for a branching command stream containing a first control transfer command, the result of which is the selection of one initial branch from Q initial branches, a command merging all initial branches, and a second control transfer command, the result of which is the selection of one subsequent branch from R subsequent branches, where Q>S and R>S, statistics of the results of the first control transfer command are collected and, for each result of the first control transfer command, statistics of the results of the second control transfer command are collected, after which, based on the collected statistics, the intrinsic probability of each initial branch and the joint probability of each pair of one initial and one subsequent branch are determined, and

the sequence of wide commands includes those S initial branches that have the highest intrinsic probabilities, and those S subsequent branches that have the highest joint probabilities with any of the S initial branches included in the wide command.

2. The method according to claim 1, in which the sequence of i-th personalized commands includes that one initial branch from the set of initial branches that has the i-th largest intrinsic probability, and that one subsequent branch from the remaining set of subsequent branches that has the i-th largest joint probability with any initial branch from the 1st to the i-th.

3. A computer comprising a processor capable of processing a branching instruction stream, a control transfer recorder, and a computer-readable medium that can be read by the processor and on which a compiler is written, wherein

the processor contains a preparatory pipeline and S execution pipelines, the preparatory pipeline is capable of performing preparatory processing of a sequence of wide commands, each of which includes S personalized commands, and the i-th execution pipeline is capable of executing a sequence of i-th personalized commands, where i takes values from 1 to S, wherein

the branching command stream contains a first control transfer command, the result of which is the selection of one initial branch from Q initial branches, a command merging all initial branches, and a second control transfer command, the result of which is the selection of one subsequent branch from R subsequent branches, where Q>S and R>S, wherein the control transfer recorder is capable of collecting statistics of the results of the first control transfer command and, for each result of the first control transfer command, of collecting statistics of the results of the second control transfer command, and the compiler is capable of determining, based on the collected statistics, the intrinsic probability of each initial branch and the joint probability of each pair of one initial and one subsequent branch, and

the compiler includes in the sequence of wide commands those S initial branches that have the highest intrinsic probabilities, and those S subsequent branches that have the highest joint probabilities with any of the S initial branches included in the wide command.

4. The computer according to claim 3, in which in the sequence of i-th personalized commands the compiler includes that one initial branch from the set of initial branches that has the i-th largest intrinsic probability, and that one subsequent branch from the remaining set of subsequent branches that has the i-th largest joint probability with any initial branch from the 1st to the i-th.