RU2643622C1 - Computer module

Info

Publication number: RU2643622C1
Application number: RU2017118227A
Authority: RU
Inventors: Александр Иванович Глухов; Лев Рафаилович Карев; Валерий Николаевич Конотопцев; Игорь Сергеевич Сергеев; Роман Георгиевич Тусеев; Иван Владимирович Шевченко
Original assignee: Федеральное государственное унитарное предприятие "Научно-исследовательский институт "Квант"
Priority date: 2017-05-25
Filing date: 2017-05-25
Publication date: 2018-02-02

Abstract

FIELD: physics.

SUBSTANCE: the device contains an interface unit, a task separation unit, a task header memory unit, a task data memory unit, a task arbiter, a computing field comprising a group of N computing cores 6₁, …, 6_N, a group of N task number memory units 7₁, …, 7_N of the computing cores, a result multiplexer unit, a result arbiter, a result header memory unit, a result data memory unit, and an external interface, wherein each computing core 6₁, …, 6_N consists of an input buffer memory 6-1, an operation unit 6-2, an output buffer memory 6-3, and a control unit 6-4.

EFFECT: increased performance of the computing module.

1 dwg

Description

FIELD OF TECHNOLOGY

The invention relates to the field of computing, in particular to high-performance computing devices that solve computationally intensive problems by data parallelization into a set of independent subtasks.

BACKGROUND OF THE INVENTION

An adaptive data processing system is known (RU No. 105487 U1, IPC G06F 15/16, filed 25.02.2011, published 10.06.2011, Bulletin No. 16), comprising N processing lines consisting of M_n processing modules, M_n channel selection units and an output signal transmission bus, a request memory unit, a register for setting the operating modes of the processing modules, a multiplexer/demultiplexer of the output buses, N address decoders, and a priority encoder.

A disadvantage of the known system is the large amount of hardware and the complex structure for parallelizing the system's request flow.

A basic module of a multiprocessor computing system is known (RU No. 86332 U1, IPC G06F 15/16, filed 21.05.2009, published 27.08.2009, Bulletin No. 24), comprising a computing field of 16 computing elements located at the nodes of a two-dimensional 4×4 lattice and interconnected by an orthogonal system of nearest-neighbor links, an access controller for transferring information between the basic module and the host computer, and distributed memory units.

The disadvantages of this module are its complex matrix-pipeline structure and the multi-bit information links between computing elements.

The device of the same purpose closest to the claimed invention in its set of features is the computing module taken as the prototype (RU No. 139326 U1, IPC G06F 15/16, filed 10.12.2013, published 10.04.2014, Bulletin No. 10), comprising a computing field of L computing elements, corresponding to the number of parallelization channels, where each computing element contains S pipeline stages, a FIFO memory unit, a generator unit for the variable X, input and output buffers, a group of L comparison circuits, an OR element, four external inputs and four external outputs.

The disadvantages of this computing module are that all computing elements can process data only by one and the same algorithm, that all computing elements operate synchronously on a single task common to all of them, and that a single set of results is formed for all computing elements.

The reason that prevents achieving the technical result is the lack of means for individual exchange with the computing elements when loading input data packets and reading result packets.

The problem addressed by the proposed invention is to create a computing module with a single parameterizable computing environment for connecting computing cores, which simplifies the development of computing systems and increases performance by reducing exchange time and service waiting time.

The technical result of the proposed invention is a reduction in development effort and time and an increase in performance.

SUMMARY OF THE INVENTION

The stated technical result is achieved in that the computing module contains a group of N computing cores 6₁, …, 6_N, an interface unit 1, a task separation unit 2, a task header memory unit 3, a task data memory unit 4, a task arbiter 5, a group of N task number memory units 7₁, …, 7_N of the computing cores, a result multiplexer unit 8, a result arbiter 9, a result header memory unit 10, a result data memory unit 11, and an external interface 12, wherein each computing core 6₁, …, 6_N consists of an input buffer memory 6-1, an operation unit 6-2, an output buffer memory 6-3 and a control unit 6-4,

wherein the external interface 12 is connected to the interface unit 1, whose reset output 13 is connected to the corresponding reset inputs of the task separation unit 2, the task header memory unit 3, the task data memory unit 4, the task arbiter 5, the computing cores 6₁, …, 6_N, the task number memory units 7₁, …, 7_N, the result arbiter 9, the result header memory unit 10 and the result data memory unit 11; the interface unit 1 is also connected by the result read mode code bus 14 to the result arbiter 9, by the common parameter bus 15 to the operation units 6-2 of all computing cores 6₁, …, 6_N, by the task write mode code bus 16 to the task arbiter 5, and by the task stream write bus 17 to the task separation unit 2; it is connected by the task count code bus 18 to the output of the task header memory unit 3, by the task write request bus 19 and the result read request bus 20 to the outputs of all computing cores 6₁, …, 6_N, by the status code bus 21 to the output of the result header memory unit 10 and to the common "Empty" flag output 21' of the result multiplexer unit 8, by the result data stream read bus 24 to the output of the result data memory unit 11, and by the result header stream read bus 25 to the output of the result header memory unit 10; its result data read signal output 22 is connected to the corresponding input of the result data memory unit 11, and its result header read signal output 23 is connected to the corresponding input of the result header memory unit 10,

the task separation unit 2 is connected by the written task header bus 42 to the corresponding input of the task header memory unit 3 and by the written task data bus 43 to the corresponding input of the task data memory unit 4,

the task header memory unit 3 is connected by the read task header bus 26 and by the task memory "Empty" flag output to the corresponding inputs of the task arbiter 5, whose output is connected by the task header read control bus 27 to the corresponding input of the task header memory unit 3,

the task arbiter 5 is connected by the task write control bus 31 to the input buffer memories 6-1 of the corresponding computing cores 6₁, …, 6_N via the task write buses 31₁, …, 31_N and to the task number memory units 7₁, …, 7_N via the buses 31'₁, …, 31'_N; in addition, the task arbiter 5 is connected by the written task number bus 32, via the buses 32₁, …, 32_N, to the corresponding inputs of the task number memory units 7₁, …, 7_N; the corresponding input of the task arbiter 5 is connected to the task write request bus 19, which comprises the task write request outputs 19₁, …, 19_N of the computing cores; and the task arbiter 5 is connected by the task data read control bus 29 to the corresponding input of the task data memory unit 4, whose output is connected via the read task data bus 30 to the corresponding inputs of the input buffer memories 6-1 of the corresponding computing cores 6₁, …, 6_N,

the corresponding outputs of the control units 6-3 of the computing cores 6₁, …, 6_N are connected by the result buses 33₁, …, 33_N to the corresponding inputs of the result multiplexer unit 8, whose corresponding output is connected via the result bus 33 of the selected computing core to the corresponding input of the result data memory unit 11; in addition, the corresponding result packet read completion outputs 34₁, …, 34_N of the control units 6-3 of the computing cores 6₁, …, 6_N are connected to the corresponding inputs of the result multiplexer unit 8, whose corresponding result packet read completion output 34 is connected to the corresponding inputs of the result arbiter 9, the result header memory unit 10 and the result data memory unit 11; the corresponding task completion outputs 35₁, …, 35_N of the control units 6-3 of the computing cores 6₁, …, 6_N are connected to the corresponding inputs of the task number memory units 7₁, …, 7_N and to the corresponding inputs of the result multiplexer unit 8, whose corresponding task completion output 35 is connected to the corresponding inputs of the result header memory unit 10,

the corresponding outputs of the task number memory units 7₁, …, 7_N are connected by the read task number buses 36₁, …, 36_N to the corresponding inputs of the result multiplexer unit 8, whose corresponding output is connected via the read task number bus 36 to the corresponding input of the result header memory unit 10,

the corresponding outputs of the result arbiter 9 are connected by the result read signal outputs 37₁, …, 37_N to the corresponding inputs of the output buffer memories 6-3 of the corresponding computing cores 6₁, …, 6_N; the corresponding outputs of the result arbiter 9 are also connected via the bus 38, carrying the code of the number of the selected computing core, to the control inputs of the result multiplexer unit 8, and via the bus 39, carrying the number of words in the result packet, to the corresponding inputs of the result header memory unit 10, whose overflow flag output 41 is connected to the corresponding input of the result arbiter 9, another corresponding input of which is connected to the overflow flag output 40 of the result memory 11.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a functional diagram of the computing module.

The following reference designations are used in FIG. 1:

1 - interface unit,

2 - task separation unit,

3 - task header memory unit,

4 - task data memory unit,

5 - task arbiter,

6₁, …, 6_N - group of N computing cores (CC),

6-1 - CC input buffer memory,

6-2 - CC operation unit,

6-3 - CC output buffer memory,

6-4 - CC control unit,

7₁, …, 7_N - group of N CC task number memory units,

8 - result multiplexer unit,

9 - result arbiter,

10 - result header memory unit,

11 - result data memory unit,

12 - external interface,

13 - reset output,

14 - result read mode code bus,

15 - bus of common parameters transmitted in parallel code,

16 - task write mode code bus,

17 - task stream write bus,

18 - bus of the code of the number of tasks in the input memory,

19 - task write request bus,

19₁, …, 19_N - task write requests from the CCs,

20 - result read request bus,

20₁, …, 20_N - result read requests from the CCs,

21 - result memory status code bus,

21' - output of the common "Empty" flag of all task number memory units 7₁, …, 7_N,

22 - result data read signal output,

23 - result header read signal output,

24 - result data stream read bus,

25 - result header stream read bus,

26 - read task header bus,

27 - task header read control bus,

28 - task memory "Empty" flag output,

29 - task data read control bus,

30 - read task data bus,

31 - task write control bus,

31₁, …, 31_N - buses controlling task writes to the CCs,

31'₁, …, 31'_N - outputs signaling the end of a task write to a CC,

32 - written task number bus,

32₁, …, 32_N - written task number buses of the CCs,

33₁, …, 33_N - result buses from the CCs,

33 - result bus of the selected CC,

33' - result validity flag output,

34₁, …, 34_N - result packet read completion outputs from the CCs,

34 - result packet read completion output of the selected CC,

35₁, …, 35_N - task completion outputs from the CCs,

35 - task completion output of the selected CC,

36₁, …, 36_N - read task number buses from the CCs,

36 - read task number bus of the selected CC,

37₁, …, 37_N - CC result read signal outputs,

38 - bus of the code of the number of the CC selected by the arbitration circuit,

39 - bus of the number of words in the result packet,

40 - result memory overflow flag output,

41 - result header memory overflow flag output,

42 - written task header bus,

43 - written task data bus.

DETAILED DESCRIPTION OF THE INVENTION

The proposed computing module (CM) is designed to combine a set of N computing cores (CC) 6₁, …, 6_N. These computing cores implement the independent subtasks that the proposed computing module is intended to solve. Tasks are loaded into the computing cores, and the results of their work are read out, by a control processor (CP).

The proposed computing module provides a unified parameterizable interface for embedding computing cores, which reduces preparation of the module to developing the computing cores 6₁, …, 6_N alone.

The proposed computing module supports automatic loading of tasks into the computing cores from the input task buffer memory and automatic reading of results from the computing cores into the output result buffer memory. With automatic task loading and automatic result reading, the control processor does not control each computing core individually; it only loads task packets into the input buffer memory and reads result packets from the output buffer memory of the computing module.

The interface unit 1 converts the signals of the interface 12 from the control processor into signals of the module's internal interfaces. The buses of the module's internal interface fall into four groups: the command code and constant parameter buses 13, 14, 15, 16; the task stream write bus 17; the module status signal buses 18, 19, 20, 21; and the result stream read buses with their control signals 22, 23, 24, 25.

The command codes and constant parameters consist of the module reset (initialization) signal 13, the result read mode code 14, the code of common (identical) parameters 15 transmitted in parallel code to all computing cores 6₁, …, 6_N, and the task write mode code 16. These codes are loaded by the control processor into the corresponding registers of the interface unit 1. Changing a command requires writing again to the same register or to the same group of registers.

The interface unit 1 converts the stream write commands and task packets arriving from the control processor over the external interface 12 into data streams transmitted over the bus 17. The task stream write bus 17 consists of the written data bus proper and a write signal. Each task packet consists of a packet header followed by data.

The module status codes include the code 18 of the number of task packets in the input memory, the code 21 of the number of result packets in the output memory together with the "Empty" flag of the task number memories, and the task write requests 19 and result read requests 20 from the computing cores 6₁, …, 6_N. The interface unit 1 makes these module status codes readable by the control processor through single-read commands of the external interface.

The result data read signal 22, the result data stream read bus 24, the result header read signal 23 and the result header read bus 25 provide reading of results to the control processor over the external interface 12. The interface unit 1 converts the stream read commands of the external interface 12 into the read signals 22 and 23 and receives the results from the read buses 24 and 25.

Task separation unit 2 separates task packets into headers and data. After reset signal 13, task separation unit 2 treats the first word received on streaming write bus 17 as a task packet header, extracts from it the code giving the number of data words that follow in the packet, stores this code, and writes the header over task-header write bus 42 into task header memory 3. As subsequent words arrive on streaming write bus 17, unit 2 reformats them according to the parameter that sets the task data bus width, writes the reformatted words over task-data write bus 43 into task data memory 4, and counts the data words written for the current packet. When the number of written data words reaches the value given in the packet header, the counter is reset and the next word on streaming write bus 17 is taken as the header of a new task packet. If computing-core data bus 30 is wider than streaming write bus 17, unit 2 converts the format of the data arriving on bus 17 by combining several input words into one word written on bus 43.
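The header/data separation described above can be sketched in software as follows. This is only an illustrative model, not the patented circuit: the position of the word count inside the header (`COUNT_MASK`), the function name `split_task_stream`, the `pack` and `in_width` parameters, and the little-endian packing order are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Assumption: the low 16 bits of a header word hold the number of data words
# that follow. The source does not specify the header layout.
COUNT_MASK = 0xFFFF


def split_task_stream(
    words: Iterable[int], pack: int = 1, in_width: int = 32
) -> Tuple[List[int], List[int]]:
    """Split a word stream (bus 17) into headers (bus 42) and data (bus 43).

    `pack` input words are combined into one output word, modelling the case
    where the core data bus is wider than the streaming write bus.
    """
    headers: List[int] = []
    data: List[int] = []
    remaining = 0          # data words still expected in the current packet
    group: List[int] = []  # input words waiting to be combined

    for word in words:
        if remaining == 0:
            # First word after reset, or first word after a finished packet.
            headers.append(word)
            remaining = word & COUNT_MASK
            continue
        group.append(word)
        remaining -= 1
        # Flush when the group is full or the packet ends (assumed behaviour
        # for a packet whose length is not a multiple of `pack`).
        if len(group) == pack or remaining == 0:
            packed = 0
            for i, w in enumerate(group):
                packed |= w << (i * in_width)  # assumed little-endian order
            data.append(packed)
            group = []
    return headers, data
```

For instance, the stream `[2, 0x0A, 0x0B, 1, 0x0C]` contains two packets: a header announcing two data words, then a header announcing one.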

Task header memory 3 and task data memory 4 are dual-port RAMs: one port is write-only and the other is read-only. Writes to these memories are strictly sequential, in FIFO order. Reads are either sequential or random-access, the latter allowing task packets to be re-read when a single task is loaded sequentially into computing cores 6₁, …, 6_N.

Task headers are written into task header memory 3 over bus 42 and then, in response to signals on control bus 27, passed over read-header bus 26 to task arbiter 5. Unit 3 also generates the task-count code, which interface unit 1 reads over bus 18, and the "Empty" flag of the task memory, which goes to task arbiter 5 on output 28. On reset signal 13, task header memory 3 clears its internal write-address counter and its counter of task packets held in memory. In the mode where the same task is read repeatedly, a last-read signal for the packet header is also sent over bus 17 so that the packet counter of input memory 3 operates correctly.

Task data memory 4 writes the task packet data arriving on bus 43, at the address supplied by task arbiter 5 over control bus 29, and the task packet data are read out to computing cores 6₁, …, 6_N over read task-data bus 30. On reset signal 13, task data memory 4 clears its write-address counter. If computing-core data bus 30 is narrower than streaming write bus 17, memory 4 converts the data being read to the format of bus 30 by multiplexing.

Task arbiter 5 reads task packets from task header memory 3 and task data memory 4 and writes the task data into computing cores 6₁, …, 6_N. The corresponding task numbers, which identify the task a result belongs to, are written into the N task-number memories 7₁, …, 7_N. Task arbiter 5 receives the task write mode code from interface unit 1 over bus 16 and the task write requests from the computing cores over bus 19. Bus 31 carries the task write control signals 31'₁, …, 31'_N from task arbiter 5 to computing cores 6₁, …, 6_N. Bus 32 carries the task numbers from task arbiter 5 to task-number memories 7₁, …, 7_N, where they are written on signals 31'₁, …, 31'_N. On reset signal 13, task arbiter 5 clears the address counters of memories 3 and 4 and waits for the "Empty" flag on output 28 to be deasserted.

In general, a task may consist of several packets. The header of the first packet of a task contains the task write mode code and the task number. The header of the last packet contains a last-packet flag, from which the end-of-task-load signal is generated. The end-of-load signals on outputs 31'₁, …, 31'_N inform the corresponding computing core 6₁, …, 6_N that the next task has been loaded into input memory 6-1 for execution, and they trigger the writing of that task's number into the corresponding task-number memory 7₁, …, 7_N. If a task carries only parameters rather than data to be processed, the header of its last packet does not contain the flag that triggers the end-of-load signal. The headers of all task packets contain the task write control code (which part of the computing core's input memory the data are to be sent to) and the number of data words in the packet.

The task write mode is determined by the write mode code in the task packet header together with the write mode code supplied by interface unit 1 over bus 16. The proposed module supports the following modes for writing tasks into computing cores 6₁, …, 6_N:

- Write the task into the computing core 6₁, …, 6_N whose number is supplied by interface unit 1 over bus 16, without waiting for a write request from that core. This mode is intended only for writing parameters into the computing core.

- Write the task into the computing core 6₁, …, 6_N whose number is supplied by interface unit 1 over bus 16, after the control processor (УП) has read over bus 19 that a write request from that core is present.

- Write the task into the computing core 6₁, …, 6_N whose number is given in the header of the first task packet, without waiting for a write request from that core. This mode is intended only for writing parameters into the computing core.

- Write the task into the computing core 6₁, …, 6_N whose number is given in the header of the first task packet, waiting for a write request from that core on bus 19. Task arbiter 5 analyzes the write requests automatically.

- Write the task into all computing cores 6₁, …, 6_N in parallel, without waiting for write requests from all cores. This mode is intended only for writing common parameters into computing cores 6₁, …, 6_N.

- Write the task into all computing cores 6₁, …, 6_N in parallel, waiting until write requests from all cores are present. Task arbiter 5 analyzes the write requests automatically.

- Write the task into all computing cores 6₁, …, 6_N sequentially, waiting for a write request from each core before writing the task into it. Task arbiter 5 analyzes the write requests automatically. This mode may be preferable to parallel writing when computing cores 6₁, …, 6_N cannot overlap loading a task with processing the previous one: it does not require all cores to be ready for a write at the same time, and while one core is being loaded the others can continue processing.

- Write the task into the computing core 6₁, …, 6_N selected by task arbiter 5. The arbiter automatically selects the core according to the priority of the write requests from computing cores 6₁, …, 6_N. Each core supplies a 2-bit write request code on buses 19₁, …, 19_N, and the arbiter analyzes these codes. A zero code means no request. A larger code value means a higher request priority. When request codes are equal, the computing core with the lower index has priority.
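The selection rule of the last mode (2-bit request codes, zero meaning no request, larger code wins, lower core index wins ties) can be illustrated with a short sketch. The function name `select_core` and the zero-based indexing are assumptions for the example; the hardware arbiter itself is not described at this level in the text.

```python
from typing import Optional, Sequence


def select_core(request_codes: Sequence[int]) -> Optional[int]:
    """Return the zero-based index of the core to load next, or None.

    request_codes[i] models the 2-bit code on bus 19 from core i:
    0 = no request, 1..3 = request with increasing priority.
    """
    best_index: Optional[int] = None
    best_code = 0
    for index, code in enumerate(request_codes):
        if not 0 <= code <= 3:
            raise ValueError("request code must fit in 2 bits")
        # Strict comparison keeps the lower-indexed core on equal codes.
        if code > best_code:
            best_index, best_code = index, code
    return best_index
```

With codes `[0, 2, 3, 3]` the sketch selects index 2: the two highest requests tie at 3 and the lower index wins.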

Computing cores 6₁, …, 6_N must present the same interface to the other units of the module, i.e. the same widths of buses 15, 30, and 33, but they may differ somewhat in internal structure. This is indicated in Fig. 1 by the numbers "1" … "N", which are passed to the computing cores as parameters and may be taken into account when the cores are synthesized.

Computing cores 6₁, …, 6_N receive from interface unit 1 the reset signal 13 and, over bus 15, parameters that are the same for all cores. From task arbiter 5 they receive write control buses 31₁, …, 31_N, and from task data memory 4, over read bus 30, the task data to be written. From result arbiter 9 they receive result read signals 37₁, …, 37_N. The outputs of computing cores 6₁, …, 6_N feed result multiplexer unit 8: result buses 33₁, …, 33_N, end-of-result-packet-read signals 34₁, …, 34_N, and end-of-task signals 35₁, …, 35_N. Signals 35₁, …, 35_N also go to the read inputs of the corresponding task-number memories 7₁, …, 7_N.

Each computing core 6₁, …, 6_N consists of input buffer memory 6-1, operation unit 6-2, output buffer memory 6-3, and control unit 6-4.

Input buffer memory 6-1 comprises a parameter memory and a data memory. Parameters are loaded before processing of an array of tasks begins and do not change while that array is processed, whereas the data change from task to task. So that loading of task data can overlap with processing of previously loaded data, the data memory consists of several identical blocks. Data from bus 30 are written into memory 6-1 of a computing core under control of the signals on the corresponding bus 31₁, …, 31_N. Operation unit 6-2 reads from memory 6-1 through an interface chosen by the core developer. Control unit 6-4 of the corresponding core controls the modes in which tasks are written into memory 6-1 and read from it for processing in operation unit 6-2.

Operation unit 6-2 of a computing core 6₁, …, 6_N is brought to its initial state by reset signal 13 and is started on the next task by control unit 6-4. It writes the task results, together with a flag indicating that processing stopped because the output buffer overflowed, into output buffer memory 6-3, and it generates and sends to control unit 6-4 the end-of-task-execution signal and the output-buffer-overflow flag. If output buffer memory 6-3 overflows while a task is being processed and the task has not been completed, the operation unit stores the breakpoint reached at overflow and, on the next start from control unit 6-4, resumes processing the remainder of the task from that breakpoint. In this case control unit 6-4 does not switch the input of operation unit 6-2 to the next block of input memory 6-1; it leaves the previously connected block in place and starts operation unit 6-2 as soon as a free block of output memory 6-3 is available.
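The overflow-and-resume behaviour described above can be modelled in a few lines. This is a hedged sketch only: the class `OperationUnit`, its `capacity` parameter, and the placeholder computation (doubling each input word) are invented for illustration, and one result word per input word is assumed, which the source does not require.

```python
from typing import List, Sequence, Tuple


class OperationUnit:
    """Toy model of operation unit 6-2 with a breakpoint kept across starts."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # size of one output memory block (assumed)
        self.breakpoint = 0       # position to resume from after an overflow

    def run(self, task_data: Sequence[int]) -> Tuple[List[int], bool]:
        """Process the task; return (results, overflow_flag).

        If the output block fills before the task is finished, the position
        is stored and the next call continues from it on the same input.
        """
        out: List[int] = []
        pos = self.breakpoint
        while pos < len(task_data):
            if len(out) == self.capacity:
                self.breakpoint = pos  # remember where processing stopped
                return out, True       # output block full, task unfinished
            out.append(task_data[pos] * 2)  # placeholder computation
            pos += 1
        self.breakpoint = 0  # task finished; the next start is a fresh task
        return out, False
```

The caller plays the role of control unit 6-4: while the overflow flag is set it keeps the same input block connected and calls `run` again once a free output block exists.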

Output buffer memory 6-3 likewise consists of several identical blocks, to reduce the idle time of operation unit 6-2 while it waits for, or during, writes to input memory 6-1 or reads from output memory 6-3. Control unit 6-4 selects the block numbers used for writing results from operation unit 6-2 and for reading results out of the computing core. Operation unit 6-2 writes results into output memory 6-3 through an interface chosen by the core developer. When control unit 6-4 selects the next block of memory 6-3 for reading, the selected block supplies the control unit with a no-results flag and an end-of-read signal for that block. Results are read from the output of memory 6-3 on the corresponding signal 37₁, …, 37_N. In general, reads complete with some pipeline delay.

Results read from the computing cores are placed on the corresponding bus 33₁, …, 33_N, accompanied by the result valid flag 33' (Valid). The last word of a result being read is accompanied by a signal on the corresponding output 34₁, …, 34_N; if this is also the end of task processing (no overflow flag was recorded when the operation unit wrote the results), a signal is also issued on the corresponding output 35₁, …, 35_N.

For tasks that produce no results, the proposed computing module supports two different operating modes of the computing core. In the first mode, when control unit 6-4 selects for reading a block of output memory 6-3 that holds zero result words, it does not issue a read request on the corresponding bus 20₁, …, 20_N; instead, the selected block of output memory 6-3 immediately issues the corresponding end-of-task signal 35₁, …, 35_N, after which the control unit immediately selects the next block of result memory 6-3 for reading, or waits for results to be loaded into the same block if output memory 6-3 has only one block. In the second mode, after a block of memory 6-3 with zero results is selected for reading, control unit 6-4 issues the corresponding read request 20₁, …, 20_N just as it does for a nonzero number of results. Memory block 6-3 waits for the corresponding read signal 37₁, …, 37_N and, when it arrives, sends an end-of-read signal to control unit 6-4 and the corresponding end-of-packet-read signal 34₁, …, 34_N and end-of-task signal 35₁, …, 35_N to result multiplexer unit 8. The result valid flag on bus 33₁, …, 33_N is not asserted in this case. In the first mode, when there are no results, the task number is simply popped from task-number memory 7₁, …, 7_N, and no time is spent writing headers of empty results to the output memory and reading them back; in the second mode, the numbers of tasks without results are stated explicitly in the result headers. The core developer chooses which mode computing cores 6₁, …, 6_N use when there are no results.

Control unit 6-4 of a computing core 6₁, …, 6_N receives the following signals: reset 13; end of writing of the task to be executed into input memory, 31'₁, …, 31'_N; end of processing from operation unit 6-2, with the overflow flag; zero number of result words in the block of output memory 6-3 selected for reading; and end of reading of results from the block of output memory selected for reading. Control unit 6-4 selects the block numbers of input memory 6-1 and output memory 6-3 for writing and reading, and generates the start signal for operation unit 6-2 and the core's request codes for task writes and result reads. The logic that generates the write and read requests is defined by the core developer. A request code is given the highest value when the requested write or read is the only thing delaying the next start of the operation unit. If starting the operation unit requires, in addition to the requested operation (for example, a write), that another operation (a read) also be performed, the request priority is set lower. If the operation can be performed but the operation unit is busy, the priority is minimal. If the operation cannot be performed, a zero request code must be generated.
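One possible mapping of these rules onto a 2-bit request code is sketched below. The concrete values 3, 2, 1, 0 and the three boolean inputs are assumptions: the text fixes only the ordering of the priorities and the zero code for an impossible operation, and leaves the actual logic to the core developer.

```python
def request_code(possible: bool, unit_busy: bool, other_op_needed: bool) -> int:
    """Return an illustrative 2-bit request code for a write or read.

    possible        -- the requested operation can be performed right now
    unit_busy       -- the operation unit is currently processing a task
    other_op_needed -- starting the unit also requires the other operation
    """
    if not possible:
        return 0  # operation cannot be performed: zero code
    if unit_busy:
        return 1  # possible, but the unit is working: minimal priority
    if other_op_needed:
        return 2  # unit is stalled, but this operation alone will not start it
    return 3      # this operation alone delays the next start: highest priority
```

A core whose operation unit is idle and waiting only for a new task would thus request a write with code 3, which wins arbitration over cores that are still busy.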

Task-number memories 7₁, …, 7_N are FIFO memories. Their reset inputs receive reset signal 13; their write inputs receive the end-of-write signals for the tasks to be executed, 31'₁, …, 31'_N; their read inputs, which serve to advance to the next task number, receive the end-of-task signals 35₁, …, 35_N; and their data inputs receive the task-number buses 32₁, …, 32_N. From the outputs of task-number memories 7₁, …, 7_N, the numbers of the tasks to which the results being read belong, together with the "Empty" flags, go over the corresponding buses 36₁, …, 36_N to result multiplexer unit 8.
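The behaviour of one task-number memory can be modelled with a standard queue. The class and method names below are hypothetical; the sketch only mirrors the signals named above (write on end of task load, advance on end of task, an "Empty" flag, reset).

```python
from collections import deque
from typing import Deque, Optional


class TaskNumberFifo:
    """Toy model of one task-number memory 7_i."""

    def __init__(self) -> None:
        self._queue: Deque[int] = deque()

    def reset(self) -> None:
        """Reset signal 13: discard all stored task numbers."""
        self._queue.clear()

    def on_load_done(self, task_number: int) -> None:
        """Signal 31'_i: store the number of the task just loaded."""
        self._queue.append(task_number)

    def on_task_done(self) -> None:
        """Signal 35_i: advance to the number of the next task."""
        if self._queue:
            self._queue.popleft()

    @property
    def current(self) -> Optional[int]:
        """Task number the results currently being read belong to."""
        return self._queue[0] if self._queue else None

    @property
    def empty(self) -> bool:
        """The "Empty" flag sent on bus 36_i."""
        return not self._queue
```

Because numbers leave the queue in the order tasks were loaded, the number at the head always identifies the oldest unfinished task of that core.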

Result multiplexer unit 8 receives from result arbiter 9, over bus 38, the code of the number of the selected computing core 6₁, …, 6_N. According to this code, the multiplexer unit passes one of the input result buses 33₁, …, 33_N to output bus 33, the task number from one of the input buses 36₁, …, 36_N to output bus 36, one of the input signals 34₁, …, 34_N to output 34, and one of the input signals 35₁, …, 35_N to output 35. In addition, from the "Empty" flags of task-number memories 7₁, …, 7_N carried on buses 36₁, …, 36_N, result multiplexer unit 8 forms the common "Empty" flag 21' of all task-number memories, which is passed over bus 21 to interface unit 1. The control processor (УП) uses this common "Empty" flag to determine when processing of the entire batch of tasks has finished in the case where computing cores that have no results for a task do not issue a read request but immediately issue end-of-task signal 35, so that the number of that task does not appear in the result headers.
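The selection performed by the multiplexer unit can be sketched as below. The `CoreOutputs` record and the `multiplex` function are illustrative names only, and forming the common "Empty" flag as the logical AND of the per-core flags is an assumption; the text says only that the common flag is formed from them.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class CoreOutputs:
    result: int          # word on result bus 33_i
    task_number: int     # task number on bus 36_i
    read_done: bool      # end-of-packet-read signal 34_i
    task_done: bool      # end-of-task signal 35_i
    numbers_empty: bool  # "Empty" flag of task-number memory 7_i


def multiplex(
    cores: Sequence[CoreOutputs], selected: int
) -> Tuple[int, int, bool, bool, bool]:
    """Pass the selected core's outputs through and form the common flag.

    `selected` models the core-number code on bus 38 (zero-based here).
    Returns (bus 33, bus 36, signal 34, signal 35, common Empty flag 21').
    """
    chosen = cores[selected]
    # Assumption: the common flag is set only when every memory is empty.
    all_empty = all(core.numbers_empty for core in cores)
    return (
        chosen.result,
        chosen.task_number,
        chosen.read_done,
        chosen.task_done,
        all_empty,
    )
```

Under this assumption the common flag becomes true only once no core holds an unfinished task number, which is the condition the control processor needs to detect the end of a batch.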

The result arbiter 9 receives the reset signal 13, the result read mode code over bus 14, the read requests of the computing cores over bus 20, the end-of-read signals from the core selected by the arbiter over bus 34, the validity flag 33' of the result code read from the selected core, and the overflow flags 40 and 41 of the output result memory units. The result arbiter 9 generates the result read signals 37₁, …, 37_N to the corresponding computing cores 6₁, …, 6_N, the code 38 of the number of the selected computing core to the result multiplexer unit 8, and the code 39 of the number of words in the result packet to the result header memory unit 10.

The operating mode of the result arbiter 9 is determined by the control code received over bus 14. The result arbiter 9 can operate in three different modes:

- Reading the result from the computing core 6₁, …, 6_N whose number is supplied by interface unit 1 over bus 14, after the read request from that core has been read over bus 20 into interface unit 1 and analyzed.

- Reading the results sequentially from all computing cores 6₁, …, 6_N, waiting for a read request over bus 20 from each core before reading its result. The result arbiter 9 analyzes the read requests on bus 20 automatically. This mode can be used when the task loading, task execution, and result reading times are the same for all computing cores.

- Reading the result from the computing core 6₁, …, 6_N selected by the result arbiter 9. Here the result arbiter 9 automatically selects the core to read according to the priority of the cores' read requests. Each core supplies a two-bit read request code over its bus 20₁, …, 20_N. A zero code means no request. A larger code value corresponds to a higher request priority. When request codes are equal, the core with the lower ordinal number has priority.
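The priority rule of the third mode can be illustrated with a short Python sketch. It is a software model under stated assumptions, not the circuit itself: the function name `select_core` and the 0-based core indexing are hypothetical, while the two-bit request codes, the meaning of code 0, and the tie-break toward the lower-numbered core follow the description above.

```python
from typing import Optional, Sequence


def select_core(requests: Sequence[int]) -> Optional[int]:
    """Return the 0-based index of the core to read next, or None if no core requests a read.

    `requests[i]` models the two-bit code on bus 20_i (0 = no request, 3 = highest priority).
    """
    best_index: Optional[int] = None
    best_code = 0  # code 0 means "no request"
    for index, code in enumerate(requests):
        if not 0 <= code <= 3:
            raise ValueError("a request code must fit in two bits")
        # The strict '>' keeps the lower-numbered core when codes are equal.
        if code > best_code:
            best_index, best_code = index, code
    return best_index
```

For example, with request codes `[1, 3, 3, 2]` the sketch picks index 1: cores 1 and 2 share the highest code, and the lower-numbered one wins.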

The result header memory unit 10 and the result data memory unit 11 implement a FIFO queuing discipline.

On the end-of-read signal 34, the result header memory unit 10 writes a result header consisting of the task number code 36, the number of words in the result packet 39, and the end-of-task flag 35. On the read signal 23 from interface unit 1, the result packet headers are read from memory unit 10 over bus 25 into interface unit 1. Memory unit 10 also supplies, for reading over the control processor interface, the code of the number of result packets in the output memory over bus 21 to interface unit 1, and the result header memory overflow flag 41 to the result arbiter 9.

On the result validity signal 33', the result data memory unit 11 writes the result data code arriving over bus 33. On the read signal 22 from interface unit 1, result data packets are read from memory unit 11 over bus 24. Memory unit 11 also supplies the result memory overflow flag 40 to the result arbiter 9. In addition, memory unit 11 converts the width of the core result output bus 33 to the width of the streaming read bus 24. If bus 33 is narrower than bus 24, the width conversion takes place at the input of memory unit 11; if bus 33 is wider than bus 24, it takes place at the output of memory unit 11.
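The two directions of width conversion can be modeled in Python as follows. This is a sketch under assumptions that the description does not state: the wider width is an integer multiple of the narrower one, the first narrow word occupies the least significant bits of the wide word, and an incomplete final wide word is padded with zeros. The names `pack_words` and `split_words` are hypothetical.

```python
from typing import List, Sequence


def pack_words(words: Sequence[int], in_bits: int, out_bits: int) -> List[int]:
    """Combine narrow words into wider ones (the case where bus 33 is narrower than bus 24)."""
    if out_bits % in_bits != 0:
        raise ValueError("out_bits must be a multiple of in_bits")
    ratio = out_bits // in_bits
    mask = (1 << in_bits) - 1
    packed = []
    for start in range(0, len(words), ratio):
        value = 0
        # Assumed ordering: the first narrow word goes into the low bits.
        for position, word in enumerate(words[start:start + ratio]):
            value |= (word & mask) << (position * in_bits)
        packed.append(value)  # a short final group is implicitly zero-padded
    return packed


def split_words(words: Sequence[int], in_bits: int, out_bits: int) -> List[int]:
    """Split wide words into narrower ones (the case where bus 33 is wider than bus 24)."""
    if in_bits % out_bits != 0:
        raise ValueError("in_bits must be a multiple of out_bits")
    ratio = in_bits // out_bits
    mask = (1 << out_bits) - 1
    return [(word >> (position * out_bits)) & mask
            for word in words
            for position in range(ratio)]
```

Under these assumptions the two functions are inverses of each other for complete groups, for example four 8-bit words packed into one 32-bit word and split back again.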

The proposed computing module operates as follows.

The control processor usually works with several of the proposed computing modules at once. The computing cores 6₁, …, 6_N across all modules and within each module may be identical, intended for tasks of the same type, or they may differ and be intended for fundamentally different tasks.

The control processor works with the computing module in the following sequence.

The control processor sets the module's operating modes for writing tasks and reading results, and the common parameters of the computing cores 6₁, …, 6_N. It then performs the initial initialization of the module by issuing the reset signal 13 and begins working with the module. The reset codes, operating mode codes, and parameters transmitted in parallel code arrive at the module over the control processor interface bus 12; after conversion in interface unit 1 they are latched in its internal registers and delivered to the module's units over buses 13, 14, 15, and 16.

Once the modes have been loaded, the control processor's work with the computing module can be divided into three processes: writing tasks, reading results, and analyzing completion of work.

Task writing process. The control processor periodically reads the number of task packets in the input memory; interface unit 1 passes this information from bus 18 to the external interface bus 12. From the capacity of the task header memory unit 3 and the task data memory unit 4 and the volume of task packets already loaded into them, the control processor determines the amount of free input memory and loads into it as many tasks as possible from the array of tasks to be executed. In the automatic mode of selecting a computing core 6₁, …, 6_N, this completes the task writing function. In the mode where the control processor selects the core, the control processor additionally reads the write requests from the computing cores over bus 19, uses them to select a core for writing, and sends over bus 16 the code of that core's number and the signal initiating the transfer of the task from the input memory to that core. Core selection by the control processor for task writing is an auxiliary mode of the module.

Result reading process. The control processor periodically reads the number of result packets in the output memory. At the control processor's request, interface unit 1 passes this information from bus 21 to the external interface bus 12. If the number of result packets is nonzero, the control processor reads the headers of these results from memory unit 10. Before reading, the control processor issues a request to interface unit 1 to read a given number of words over bus 25. Interface unit 1 generates the read signal 23 with a duration corresponding to the number of words to be read and transfers the required number of words from memory unit 10 over bus 25 to the external interface 12. The control processor analyzes the headers it has read, sums the word counts of the result packets contained in those headers, taking into account the conversion from the result width to the interface exchange width, and reads the required number of result words from memory unit 11. Before reading, the control processor issues a request to interface unit 1 to read a given number of words over bus 24. Interface unit 1 generates the read signal 22 with a duration corresponding to the number of words to be read and transfers the required number of words from memory unit 11 over bus 24 to the external interface 12. In the automatic core selection modes for reading, this completes the control processor's result reading function. In the mode where the control processor selects the computing core 6₁, …, 6_N for reading, the control processor additionally reads the read requests from the cores over bus 20, uses them to select a core for reading, and sends over bus 14 the code of that core's number and the signal initiating the reading of the result from that core. Core selection by the control processor for result reading is an auxiliary mode of the module.
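The word-count arithmetic performed by the control processor can be sketched in Python. The description only says that the header word counts are summed with allowance for the width conversion; the sketch additionally assumes that each result packet occupies a whole number of interface words (rounded up per packet) and that the header fields are available as separate values. The names `ResultHeader` and `interface_words_to_read` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ResultHeader:
    task_number: int   # task number (bus 36)
    word_count: int    # words in the result packet, in core result words (bus 39)
    task_done: bool    # end-of-task flag (signal 35)


def interface_words_to_read(headers: Iterable[ResultHeader],
                            result_bits: int,
                            interface_bits: int) -> int:
    """Total number of interface-width words to request from the result data memory."""
    total = 0
    for header in headers:
        packet_bits = header.word_count * result_bits
        # Assumption: each packet is rounded up to whole interface words.
        total += (packet_bits + interface_bits - 1) // interface_bits
    return total
```

For instance, two packets of 5 and 3 sixteen-bit result words read over a 64-bit interface would, under the per-packet rounding assumption, require 2 + 1 = 3 interface words.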

Completion analysis process. If the computing cores 6₁, …, 6_N of the module issue a read request over bus 20 on finishing a task even when there are no results, completion of the module's work is determined by the fact that all results have been read and that their headers contain the numbers of all tasks loaded into the module together with their end flags over bus 34. If, however, the cores do not issue a read request over bus 20 when there are no results, so that there are no result headers carrying the numbers of tasks that produced no results, completion of the module's work is determined by the following conditions holding together: all tasks intended for the module have been written to it, the number of tasks in the module's input memory is zero, the number of result packets in the module's output memory is zero, and all task number memory units are “empty”. The control processor determines the number of tasks in the input memory by reading the code from bus 18, and the number of result packets in the output memory and the “Empty” flag of all task number memory units 7₁, …, 7_N by reading bus 21.
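The two completion checks can be written down as simple predicates. This is a minimal Python sketch of the control processor's logic with hypothetical names (`finished_by_headers`, `finished_by_counters`); headers are modeled as `(task_number, end_flag)` pairs, which is an assumption about how the software would hold them.

```python
from typing import Iterable, Tuple


def finished_by_headers(loaded_tasks: Iterable[int],
                        headers: Iterable[Tuple[int, bool]]) -> bool:
    """Case 1: cores always issue a read request, even for tasks without results.

    `headers` holds every (task_number, end_flag) pair read so far; the module is
    done when each loaded task number has appeared with its end flag set.
    """
    completed = {task for task, end_flag in headers if end_flag}
    return set(loaded_tasks) <= completed


def finished_by_counters(all_tasks_written: bool,
                         input_task_count: int,
                         output_packet_count: int,
                         all_task_number_memories_empty: bool) -> bool:
    """Case 2: cores stay silent for tasks without results, so counters are used instead."""
    return (all_tasks_written
            and input_task_count == 0          # read from bus 18
            and output_packet_count == 0       # read from bus 21
            and all_task_number_memories_empty)  # common "Empty" flag, bus 21
```

In the second case all four conditions must hold at the same time; a single nonzero counter means some task is still queued or some result has not yet been read.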

Tasks arrive from the control processor over the external interface 12, are converted by interface unit 1 into signals of the streaming write bus 17, and are split by unit 2 on the fly (without buffering) into task packet headers and the task data that is subsequently loaded directly into the computing cores; these are written to the task header memory unit 3 and the task data memory unit 4 over buses 42 and 43, respectively. In the task separation unit 2, the first word arriving on task bus 17 after the reset signal 13 is treated as a task packet header and is written over bus 42 to memory unit 3. Unit 2 stores the number of data words in the packet taken from this header, and that number of words, adjusted for the ratio between the task data word width and the interface exchange word width, is then written from bus 17 through the task separation unit 2 into data memory 4 over bus 43. Once the required number of words has been written to data memory unit 4, the next word arriving on bus 17 is again treated by the task separation unit 2 as the header of the next task.
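The on-the-fly splitting can be modeled as a small state machine. The Python sketch below assumes, for simplicity, that a task data word has the same width as an interface word, and it takes the header layout as a parameter because the description does not give the bit positions of the word count; `split_stream` and `data_word_count` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


def split_stream(words: Iterable[int],
                 data_word_count: Callable[[int], int]) -> Tuple[List[int], List[int]]:
    """Split a stream of words into task headers and task data.

    `data_word_count(header)` extracts the number of data words that follow a header;
    its layout is an assumption supplied by the caller.
    """
    headers: List[int] = []
    data: List[int] = []
    remaining = 0  # data words still expected for the current packet
    for word in words:
        if remaining == 0:
            # The first word after reset, or after a completed packet, is a header.
            headers.append(word)
            remaining = data_word_count(word)
        else:
            data.append(word)
            remaining -= 1
    return headers, data
```

With a hypothetical layout that keeps the count in the low byte, `split_stream([2, 10, 11, 0, 1, 20], lambda h: h & 0xFF)` yields headers `[2, 0, 1]` and data `[10, 11, 20]`; a header announcing zero data words is followed immediately by the next header.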

In the automatic mode of selecting a computing core 6₁, …, 6_N, when the signal at output 28 indicates that a task is present in the input memory, the task arbiter 5 reads the header of the first task packet over bus 26 at address 27 and, depending on the core selection mode code in the header, does one of the following: writes the task to the selected core or to all computing cores 6₁, …, 6_N without analyzing write requests from those cores; waits for a write request from the core selected in the header and, once it appears, writes the task to that core; waits until write requests from all cores are present simultaneously and writes the task to all cores in parallel; or waits for a request from at least one core and writes the task to the core chosen by the priority analysis circuit in the task arbiter 5. In the mode of sequential writing of a task to all cores, the task arbiter 5 waits for a write request from the first core, writes the task to the first core, then returns the addresses on buses 27 and 29 to re-read the same task, waits for a write request from the second core, and writes the task to the second core. The task arbiter 5 then repeats these actions in turn for all remaining cores.
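The non-sequential selection modes of the task arbiter can be summarized in a Python sketch. Several details here are assumptions and not taken from the description: the mode names in `WriteMode`, the modeling of write requests as booleans, and the rule that the lowest-numbered requesting core wins in the priority mode (the description only says a priority analysis circuit makes the choice). The sequential mode is omitted because it needs state across several writes.

```python
from enum import Enum, auto
from typing import List, Optional, Sequence


class WriteMode(Enum):
    FORCE_SELECTED = auto()  # write to the core named in the header, ignore requests
    FORCE_ALL = auto()       # write to all cores, ignore requests
    WAIT_SELECTED = auto()   # wait for a request from the core named in the header
    WAIT_ALL = auto()        # wait until every core requests, then write in parallel
    WAIT_ANY = auto()        # wait for any request, then pick one core by priority


def cores_to_write(mode: WriteMode,
                   requests: Sequence[bool],
                   selected: Optional[int] = None) -> List[int]:
    """Return the 0-based cores to write now; an empty list means 'keep waiting'."""
    all_cores = list(range(len(requests)))
    if mode in (WriteMode.FORCE_SELECTED, WriteMode.WAIT_SELECTED):
        if selected is None:
            raise ValueError("this mode needs a core number from the header")
        if mode is WriteMode.FORCE_SELECTED or requests[selected]:
            return [selected]
        return []
    if mode is WriteMode.FORCE_ALL:
        return all_cores
    if mode is WriteMode.WAIT_ALL:
        return all_cores if all(requests) else []
    if mode is WriteMode.WAIT_ANY:
        # Assumed priority rule: the lowest-numbered requesting core wins.
        for index, requesting in enumerate(requests):
            if requesting:
                return [index]
        return []
    raise ValueError(f"unsupported mode: {mode}")
```

Returning an empty list models the arbiter holding the task in the input memory until the condition for the chosen mode is met.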

In the mode where the control processor selects the core, the control processor analyzes the write requests from the computing cores over bus 19 and sends to the task arbiter 5, over bus 16, the code of the selected core's number and a write signal, on which the task arbiter 5 writes the task to the corresponding core.

When writing tasks to the computing cores 6₁, …, 6_N, the task arbiter 5 generates write control codes for the cores on buses 31₁, …, 31_N, in accordance with the value of this code in the header of the corresponding task packet, for the core selected for writing or for all cores. The duration of the write code is determined by the number of words specified in the packet header. Synchronously with the write control code, the task arbiter 5 generates an address on bus 29 for reading the task data, which is delivered over bus 30 to the data inputs of the cores. If the task is an executable task rather than a set of parameter packets, then once the data of the last packet in the task has been written, the task arbiter 5 generates the end-of-task-write signals 31'₁, …, 31'_N. On these signals the core control units 6-4 register that the core has received the next task for execution and accordingly control the selection of the input memory blocks 6-1 for writing and reading, the next start of the operation unit 6-2, and the generation of the write requests 19₁, …, 19_N and read requests 20₁, …, 20_N. The end-of-task-write signals 31₁, …, 31_N also go to the task number memory units 7₁, …, 7_N, where they cause the task numbers on buses 32₁, …, 32_N to be written into these memory units. The task numbers are formed in the task arbiter 5 from the task number code stored when the arbiter read the task header and, if the task is written sequentially or in parallel to all cores, the code of the core number.

Each of the computing cores 6₁, …, 6_N processes the tasks it receives one after another, reading data from one of the input memory blocks 6-1 and writing results to one of the output memory blocks 6-3. The memory blocks, both for reading 6-3 and for writing 6-1, are selected strictly in sequence. Processing of the next task starts when processing of the previous task has finished, the next task is present in one of the input memory blocks 6-1, and a free block is available in the output memory 6-3 for the result. Processing of a task resumes after an output memory block 6-3 overflows when a free block is available in the output memory 6-3.

The result read requests 20₁, …, 20_N go to the result arbiter 9 and to interface unit 1.

The result arbiter 9 operates in one of three modes: selection of the computing core to read by read request priority, sequential reading of results from all cores in numerical order, and selection of the core to read by the control processor.

In the mode of selecting a computing core 6₁, …, 6_N by the priority of the result read requests 20₁, …, 20_N, the result arbiter 9, after the reset signal 13 or after finishing the previous read, waits for at least one request and selects the core with the highest read request priority. In the mode of sequential reading in core number order, the result arbiter 9 waits after the reset signal 13 for a read request from core 1, and after finishing a read from the current core it waits for a request from the core with the next ordinal number. In the mode of core selection by the control processor, the control processor analyzes the read requests from the cores over bus 20 and sends to the result arbiter 9, over bus 14, the code of the selected core's number and a start-of-read signal. On this signal the result arbiter 9 begins reading results from the corresponding core. In the other two modes, the start of reading is determined automatically by the result arbiter 9.

When results are read, the result arbiter 9 places on bus 38 the number of the computing core from which results are being read and asserts the read signal on the corresponding output 37₁, …, 37_N. On the read signal, the result is read from the selected core over the corresponding bus 33₁, …, 33_N, accompanied by a validity flag. When the last word of a result packet is read, the computing core generates the corresponding pulse 34₁, …, 34_N at its output, and if this is also the last result word of the task, it additionally generates the corresponding pulse 35₁, …, 35_N. The signals from each computing core 6₁, …, 6_N are fed to the inputs of the result multiplexer unit 8. The inputs of the multiplexer unit 8 also receive, from the task number memory units 7₁, …, 7_N, the numbers of the tasks being executed by the cores 6₁, …, 6_N and the “Empty” flags of the corresponding memory units 7₁, …, 7_N; these are passed to the outputs of the multiplexer unit 8 according to the number set on bus 38. The multiplexer unit 8 also generates the “Empty” flag 21' for all task number memory units. The end-of-result-packet-read signal 34 from the output of the multiplexer unit 8 is used by the result arbiter 9 to finish reading the current result packet and by the result header memory unit 10 to write the result packet header; in the result data memory unit 11 this signal resets the counter used to reformat result data words when the width of the computing-core outputs is several times smaller than the width of the interface's streaming-read word. The task number bus 36 and the end-of-task signal 35 pass from the output of the multiplexer unit 8 only to the input of the result header memory 10. The result data bus 33 is fed to the input of the result data memory 11, and the result validity flag from output 33' is fed to the result arbiter 9, which counts the words in the result packet and sends this count over bus 39 to the input of the result header memory 10. The overflow flags from the outputs of the header memory unit (41) and the result data memory unit (40) are fed to the result arbiter 9, which suspends result reading while at least one of these signals is present.
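The read sequence described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the hardware of the invention: the names `ResultWord` and `read_packet` are hypothetical, each core's output buffer is modeled as a plain list, and the suspension of reading on overflow flags 40 and 41 is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ResultWord:
    """One valid result word leaving a computing core (hypothetical model)."""
    data: int
    end_of_packet: bool = False  # models pulse 34_i
    end_of_task: bool = False    # models pulse 35_i


def read_packet(
    core_outputs: Dict[int, List[ResultWord]],
    task_numbers: Dict[int, int],
    selected_core: int,
) -> Tuple[List[int], Tuple[int, int, bool]]:
    """Read one result packet from the selected core.

    Returns the data words (destined for memory unit 11) and the header
    fields (task number, word count, end-of-task flag) for memory unit 10.
    """
    queue = core_outputs[selected_core]  # the core number plays the role of bus 38
    data_words: List[int] = []
    word_count = 0                       # the value sent over bus 39
    while queue:
        word = queue.pop(0)              # one read strobe on output 37_i
        data_words.append(word.data)
        word_count += 1                  # one count per valid word (flag 33')
        if word.end_of_packet:
            header = (task_numbers[selected_core], word_count, word.end_of_task)
            return data_words, header
    raise ValueError("core output ended without an end-of-packet marker")
```

Calling `read_packet` once per packet mirrors how the arbiter stops reading on signal 34 and hands the word count to the header memory.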

On the end-of-result-packet-read signal 34, the result header memory unit 10 writes a result packet header consisting of the task number 36, the number of words in the result packet 39, and the end-of-task flag 35. Memory unit 10 also generates the overflow flag 41 and the code 21 giving the number of result packets in the output memory. On read signal 23, memory unit 10 outputs the result headers onto the streaming-read bus 25.
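The three header fields can be illustrated by packing them into a single word. The description does not state field widths or bit order, so the 16-bit task number, the 15-bit word count, the placement of the end-of-task flag in the top bit, and the function names below are all assumptions made for this sketch.

```python
TASK_BITS = 16   # assumed width of the task number field (bus 36)
COUNT_BITS = 15  # assumed width of the word count field (bus 39)


def pack_header(task_number: int, word_count: int, end_of_task: bool) -> int:
    """Pack a result packet header into one integer (assumed layout)."""
    if not 0 <= task_number < (1 << TASK_BITS):
        raise ValueError("task_number does not fit the assumed field width")
    if not 0 <= word_count < (1 << COUNT_BITS):
        raise ValueError("word_count does not fit the assumed field width")
    return (
        (int(end_of_task) << (TASK_BITS + COUNT_BITS))
        | (word_count << TASK_BITS)
        | task_number
    )


def unpack_header(header: int) -> tuple:
    """Recover (task_number, word_count, end_of_task) from a packed header."""
    task_number = header & ((1 << TASK_BITS) - 1)
    word_count = (header >> TASK_BITS) & ((1 << COUNT_BITS) - 1)
    end_of_task = bool((header >> (TASK_BITS + COUNT_BITS)) & 1)
    return task_number, word_count, end_of_task
```

A host reading bus 25 could then use the word count to know how many data words to fetch from bus 24 for that packet, and the end-of-task flag to know when a task's results are complete.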

The result data memory unit 11 reformats data at its input or at its output. Data are written to memory unit 11 on the data validity signal on bus 33. Memory unit 11 generates the overflow flag 40. On read signal 22, memory unit 11 outputs the result data onto the streaming-read bus 24.
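The reformatting step can be sketched as packing several narrow core words into one wider interface word, with a counter that restarts at each packet boundary. The description only says that a counter is reset by signal 34; the packing order (first word in the low bits), the zero padding of a partial word at the end of a packet, and the name `repack_words` are assumptions of this Python sketch.

```python
from typing import Iterable, List, Tuple


def repack_words(
    words: Iterable[Tuple[int, bool]], core_width: int, ratio: int
) -> List[int]:
    """Pack `ratio` words of `core_width` bits into one wide word.

    `words` yields (value, end_of_packet) pairs. Words left in the
    accumulator when the input ends without end_of_packet are not emitted.
    """
    mask = (1 << core_width) - 1
    wide_words: List[int] = []
    accumulator = 0
    counter = 0
    for value, end_of_packet in words:
        accumulator |= (value & mask) << (counter * core_width)
        counter += 1
        if counter == ratio or end_of_packet:
            wide_words.append(accumulator)  # a partial word stays zero-padded
            accumulator = 0
            counter = 0                     # reset, as signal 34 does at packet end
    return wide_words
```

Resetting the counter at the packet boundary keeps the first word of the next packet aligned to the start of a wide word, which is the role the description gives to signal 34 in memory unit 11.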

The proposed computing module was used in developing a computing system with accelerators based on the Xilinx XC7K410T FPGA. When a computing system is developed on an FPGA on the basis of the proposed computing module, a previously developed design of the parameterizable computing environment is reused, and only the computing cores 6₁, …, 6_N, with the interface prescribed by the proposed computing module, are developed. The parameters then specify the widths of the task data bus, the result bus, and the bus of computing-core parameters transmitted in parallel code, the number of computing cores 6₁, …, 6_N in the module, and the clock frequencies for task writing, result reading, and processing. The RTL descriptions of the computing cores 6₁, …, 6_N are integrated with the computing environment design, and the design is compiled in the CAD tool for the given FPGA to produce the FPGA programming code. The number of computing cores 6₁, …, 6_N and the clock frequencies are chosen to maximize the performance of the СВУ.
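The parameter set listed above can be pictured as a small configuration record. The following Python sketch is purely illustrative: the class name `ModuleParams`, the field names, and the example values are hypothetical and are not taken from the actual design.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModuleParams:
    """Hypothetical parameter record for the computing environment."""
    task_data_width: int         # width of the task data bus, bits
    result_width: int            # width of the result bus, bits
    core_param_width: int        # width of the core parameter bus, bits
    num_cores: int               # N, the number of computing cores
    write_clock_mhz: float       # task-writing clock frequency
    read_clock_mhz: float        # result-reading clock frequency
    processing_clock_mhz: float  # processing clock frequency

    def __post_init__(self) -> None:
        # Every parameter must be a positive quantity.
        for field in fields(self):
            if getattr(self, field.name) <= 0:
                raise ValueError(f"{field.name} must be positive")


# Example values are invented for illustration only.
params = ModuleParams(64, 32, 128, 16, 125.0, 125.0, 250.0)
```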

The estimates showed a 25% reduction in the labor and time required to develop the computing system compared with development without the unified computing environment. The computing environment in the proposed computing module occupies no more than 5% of the FPGA resources; the time spent analyzing the state of the input and output memory in the proposed computing module decreased threefold, and the idle time of the computing cores caused by delays in servicing them was halved, which increased the performance of the computing system.

Thus, the above information supports the conclusion that the proposed computing module achieves the claimed technical result: reduced development labor and time and increased performance.

Claims

A computing module comprising a group of N computing cores 6₁, …, 6_N, an interface unit 1, a task separation unit 2, a task header memory unit 3, a task data memory unit 4, a task arbiter 5, a group of N computing-core task number memory units 7₁, …, 7_N, a result multiplexer unit 8, a result arbiter 9, a result header memory unit 10, a result data memory unit 11, and an external interface 12, wherein each computing core 6₁, …, 6_N consists of an input buffer memory 6-1, an operation unit 6-2, an output buffer memory 6-3, and a control unit 6-4; the external interface 12 is connected to the interface unit 1, whose reset output 13 is connected to the corresponding reset inputs of the task separation unit 2, the task header memory unit 3, the task data memory unit 4, the task arbiter 5, the computing cores 6₁, …, 6_N, the task number memory units 7₁, …, 7_N, the result arbiter 9, the result header memory unit 10, and the result data memory unit 11; the interface unit 1 is also connected by the result-read code bus 14 to the result arbiter 9, by the common parameters bus 15 to all operation units 6-2 of the computing cores 6₁, …, 6_N, by the task-write mode code bus 16 to the task arbiter 5, by the task streaming-write bus 17 to the task separation unit 2, by the task count code bus 18 to an output of the task header memory unit 3, by the task-write request bus 19 and the result-read request bus 20 to the outputs of all computing cores 6₁, …, 6_N, by the status code bus 21 to an output of the result header memory unit 10 and to the common “Empty” flag output 21' of the result multiplexer unit 8, by the result data streaming-read bus 24 to the output of the result data memory unit 11, and by the result header streaming-read bus 25 to the output of the result header memory unit 10, while its result data read signal output 22 is connected to the corresponding input of the result data memory unit 11 and its result header read signal output 23 is connected to the corresponding input of the result header memory unit 10; the task separation unit 2 is connected by the written task headers bus 42 to the corresponding input of the task header memory unit 3 and by the written task data bus 43 to the corresponding input of the task data memory unit 4; the task header memory unit 3 is connected by the read task headers bus 26 and by the output of the task memory “Empty” flag to the corresponding inputs of the task arbiter 5, whose output is connected by the task header read control bus 27 to the corresponding input of the task header memory unit 3; the task arbiter 5 is connected by the write control bus 31 to the input buffer memories 6-1 of the corresponding computing cores 6₁, …, 6_N via the task write lines 31₁, …, 31_N and to the task number memory units 7₁, …, 7_N via the lines 31'₁, …, 31'_N; in addition, the task arbiter 5 is connected by the written task numbers bus 32, via lines 32₁, …, 32_N, to the corresponding inputs of the task number memory units 7₁, …, 7_N; the corresponding input of the task arbiter 5 is connected to the task-write request bus 19, which comprises the task-write request outputs 19₁, …, 19_N of the computing cores; the task arbiter 5 is also connected by the task data read control bus 29 to the corresponding input of the task data memory unit 4, whose output is connected by the read task data bus 30 to the corresponding inputs of the input buffer memories 6-1 of the corresponding computing cores 6₁, …, 6_N; the corresponding outputs of the control units 6-3 of the computing cores 6₁, …, 6_N are connected by the result buses 33₁, …, 33_N to the corresponding inputs of the result multiplexer unit 8, whose corresponding output is connected by the result bus 33 of the selected computing core to the corresponding input of the result data memory unit 11; in addition, the corresponding end-of-result-packet-read outputs 34₁, …, 34_N of the control units 6-3 of the computing cores 6₁, …, 6_N are connected to the corresponding inputs of the result multiplexer unit 8, whose corresponding end-of-result-packet-read output 34 is connected to the corresponding inputs of the result arbiter 9, the result header memory unit 10, and the result data memory unit 11; the corresponding end-of-task outputs 35₁, …, 35_N of the control units 6-3 of the computing cores 6₁, …, 6_N are connected to the corresponding inputs of the task number memory units 7₁, …, 7_N and to the corresponding inputs of the result multiplexer unit 8, whose corresponding end-of-task output 35 is connected to the corresponding input of the result header memory unit 10; the corresponding outputs of the task number memory units 7₁, …, 7_N are connected by the read task number buses 36₁, …, 36_N to the corresponding inputs of the result multiplexer unit 8, whose corresponding output is connected by the current task number read bus 36 to the corresponding input of the result header memory unit 10; the corresponding outputs of the result arbiter 9, namely the result read signal outputs 37₁, …, 37_N, are connected to the corresponding inputs of the output buffer memories 6-3 of the corresponding computing cores 6₁, …, 6_N; the corresponding outputs of the result arbiter 9 are also connected by the bus 38 carrying the code of the selected computing core number to the control inputs of the result multiplexer unit 8 and by the bus 39 carrying the number of words in the result packet to the corresponding inputs of the result header memory unit 10, whose overflow flag output 41 is connected to the corresponding input of the result arbiter 9, another corresponding input of which is connected to the overflow flag output 40 of the result data memory unit 11.