RU2282236C1

RU2282236C1 - Module for multi-processor system

Info

Publication number: RU2282236C1
Application number: RU2004136937/09A
Authority: RU
Inventors: Илья Израилевич Левин (Ilya Izrailevich Levin, RU); Лидия Ивановна Виневская (Lidiya Ivanovna Vinevskaya, RU)
Priority date: 2004-12-16
Filing date: 2004-12-16
Publication date: 2006-08-20
Also published as: RU2004136937A

Abstract

FIELD: computer engineering, namely digital processing of signals and images, solving problems of mathematical physics, and modeling of complex technical systems; may be used in multiprocessor computing systems with massive parallelism.

SUBSTANCE: the module contains random-access memory, a block of distributed-memory multicontrollers, a matrix switch, and a block of macroprocessors.

EFFECT: a module for a multiprocessor computing system whose real performance is close to peak on a broad class of problems, including tightly coupled ones.

1 dwg

Description

The present invention relates to the field of computing, namely digital processing of signals and images, solving problems of mathematical physics, and modeling complex technical systems and natural phenomena, and can be used in multiprocessor computing systems with massive parallelism.

A known device is the reconfigurable computer XPuter (see M. Herz, T. Hoffmann, U. Nageldinger, C. Schreiber, "Interfacing the MoM-PDA to an Internet-based Development System", University of Kaiserslautern, http://xputers.mformatik/unl-kl.de/papers/paper104.pdf, and R. Hartenstein, "Coarse Grain Reconfigurable Architectures", http://xputers.informatik/unl-kl.de/papers/paper110.pdf).

The XPuter consists of a two-dimensional distributed memory made up of many blocks and coupled to a common data-stream synthesizer, and of a reconfigurable ALU. The reconfigurable ALU is a set of ALUs, each containing a switch that can switch eight channels pairwise connecting four adjacent ALUs.

The disadvantage of this device is that it works efficiently (with performance close to peak) only on systolic problems or classes of problems reducible to them.

The reasons for this disadvantage are, firstly, that data memory is accessed only through the operand-fetch patterns that the data-stream synthesizer generates for pipelined data structures, so arbitrary access procedures cannot be implemented for different classes of problems; secondly, that fetches from different memory blocks follow a common procedure, because there is only a single data-stream synthesizer program (an organization typical of SIMD machines); and thirdly, that the switch of the reconfigurable ALU is not fully accessible, since it connects each ALU only to its adjacent ALUs.

A number of distributed-memory computing systems are known that contain processors interconnected by some switching fabric, among them the Intel Paragon, IBM SP1/SP2, Cray T3D/T3E, and others.

They differ in the processors used and in the organization of the switching fabric. As an analog, consider the Cray T3D, a massively parallel supercomputer with distributed memory. It consists of two main components: computing nodes and a switching network. Each computing node of the Cray T3D contains two independent processor elements (PEs) and a network interface. Each PE, in turn, contains a processor and local memory. The local memory of each PE is part of the physically distributed but logically shared memory of the entire computer. The network interface of a node is connected to the corresponding network router, which is part of the switching network. The switching network of the Cray T3D forms a three-dimensional lattice, connecting the network routers of the nodes in three spatial directions, so each node has six immediate neighbors. Two adjacent nodes are linked by two unidirectional data channels, which allows simultaneous data exchange in opposite directions.

The disadvantage of this device is its low real performance on problems that require a large number of interprocessor transfers.

The reasons for this disadvantage are, firstly, that the communication network is not fully accessible and, secondly, that accessing the data memory of other processor elements takes longer and is more complex than accessing a PE's own local memory, because such accesses go through a slow switching system. As a result, processors wait longer for data, which desynchronizes the parallel computation and, in turn, lowers the real performance of the system.
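To make the accessibility argument concrete, the following sketch counts router-to-router hops in a three-dimensional lattice like the one described for the Cray T3D and compares them with the single traversal of a fully accessible switch. It assumes wraparound links in every dimension (a 3D torus) and a hypothetical 4x4x4 lattice, and it uses hop count only as a rough proxy for access latency; the function names are illustrative.

```python
from itertools import product

# Hypothetical lattice size chosen for illustration.
DIMS = (4, 4, 4)

def torus_neighbors(node, dims):
    """Six immediate neighbors of `node`, assuming wraparound links and every dimension >= 3."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors

def torus_hops(a, b, dims):
    """Minimal router-to-router hops between nodes a and b (shorter way round each ring)."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))

def average_hops(dims):
    """Mean hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(*(range(n) for n in dims)))
    total = sum(torus_hops(a, b, dims) for a in nodes for b in nodes if a != b)
    return total / (len(nodes) * (len(nodes) - 1))

# In a fully accessible (crossbar) switch every source reaches every sink in one traversal.
CROSSBAR_HOPS = 1

print(len(set(torus_neighbors((0, 0, 0), DIMS))))
print(torus_hops((0, 0, 0), (2, 2, 2), DIMS))
print(round(average_hops(DIMS), 3))
```

Even in this small lattice the average remote access crosses about three routers, and the farthest node is six hops away, whereas every path through a full crossbar has the same length.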

The closest prior art to the proposed invention is the Alice parallel processor (see "Computing Processors and Systems" (Вычислительные процессоры и системы), ed. G. N. Marchuk, Issue 7, Moscow: Nauka, Main Editorial Office of Physical and Mathematical Literature, 1990, pp. 64-66), which contains 16 processor elements and 26 memory elements for storing information packets. Each processor element consists of five transputers (Inmos T414 processors): two interface with the switching network, receiving and transmitting packets; two others process packets; and the fifth stores and fetches the programs corresponding to the functions of the packets being processed. Each memory element, in turn, consists of two transputers and dynamic memory. One transputer receives packets from the processor elements, modifies packets in memory, and fetches active packets from it; the other sends packets to the switching network.

The disadvantage of this device is its low speed on a certain class of problems: for tightly coupled problems, where the number of exchanges between processor elements is comparable to the number of operations they perform, the real performance of the Alice parallel processor is substantially below peak. In addition, a large amount of hardware is required to implement pipelined operation of the parallel processor.

The reason for these disadvantages is that the transputers process information in parallel code but exchange it in serial code. The transputers therefore wait for data from other transputers, and the higher the degree of coupling of the problem (the ratio of transfers to operations), the lower the real performance.

Performance is further reduced by the fact that in the Alice parallel processor the switching-network and memory interfaces are implemented in software rather than in hardware.

A large amount of hardware is needed to organize pipelined information processing because the Alice parallel processor uses standard processor elements: each processor element contains five transputers, of which only two are used directly for information processing.

The problem addressed by the claimed invention is to create a module for a multiprocessor computing system whose real performance is close to peak on a wide class of problems, including tightly coupled problems.

The technical result achieved by the invention is that information processing and information exchange are coordinated (performed with the same number of bits and at the same clock frequency). In addition, hardware synchronization of operands simplifies programming. Instead of standard processor elements, special devices are introduced that provide high-speed access to memory and to the switching network.
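The coupling argument made against the Alice processor, and the benefit claimed here for processing and exchanging data at the same width and clock, can be illustrated with a toy throughput model. Everything in it is an assumption: a 1-bit-per-clock serial link, 32-bit words, one clock per operation, and full overlap of exchange with processing in the matched case. It sketches the reasoning only and is not a performance model of either machine.

```python
def efficiency(coupling, word_bits, link_bits, op_cycles=1, overlapped=False):
    """Fraction of peak performance in a toy model.

    coupling   -- transfers per operation (degree of coupling of the problem)
    word_bits  -- operand width in bits
    link_bits  -- bits the exchange path moves per clock
    op_cycles  -- clocks per operation
    overlapped -- True if exchange proceeds concurrently with processing
    """
    transfer_cycles = -(-word_bits // link_bits)   # ceil(word_bits / link_bits)
    comm_cycles = coupling * transfer_cycles       # exchange time per operation
    if overlapped:
        # Pipelined: the slower of processing and exchange sets the pace.
        return op_cycles / max(op_cycles, comm_cycles)
    # Processor idles while each transfer completes.
    return op_cycles / (op_cycles + comm_cycles)

# Hypothetical 32-bit words; serial 1-bit link versus a matched 32-bit path.
for c in (0.1, 0.5, 1.0):
    serial = efficiency(c, 32, 1)
    matched = efficiency(c, 32, 32, overlapped=True)
    print(f"coupling={c}: serial link {serial:.3f}, matched width and clock {matched:.3f}")
```

In the serial case efficiency falls steadily as coupling grows; with matched width and overlap it stays at peak as long as no more than one word per operation has to be moved.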

To achieve this technical result, macroprocessors (MAPs) are introduced into the device instead of processor elements, and distributed-memory multicontrollers (MCRPs) are introduced instead of memory elements. The MCRPs are connected to the random-access memory (RAM), which is a block of standard RAM chips, and a matrix switch is used as the switching network. The information inputs of the device are connected to the bidirectional inputs/outputs of the RAM and to the bidirectional inputs/outputs of the block of distributed-memory multicontrollers, whose control inputs are connected to the control-signal input of the device. The first information inputs of the matrix switch are connected to the respective first outputs of the block of macroprocessors, whose first information inputs are connected to the respective first outputs of the matrix switch. The second outputs of the matrix switch are connected to the information inputs of the block of distributed-memory multicontrollers, whose information outputs are connected to the second information inputs and third inputs of the matrix switch and to the second inputs of the block of macroprocessors. The control outputs of the block of distributed-memory multicontrollers are connected to the control inputs of the RAM, and the second outputs of the block of macroprocessors are connected to the outputs of the device.

The causal link between the set of essential features of the claimed invention and the technical result is as follows. The device introduces macroprocessors that implement large operations structurally, distributed-memory multicontrollers that provide high-speed information exchange between RAM and the macroprocessors as well as parallel-pipelined information processing, and a matrix switch that provides direct spatial connections between all components of the system. This speeds up computation on tightly coupled problems through the structural (hardware) implementation of large mathematical operations and simplifies programming for various problem areas, while coordinated processing and transmission of information reduce the total time needed to solve tightly coupled problems.

The invention is illustrated by a drawing showing a block diagram of the multiprocessor system module.

The device contains information inputs D₁-Dₘ, through which information enters; RAM 1; block 2 of distributed-memory multicontrollers (2₁-2ₘ); device input 3, which receives the control signal PUSK; matrix switch 4; block 5 of macroprocessors (5₁-5ₘ); and device outputs 6, which are the failure outputs OTK1-OTKm of the block of macroprocessors.

The information inputs D₁-Dₘ of the device are connected to the bidirectional inputs/outputs of RAM 1 and to the bidirectional inputs/outputs D of block 2 of distributed-memory multicontrollers, whose control inputs PUSK are connected to input 3 of the device, which carries the control signal PUSK. The first information inputs D₁-Dₘ of matrix switch 4 are connected to the respective first outputs Z of block 5 of macroprocessors, whose first information inputs X are connected to the respective first outputs Q₁-Qₘ of matrix switch 4. The second outputs Qₘ₊₁-Q₂ₘ of matrix switch 4 are connected to the information inputs X of block 2 of distributed-memory multicontrollers, whose information outputs Z are connected to the second information inputs Dₘ₊₁-D₂ₘ and the third inputs KOM1-KOMm of matrix switch 4 and to the second inputs KOM of block 5 of macroprocessors. The control outputs Y₁-Yₘ of block 2 of distributed-memory multicontrollers are connected to the control inputs of RAM 1, and the second outputs OTK1-OTKm of block 5 of macroprocessors are connected to outputs 6 of the device.
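The wiring just described can be written down as a netlist and checked mechanically. In this minimal sketch the port letters and reference numerals follow the description, while the "block.index.port" naming and the pairing of multicontroller i with macroprocessor i on the KOM inputs are hypothetical choices made for illustration.

```python
def module_netlist(m):
    """List the module's point-to-point links as (source, sink) pairs.

    Port letters (D, X, Z, Q, Y, KOM, OTK, PUSK) and numerals 1-6 follow the
    description; the naming scheme and the MCRP i -> MAP i pairing are assumptions.
    """
    links = []
    for i in range(1, m + 1):
        links += [
            (f"dev.D{i}", f"RAM1.io{i}"),          # bidirectional in hardware
            (f"dev.D{i}", f"MCRP2.{i}.D"),         # bidirectional in hardware
            ("dev.3.PUSK", f"MCRP2.{i}.PUSK"),     # start signal to every MCRP
            (f"MAP5.{i}.Z", f"SW4.D{i}"),          # MAP results into switch inputs D1..Dm
            (f"SW4.Q{i}", f"MAP5.{i}.X"),          # switch outputs Q1..Qm feed MAP operands
            (f"SW4.Q{m + i}", f"MCRP2.{i}.X"),     # outputs Qm+1..Q2m feed MCRP inputs
            (f"MCRP2.{i}.Z", f"SW4.D{m + i}"),     # MCRP data into inputs Dm+1..D2m
            (f"MCRP2.{i}.Z", f"SW4.KOM{i}"),       # switch commands
            (f"MCRP2.{i}.Z", f"MAP5.{i}.KOM"),     # MAP commands
            (f"MCRP2.{i}.Y", f"RAM1.ctl{i}"),      # read/write control of RAM
            (f"MAP5.{i}.OTK", f"dev.6.OTK{i}"),    # failure outputs
        ]
    return links

links = module_netlist(4)
switch_inputs = {sink for _, sink in links if sink.startswith("SW4.D")}
switch_outputs = {src for src, _ in links if src.startswith("SW4.Q")}
print(len(links), len(switch_inputs), len(switch_outputs))
```

A check like this confirms, for example, that the switch has 2m data inputs and 2m outputs and that every X input of a MAP or MCRP is fed only through the switch.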

The multiprocessor system module includes RAM built from standard RAM chips; m MAPs operating in floating-point format; m MCRPs, which transfer data to and from RAM, where m = 2 to 1024; and a matrix switch that connects the m macroprocessors and the m distributed-memory multicontrollers as a complete graph. Accordingly, there are m external information channels D₁-Dₘ, m failure outputs OTK of the macroprocessors, and an input for the control signal PUSK.
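The complete-graph connectivity of the matrix switch can be modeled as a crossbar in which every output selects any one input. This is a hypothetical sketch: the 0-based port numbering, the MatrixSwitch name, and the idea that one switching command sets one output are assumptions, not details taken from the patent.

```python
class MatrixSwitch:
    """Toy full-access switch: 2m inputs, 2m outputs, any input to any output.

    Port numbering (0-based) is an assumption mirroring the description:
      inputs  0..m-1  <- outputs Z of the MAPs   (D1..Dm)
      inputs  m..2m-1 <- outputs Z of the MCRPs  (Dm+1..D2m)
      outputs 0..m-1  -> inputs X of the MAPs    (Q1..Qm)
      outputs m..2m-1 -> inputs X of the MCRPs   (Qm+1..Q2m)
    """

    def __init__(self, m):
        if not 2 <= m <= 1024:
            raise ValueError("the description gives m = 2..1024")
        self.m = m
        self.select = [None] * (2 * m)   # select[out] = input currently routed to `out`

    def configure(self, out, src):
        """One switching command: connect input `src` to output `out`."""
        if not (0 <= out < 2 * self.m and 0 <= src < 2 * self.m):
            raise IndexError("port number out of range")
        self.select[out] = src           # no topology restriction: complete graph

    def route(self, inputs):
        """Move one word from each selected input to its output in a single step."""
        return [None if src is None else inputs[src] for src in self.select]

sw = MatrixSwitch(4)
sw.configure(0, 5)   # word from MCRP 2 (input D6) to MAP 1 (output Q1)
sw.configure(1, 0)   # result of MAP 1 (input D1) to MAP 2 (output Q2)
sw.configure(4, 1)   # result of MAP 2 (input D2) to MCRP 1 (output Q5), toward RAM
words = [f"w{k}" for k in range(8)]
routed = sw.route(words)
```

Because any output can select any input in one step, a MAP reaches a word in any memory block, or the result of any other MAP, through exactly one switch traversal.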

A parallel program for the multiprocessor system module is a sequence of frames. Each frame is a programmatically indivisible construct consisting of a set of macroprocessor commands, switching structures implemented in the matrix switch, and procedures for accessing (reading and writing) the distributed memory blocks, implemented in the MCRPs.

The devices of the system are configured to execute a new frame only after all of them have completed the procedures belonging to the previous frame. Executing a frame consists of two stages: in the first, the system components are configured to implement the frame; in the second, the information flows are processed in a parallel-pipelined manner according to the configured computing structure.
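The frame rule (no device starts frame k+1 until every device has finished frame k) is a barrier synchronization. The sketch models each device as a thread and uses Python's threading.Barrier; in the module itself this synchronization is done in hardware, and the run_frames function and its event log are hypothetical.

```python
import threading

def run_frames(num_devices, frames):
    """Simulate frame-by-frame execution of `num_devices` devices; return the event log."""
    log = []
    lock = threading.Lock()
    barrier = threading.Barrier(num_devices)

    def record(*event):
        with lock:
            log.append(event)

    def device(dev_id):
        for k, frame in enumerate(frames):
            record("configure", k, dev_id, frame)   # stage 1: set up for this frame
            record("process", k, dev_id, frame)     # stage 2: parallel-pipelined processing
            record("done", k, dev_id, frame)
            barrier.wait()                          # wait until every device is done with frame k

    threads = [threading.Thread(target=device, args=(d,)) for d in range(num_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

log = run_frames(3, ["frame0", "frame1", "frame2"])
```

However the threads are scheduled, every "done" event of frame k appears in the log before any "configure" event of frame k+1.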

Beforehand, program information is loaded into the program memory located in the MCRPs, and numerical information is loaded into RAM via the external inputs D₁-Dₘ. The module starts operating when the external signal PUSK arrives: it initializes the access procedure in each MCRP, and the initial program statement begins to execute. A multicontroller executes the following statements: cyclic access statements (read, write), parameter recalculation (modification) statements, and control statements. The command read statement writes (configures) the program information of the macroprocessors; for this, the multicontroller is configured to issue commands on its outputs Z. As a result, commands of width K are written sequentially into the macroprocessors and the matrix switch over m channels from the outputs Z of the MCRPs. The word on the outputs Z of an MCRP contains two most significant service bits whose encoding specifies the purpose of the incoming information, namely whether it is a command of width K or data of width P. A command for a MAP contains the code of the operation to be performed in the MAP; a command for the matrix switch contains the addresses of the operands to be delivered to the inputs of the macroprocessors.
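The role of the two most significant service bits can be sketched as a simple pack/unpack routine. The description names the command width K and the data width P but gives neither their values nor the service-bit encoding, so K = P = 32 and the codes 01 (command) and 10 (data) are hypothetical.

```python
# Hypothetical widths: the description names K (command) and P (data) but gives no values.
K = 32
P = 32
PAYLOAD_BITS = max(K, P)
TAG_SHIFT = PAYLOAD_BITS          # the two service bits sit above the payload

# Hypothetical codes for the two most significant service bits.
TAG_COMMAND = 0b01
TAG_DATA = 0b10

def pack(tag, payload):
    """Build one channel word: 2 service bits followed by a K- or P-bit payload."""
    if tag not in (TAG_COMMAND, TAG_DATA):
        raise ValueError("unknown service-bit code")
    width = K if tag == TAG_COMMAND else P
    if not 0 <= payload < (1 << width):
        raise ValueError("payload does not fit its field")
    return (tag << TAG_SHIFT) | payload

def unpack(word):
    """Split a channel word into its kind ("command" or "data") and payload."""
    tag = word >> TAG_SHIFT
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    if tag == TAG_COMMAND:
        return "command", payload
    if tag == TAG_DATA:
        return "data", payload
    raise ValueError(f"unknown service bits {tag:02b}")

word = pack(TAG_DATA, 7)
print(f"{word:034b}", unpack(word))
```

A receiver (MAP or switch) only needs to inspect the top two bits to decide whether to load a command or pass the payload on as an operand.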

Once the commands are loaded, the module begins its actual operation. The MCRPs supply a data stream to the inputs of the MAPs. Data for a MAP can be either information read from RAM over the bidirectional information channels D and passed through the matrix switch according to the addressing specified for that MAP, or information arriving at the matrix switch from the output of any MAP. In addition, the results of all MAPs can be sent through the matrix switch to the inputs X of the MCRPs and then over the bidirectional channels D to RAM. The multicontroller control signals Y₁-Yₘ control reading and writing of information from and to RAM.
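The multicontroller's statement classes (cyclic read and write, parameter modification, control) can be illustrated with a tiny interpreter that turns a program into a stream of RAM accesses. The statement names, their fields, and the loop semantics of Repeat are all hypothetical; the actual instruction set of the multicontroller patent is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class CyclicRead:        # cyclic access statement: read `count` words from RAM
    base: int
    stride: int
    count: int

@dataclass
class CyclicWrite:       # cyclic access statement: write `count` words into RAM
    base: int
    stride: int
    count: int

@dataclass
class AddBase:           # parameter modification statement
    target: int          # index of the access statement whose base is changed
    delta: int

@dataclass
class Repeat:            # control statement: jump back to `start` `times` more times
    start: int
    times: int

def run_mcrp(program, ram, incoming=()):
    """Execute a hypothetical MCRP program; return the words sent out on Z.

    Reads stand for Y = read, writes for Y = write; `incoming` stands for
    words arriving on X from the matrix switch. AddBase mutates `program`.
    """
    out, incoming, loops, pc = [], iter(incoming), {}, 0
    while pc < len(program):
        st = program[pc]
        if isinstance(st, CyclicRead):
            out.extend(ram[st.base + i * st.stride] for i in range(st.count))
        elif isinstance(st, CyclicWrite):
            for i in range(st.count):
                ram[st.base + i * st.stride] = next(incoming)
        elif isinstance(st, AddBase):
            program[st.target].base += st.delta
        elif isinstance(st, Repeat):
            left = loops.get(pc, st.times)
            if left > 0:
                loops[pc] = left - 1
                pc = st.start
                continue
            loops.pop(pc, None)   # loop finished; reset for any later entry
        pc += 1
    return out

ram = list(range(16))
prog = [CyclicRead(base=0, stride=2, count=2), AddBase(target=0, delta=4), Repeat(start=0, times=1)]
streamed = run_mcrp(prog, ram)   # reads ram[0], ram[2], then ram[4], ram[6]
```

The point of the sketch is that address generation runs inside the multicontroller, so the MAPs see only a continuous operand stream.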

According to the received command codes, the macroprocessors are configured to implement large, functionally complete operations structurally (in hardware). The MAP architecture makes it possible to build, within a single universal computing system, various computing structures that efficiently perform large mathematical operations from different problem areas, such as exponential, sine, divergence, integral, filter, complex multiplication, fast Fourier transform, etc.
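Configuring a macroprocessor by an operation code to implement a large operation as a hardware structure can be mimicked in software by selecting a per-clock step function. The sketch shows two operations from the list above, complex multiplication (four multipliers, one subtractor and one adder) and a direct-form FIR filter; the opcodes and function names are hypothetical.

```python
def complex_multiply_structure():
    """Per-clock step wired as (a+bi)(c+di) = (ac - bd) + (ad + bc)i."""
    def step(a, b, c, d):
        m1, m2, m3, m4 = a * c, b * d, a * d, b * c   # four multipliers working in parallel
        return m1 - m2, m3 + m4                       # one subtractor and one adder
    return step

def fir_structure(taps):
    """Per-clock step wired as a direct-form FIR filter with the given taps."""
    delay = [0.0] * len(taps)          # delay line (shift register), newest sample first
    def step(x):
        delay.insert(0, x)
        delay.pop()
        return sum(h * v for h, v in zip(taps, delay))
    return step

# Hypothetical operation codes; the patent does not list the MAP command set.
OP_CMUL = 0x01
OP_FIR = 0x02
STRUCTURES = {OP_CMUL: complex_multiply_structure, OP_FIR: fir_structure}

def configure_map(opcode, *params):
    """Return the step function for the structure selected by `opcode`."""
    return STRUCTURES[opcode](*params)

cmul = configure_map(OP_CMUL)
fir = configure_map(OP_FIR, [0.5, 0.5])
filtered = [fir(x) for x in (2.0, 4.0, 6.0)]
```

In hardware each step would correspond to one clock of a pipelined structure producing one result per clock; here it is just a Python call per input sample.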

If the bit grid overflows in a MAP (a result exceeds the representable range), the MAP generates a failure signal OTK, which is sent to the module output.
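A range-overflow check of the kind that raises OTK can be sketched by assuming, purely for illustration, that the MAP uses IEEE-754 single precision; the description says only that MAPs work in floating-point format. Python's struct module raises OverflowError when a finite value is too large to pack as a 32-bit float, and the function reports that as the failure flag. Underflow is not modeled.

```python
import math
import struct

def to_float32(x):
    """Round x to IEEE-754 single precision; struct raises OverflowError if it does not fit."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def map_multiply(a, b):
    """Multiply in a hypothetical float32 MAP and return (result, otk)."""
    try:
        result = to_float32(a * b)
    except OverflowError:
        return None, True              # range exceeded: raise the OTK failure signal
    if math.isinf(result):
        return None, True              # infinite operand or product: also treated as overflow
    return result, False

print(map_multiply(2.0, 3.0))
print(map_multiply(1e20, 1e20))
```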

The blocks of the claimed device can be implemented with domestically produced computer hardware. The module uses the macroprocessor described in patent No. 2210808 (application dated 5 January 2001, published in Bulletin No. 23 of 20 August 2003), the distributed-memory multicontroller described in patent No. 220804 (dated 8 August 2001, published in Bulletin No. 23 of 20 August 2003), and the matrix switch described in patent No. 2059288 (dated 5 August 1994, published in Bulletin No. 12 of 27 April 1996).

Introducing macroprocessors, distributed-memory multicontrollers, and a matrix switch into the device, connected as described, makes it possible, first, to speed up computation on tightly coupled problems through the structural (hardware) implementation of large mathematical operations and to simplify programming for various problem areas.

Second, coordinated processing and transmission of information reduce the total time needed to solve tightly coupled problems.

Third, it provides uniform high-speed access from any processor to any memory element and, as a result, increases the speed at which information structures are processed.

Fourth, the instruction set of the distributed-memory multicontroller, combined with the fully accessible switch, makes it possible to build efficient, high-throughput procedures for accessing distributed memory, which reduces problem-solving time.

Claims

A multiprocessor system module intended for building multiprocessor systems, characterized in that it contains a group of macroprocessors that perform large mathematical operations, a group of distributed-memory multicontrollers that provide high-speed information exchange between RAM and the macroprocessors as well as parallel-pipelined information processing, and a matrix switch that provides direct spatial connections between all components of the system, wherein the information inputs of the device are connected to the bidirectional inputs/outputs of the RAM and to the bidirectional inputs/outputs of the block of distributed-memory multicontrollers, whose control inputs are connected to the control-signal input of the device; the first information inputs of the matrix switch are connected to the respective first outputs of the block of macroprocessors, whose first information inputs are connected to the respective first outputs of the matrix switch; the second outputs of the matrix switch are connected to the information inputs of the block of distributed-memory multicontrollers, whose information outputs are connected to the second information inputs and the address and control inputs of the matrix switch and to the second inputs of the block of macroprocessors; the control outputs of the block of distributed-memory multicontrollers are connected to the control inputs of the RAM; and the second outputs of the block of macroprocessors are connected to the outputs of the device.