RU72339U1

RU72339U1 - MULTIPROCESSOR COMPUTING SYSTEM MODULE (VARIANTS)

Info

Publication number: RU72339U1
Application number: RU2007148590U
Authority: RU
Inventors: Николай Николаевич Храбров; Павел Михайлович Коновальчик; Александр Дмитриевич Малеванчук
Original assignee: Общество с ограниченной ответственностью "НТЦ "Фактор"
Priority date: 2007-12-27
Filing date: 2007-12-27
Publication date: 2008-04-10
Also published as: RU72339U8

Abstract

The utility model relates to computer engineering, specifically to solving computationally intensive problems such as problems of mathematical physics, linear algebra, and modeling of complex systems, and can be used in multiprocessor computing systems with distributed memory (massive parallelism). A multiprocessor computing system module intended for parallel-pipelined computation contains a block of macroprocessors based on programmable logic integrated circuits (FPGAs), distributed memory controllers, and an access controller that implements the interface with the control machine of the computing device. A distributed static memory block is introduced for each macroprocessor. The access controller is connected to the information inputs of the macroprocessor block by 40 bidirectional lines and to at least one distributed static memory block. The macroprocessors are connected in a ring by 320 bidirectional lines, which allows independent information exchange for each pair of macroprocessors adjacent in the ring. Each macroprocessor is connected to its distributed static memory block by 65 lines, and within macroprocessor pairs 2-7 and 3-6 communication is implemented over 40 radial bidirectional lines. The utility model makes it possible to increase the performance of a computing system through a ring topology of connections between virtual processing devices, independent transmission and reception of signals over a full graph (each to each), and the transfer, in one clock cycle, of strings up to 320 bits long between virtual devices located in adjacent FPGAs of the module.

Description

The utility model relates to computer engineering, specifically to solving computationally intensive problems such as problems of mathematical physics, linear algebra, and modeling of complex systems, and can be used in multiprocessor computing systems with distributed memory (massive parallelism).

A number of distributed-memory computing systems are known in which processors are joined by some communication medium. The best known among them are Intel Paragon, IBM SP1/SP2, Cray T3D, and many others, including domestic cluster and massively parallel installations such as MVS-1000M, SKIF Syberia, etc.

These systems differ from one another mainly in the type of processing elements and the communication medium used. Each processing element (PE) contains one or several processors, local memory, and an interface for communicating with other processing elements. The local memory of each PE is part of the physically distributed but logically shared memory of the whole computer. Data is processed by processors, that is, program-controlled devices with hard-wired logic.

The main disadvantage of such computing systems is low performance on real-world problems.

The first cause of this disadvantage is the rigid architecture, because of which the available resources cannot be used fully: a significant part of the processor hardware sits idle, waiting either for data that has not yet arrived from memory or for the instructions that this part of the hardware is meant to execute. The second cause is the large gap between the performance of the processor itself, on the one hand, and that of main memory and the communication medium, on the other.

A number of modules built on programmable logic are known; they are analog-to-digital receivers and filters with a small total gate resource and a rich set of interfaces. The most widespread of them are the modules XDSP-xMC, ADC-xx, PLISAR-xxx, DSP35X3-xxxx, the specialized domestically produced M-5xx line of modules, and others.

The main disadvantage of such modules is the small total gate resource. Such products often provide no way to change the configuration of the programmable logic integrated circuits (FPGAs) so that they can perform a different task.

The cause of this disadvantage lies in the design stage, where the products are dimensioned for their expected functional loads. The available computing capacity fully satisfies the stated digital processing algorithms in terms of logic resources, and the pipelined implementations developed for those algorithms satisfy the required processing rate for the full volume of input (output) traffic.

A close analogue of the proposed utility model in terms of functional purpose is the line of computing modules RSP-504, RSP-506, RSP-512, and RSP-517 produced by the domestic company OOO NPO Rosta. These are mezzanine modules in the PMC format intended for developing high-speed digital data processing devices using FPGA technology. Each module carries two reconfigurable chips of the advanced Virtex II and Virtex 4 series.

The main disadvantage of the RSP series modules is the limited size of the common gate array, which in a number of cases does not allow the whole algorithm to be placed efficiently in the module.

The cause of this disadvantage is that there are only two FPGA chips on one module, connected to each other by an internal bus 132 to 155 bits wide. These modules contain a large number of heterogeneous external interfaces: PMC, LVDS, Local Bus, Expansion Bus, Side Bus, Control Bus. Given that all the service periphery (switching and power bridges, storage devices, oscillators, etc.) must also be present, the width of the instantaneous link between the user FPGAs is limited to 155 lines.

The closest analogue, i.e. the prototype, is the invention "Multiprocessor system module" (patent RU No. 2282236, published 2006.08.20), which contains main memory, a block of distributed-memory multicontrollers, a matrix switch, and a block of macroprocessors. This computing module with a reconfigurable architecture, developed by NII MVS TRTU, has several FPGA chips, as does the proposed utility model, and is functionally intended for solving computationally intensive fragments of problems.

The disadvantage of that invention is that a pipelined implementation of a digital data processing algorithm cannot be built on the common gate array of the module without the frequency limits imposed by the interconnect scheme of the module's computing elements.

The cause of this disadvantage is that the gate resources of all FPGAs are shared as a common computing field through a matrix switch, which places high demands on the synchronization of the final design and limits information exchange between the chips.

The problem addressed by the proposed utility model is to create a multi-chip module of a multiprocessor computing system that makes it possible to implement a digital information processing algorithm as a pipeline on the common gate array of all computing elements, thereby increasing the performance of the computing module.

This problem is solved by creating a multiprocessor computing system module for parallel-pipelined computation that contains a block of macroprocessors based on programmable logic integrated circuits and distributed memory controllers. An access controller implementing the interface with the control machine of the computing device is additionally introduced into the module, and at least four macroprocessors are additionally introduced into the macroprocessor block; each macroprocessor includes a distributed memory controller, and a distributed static memory block is introduced for each macroprocessor. The access controller is connected to the information inputs of the macroprocessor block by bidirectional lines and to at least one distributed static memory block. The macroprocessors are joined in a ring by bidirectional lines, which allows independent information exchange for each pair of macroprocessors adjacent in the ring; each macroprocessor is connected to its corresponding distributed static memory block; and within two pairs of macroprocessors communication is implemented over bidirectional lines.

In addition, the macroprocessors are joined in a ring by 320 bidirectional lines.

In addition, each macroprocessor is connected to its corresponding distributed static memory block by 65 lines.

In addition, within macroprocessor pairs 2-7 and 3-6 communication is implemented over 40 radial bidirectional lines.

According to variant 2, this problem is solved by creating a multiprocessor computing system module intended for parallel-pipelined computation that contains a block of macroprocessors based on programmable logic integrated circuits and distributed memory controllers, characterized in that an access controller implementing the interface with the control machine of the computing device is introduced into it, and at least four macroprocessors are additionally introduced into the macroprocessor block; each macroprocessor includes a distributed memory controller, and a distributed static memory block is introduced for each macroprocessor. The access controller is connected to the information inputs of the macroprocessor block by 40 bidirectional lines and to at least one distributed static memory block. The macroprocessors are joined in a ring by 320 bidirectional lines, which allows independent information exchange for each pair of macroprocessors adjacent in the ring; each macroprocessor is connected to its corresponding distributed static memory block by 65 lines; and within the two macroprocessor pairs 2-7 and 3-6 communication is implemented over 40 radial bidirectional lines.

The only limit on the performance of the resulting pipeline is the maximum amount of information (40 bytes) that can be transferred between the chips of the module.
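The 40-byte figure agrees with the 320-line ring link described elsewhere in the text. The short Python sketch below only illustrates that arithmetic; it assumes each line carries one bit per clock cycle, and the constant and function names are illustrative rather than taken from the source.

```python
# Illustrative arithmetic only; names are hypothetical, not from the patent.
RING_LINES = 320      # bidirectional lines between neighbouring macroprocessors
BITS_PER_BYTE = 8


def bytes_per_cycle(lines: int) -> int:
    """Bytes moved in one clock cycle, assuming one bit per line per cycle."""
    return lines // BITS_PER_BYTE


def cycles_for(payload_bytes: int, lines: int = RING_LINES) -> int:
    """Clock cycles needed to move payload_bytes over the link (rounded up)."""
    per_cycle = bytes_per_cycle(lines)
    return -(-payload_bytes // per_cycle)  # ceiling division
```

Under this assumption, `bytes_per_cycle(320)` gives the 40 bytes quoted above, and a pipeline stage that must hand more than 40 bytes to the next chip needs more than one cycle per hand-off.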

The technical result achieved by implementing the utility model is a matched rate of information processing and exchange across all elements of the module.

To achieve this technical result, the utility model, like the prototype, uses macroprocessors (MAP) implemented in FPGAs instead of processing elements. Unlike the prototype, the proposed utility model uses a ring interconnect topology instead of a matrix switch, which on the one hand widens the class of solvable problems but on the other hand consumes resources and does not allow words of the required width to be exchanged. On a number of problems (when the pipeline is placed across several chips), this gives a noticeable gain in performance.

The causal relationship between the set of essential features of the claimed utility model and the technical result achieved is as follows: the fixed ring topology increases the performance of the computing module through the structural organization of large operations when pipeline elements must be placed in different macroprocessors (FPGAs) and strings of up to 320 bits must be passed between them. This allows higher performance than in the prototype, in which each block solves its own problem.

In what follows, the proposed utility model is explained by specific embodiments and the accompanying drawings, in which:

Fig. 1 shows a block diagram of the multiprocessor system module.

Fig. 2 shows a block diagram of the high-performance computing device (VVU).

Fig. 3 shows a block diagram of the multiprocessor system module with the channels used to load the FPGA configuration of the base module.

Fig. 4 shows the format of the configuration registers of the access controller (KO).

The block diagram of the multiprocessor system module in Fig. 1 uses the following designations: VE i is a computing element (macroprocessor) of the module, KO is the access controller, RAM is a static memory bank, and P denotes the connectors. The access controller 9 handles information exchange between the module and the control computer (UEVM), which is the control controller 32 (a control PC of the IBM PC type). The KO 9 provides the link between the PCI bus and the FPGA chips, each of which includes a distributed memory controller (KRP), and also implements the interface for loading the FPGA configuration. Program and numerical data are loaded through the connectors P over the LVDS channel.

The high-performance computing device VVU (Fig. 2) consists of the multiprocessor computing system module 29, connected to the control computer through an interface board by the LVDS channel 31, which provides high-speed exchange between the module and the control computer.

The multiprocessor system module 29 according to variants 1 and 2 (Fig. 1) contains a block of macroprocessors 1-8 based on programmable logic integrated circuits, each of which includes a distributed memory controller that exchanges information with the corresponding distributed static memory block, and an access controller 9 that implements the interface with the control machine of the computing device (Fig. 2). The access controller 9 is connected to the information inputs of macroprocessors 1 and 8 of the macroprocessor block by bidirectional lines and to at least one distributed static memory block 13-20. The macroprocessors 1-8 are connected in a ring by bidirectional lines, which allows independent information exchange for each pair of macroprocessors adjacent in the ring, and each macroprocessor 1-8 is connected to its corresponding distributed static memory block 21-28.

In addition, the macroprocessors 1-8 are joined in a ring by 320 bidirectional lines.

In addition, each macroprocessor 1-8 is connected to its corresponding distributed static memory block by 65 lines.

In addition, within macroprocessor pairs 2-7 and 3-6 communication is implemented over 40 radial bidirectional lines.
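The interconnect described above can be written down as a small graph. The Python sketch below is only an illustration under stated assumptions: the ring is assumed to follow the numerical order 1-2-...-8-1, and the access controller (labelled "KO" here) is assumed to have a 40-line link to each of macroprocessors 1 and 8, which the text does not spell out per link. All function and constant names are hypothetical.

```python
RING_WIDTH = 320    # lines between ring neighbours (from the description)
RADIAL_WIDTH = 40   # lines inside pairs 2-7 and 3-6 (from the description)
KO_WIDTH = 40       # lines from the access controller (assumed per link)


def build_links(n: int = 8) -> dict:
    """Return {frozenset({a, b}): width_in_lines} for the module's links."""
    links = {}
    for i in range(1, n + 1):
        j = i % n + 1                       # next macroprocessor in the ring
        links[frozenset((i, j))] = RING_WIDTH
    for a, b in ((2, 7), (3, 6)):           # radial pairs
        links[frozenset((a, b))] = RADIAL_WIDTH
    for mp in (1, 8):                       # access controller attachments
        links[frozenset(("KO", mp))] = KO_WIDTH
    return links


def neighbours(links: dict, node) -> set:
    """All nodes that share a direct link with node."""
    return {other for pair in links if node in pair
            for other in pair if other != node}
```

With these assumptions, `neighbours(build_links(), 2)` contains macroprocessors 1, 3 and 7: the two ring neighbours plus the radial partner.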

Program and numerical data are loaded through connectors 11 and 12.

The device according to variants 1 and 2 operates as follows.

The operation of the device is demonstrated on the best embodiment, shown in Figs. 1-4. In the block of macroprocessors 1-8 based on programmable logic integrated circuits, the group of eight macroprocessors is treated as a block that is indivisible with respect to configuration: the block can only be configured as a whole. Before configuration starts, the corresponding PROG bit must be set to zero, which resets the previous FPGA configuration. Setting the PROG bit to one afterwards puts the FPGAs into the waiting state.

To load the configuration of FPGAs 1-8, two 32-bit configuration registers, RK0 36 and RK1 37, are introduced into the access controller 9 (Fig. 4).

Configuration register RK0 36.

Bits (31÷24)-DATA3(7:0), (23÷16)-DATA2(7:0), (15÷8)-DATA1(7:0), and (7÷0)-DATA0(7:0) correspond to bytes of the configuration file being loaded into the FPGA.
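As an illustration of the RK0 layout, the Python sketch below packs four configuration-file bytes into one 32-bit word. The text does not say which file byte goes into which DATA field, so the sketch assumes the first byte of each group maps to DATA0 (bits 7..0); the function names are hypothetical.

```python
def pack_rk0(chunk: bytes) -> int:
    """Pack four bytes into a 32-bit RK0 word.

    Assumed mapping: chunk[0] -> DATA0 (bits 7..0), chunk[1] -> DATA1,
    chunk[2] -> DATA2, chunk[3] -> DATA3 (bits 31..24).
    """
    if len(chunk) != 4:
        raise ValueError("RK0 holds exactly four bytes")
    d0, d1, d2, d3 = chunk
    return d0 | (d1 << 8) | (d2 << 16) | (d3 << 24)


def unpack_rk0(word: int) -> bytes:
    """Inverse of pack_rk0: return the bytes DATA0..DATA3 in that order."""
    return bytes((word >> shift) & 0xFF for shift in (0, 8, 16, 24))
```

If the hardware expected the opposite byte order, only the shifts would change; the four-bytes-per-write structure follows directly from the register format.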

Configuration register RK1 37.

Bits (31÷28)-DONE1(3:0) correspond to the DONE signals (end of the configuration process) coming from the FPGAs of the first column of the base module (BM).

Bits (25÷24)-PROG_B(1:0) correspond to the PROG signals (configuration reset) applied to all FPGAs of the corresponding BM column.

Bits (23÷22)-NUM_COLUNM(1:0) select the corresponding BM column.

Bits (21÷20)-NUM_CHIP(1:0) select the corresponding FPGA within the BM column.

Bits (15÷12)-DONE0(3:0) correspond to the DONE signals (end of the configuration process) coming from the FPGAs of the zeroth BM column.

The remaining bits of register RK1 are not used (N.U.).
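The RK1 bit fields listed above can be captured in a small table with helpers to read and write them. This Python sketch uses the field names and bit positions from the description; the helper names and the 32-bit masking are illustrative choices, not part of the source.

```python
# Field name -> (highest bit, lowest bit), as listed for register RK1.
RK1_FIELDS = {
    "DONE1": (31, 28),
    "PROG_B": (25, 24),
    "NUM_COLUNM": (23, 22),   # spelled as in the source
    "NUM_CHIP": (21, 20),
    "DONE0": (15, 12),
}


def get_field(word: int, name: str) -> int:
    """Extract the named field from a 32-bit RK1 value."""
    hi, lo = RK1_FIELDS[name]
    mask = (1 << (hi - lo + 1)) - 1
    return (word >> lo) & mask


def set_field(word: int, name: str, value: int) -> int:
    """Return word with the named field replaced by value."""
    hi, lo = RK1_FIELDS[name]
    mask = (1 << (hi - lo + 1)) - 1
    if not 0 <= value <= mask:
        raise ValueError(f"{name} accepts values 0..{mask}")
    return ((word & ~(mask << lo)) | (value << lo)) & 0xFFFFFFFF
```

For example, selecting column 1 and chip 3 would be `set_field(set_field(0, "NUM_COLUNM", 1), "NUM_CHIP", 3)`, and `get_field(value, "DONE0")` returns the four DONE flags of column zero as a 4-bit number.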

If an FPGA is in the waiting state, then when data arrives in configuration register RK0, the access controller 9 automatically writes the configuration to the FPGA selected in register RK1 by generating the DATA signals (source data), accompanied by transitions on the CCLK input/output 11-12 (the clock at the set frequency) and the corresponding RDWR_B (read/write) and CS (start/stop) signals (Fig. 3).
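The order of register writes implied by this procedure can be sketched in Python against a stand-in object that merely records writes. Everything about the software side is an assumption made for illustration: the `FakeKO` class, the `load_column` function, the mapping of PROG_B bit i to column i, the little-endian packing of file bytes into RK0, and the restriction to columns 0 and 1. No DONE polling or timing is modelled.

```python
class FakeKO:
    """Hypothetical stand-in for the access controller; records writes only."""

    def __init__(self):
        self.rk1 = 0
        self.log = []

    def write_rk1(self, value):
        self.rk1 = value & 0xFFFFFFFF
        self.log.append(("RK1", self.rk1))

    def write_rk0(self, value):
        self.log.append(("RK0", value & 0xFFFFFFFF))


def _put(word, hi, lo, value):
    """Replace bits hi..lo of word with value."""
    mask = (1 << (hi - lo + 1)) - 1
    return (word & ~(mask << lo) & 0xFFFFFFFF) | ((value & mask) << lo)


def load_column(ko, column, bitstreams):
    """Reset one column, then stream four bitstreams into its FPGAs."""
    if column not in (0, 1):
        raise ValueError("this sketch assumes columns 0 and 1 only")
    if len(bitstreams) != 4:
        raise ValueError("expected four bitstreams, one per FPGA")
    prog = (ko.rk1 >> 24) & 0b11
    base = _put(ko.rk1, 23, 22, column)               # NUM_COLUNM
    # PROG bit to 0 resets the previous configuration ...
    ko.write_rk1(_put(base, 25, 24, prog & ~(1 << column)))
    # ... then PROG bit to 1 puts the FPGAs into the waiting state.
    base = _put(base, 25, 24, prog | (1 << column))
    ko.write_rk1(base)
    for chip, data in enumerate(bitstreams):
        if len(data) % 4:
            raise ValueError("bitstream length must be a multiple of 4")
        ko.write_rk1(_put(base, 21, 20, chip))        # NUM_CHIP
        for i in range(0, len(data), 4):
            ko.write_rk0(int.from_bytes(data[i:i + 4], "little"))
```

The PROG toggle is done once per column rather than once per chip because, per the register description, the PROG signal is applied to all FPGAs of a column at once.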

The FPGA configuration is loaded with the LF.exe program. This program works with a boot configuration file that has the extension ba.

The user can create a custom ba-file and load their own FPGA configurations. After the ba-file is selected, the program file is installed as the computational one.

The boot configuration file is a sequence of lines that specify: 0, 1, 2, 3, the number of the base module row being loaded; file_name j, the name of the FPGA configuration bitstream file (*.bit), where j is the number of the FPGA in the BM.

0  file_name 0  file_name 1  file_name 2  file_name 3
1  file_name 4  file_name 5  file_name 6  file_name 7

If some of the FPGAs must remain unloaded, service FPGA bitstreams can be used; the configuration files themselves must not be omitted. The FPGA configuration is loaded row by row. The DONE_i signal of each row, where i is the number of the base module row being loaded, is generated correctly only if four bit-files are loaded. A high level on DONE_i means that the FPGAs were loaded correctly.
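A minimal sketch of reading such a boot configuration file is given below. The exact on-disk syntax of a ba-file is not specified in the description, so the layout assumed here (one text line per row: the row number followed by four whitespace-separated bit-file names) and the function name `parse_ba` are hypothetical. The check for exactly four files per row reflects the stated condition for DONE_i to be generated correctly.

```python
def parse_ba(text: str) -> dict[int, list[str]]:
    """Parse an assumed ba-file layout: '<row> <file> <file> <file> <file>' per line.

    Returns a mapping from base module row number (0..3) to its four bit-file names.
    """
    rows: dict[int, list[str]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        parts = raw.split()
        if not parts:
            continue  # blank lines are ignored (an assumption of this sketch)
        if len(parts) != 5:
            # The description requires four bit-files per row; a row that is
            # meant to stay idle would list service bitstreams instead.
            raise ValueError(f"line {lineno}: expected a row number and four file names")
        row = int(parts[0])
        if row not in range(4):
            raise ValueError(f"line {lineno}: row number must be 0, 1, 2 or 3")
        if row in rows:
            raise ValueError(f"line {lineno}: row {row} listed twice")
        rows[row] = parts[1:]
    return rows
```

File names containing spaces would need a different separator; the sketch does not attempt to handle them.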

After the data has been loaded, the START command starts the analysis process.

During operation, the host computer continuously polls the command register of the access controller, in which an interrupt bit is set when an event is detected in any of the chips. Then, using the READ STATUS command, the status register of each of the eight chips is polled in turn. The status values determine the nature of the event and the number of the chip that raised the interrupt signal.

Events can be of two kinds: overflow of the counter of the variable part of the key, and a match with the reference value. In the first case, the analysis process moves to the next iteration, loading new values of the constant part of the key and zeroing the variable part. In the second case, the value of the variable part is read from the chip in which the match was found, using the UNLOAD COUNTER command. In the next iteration, the analysis process starts from the value of the variable part that follows the one found.

The multiprocessor computing system module is intended for parallel-pipelined implementation of computationally intensive tasks whose solution requires significant hardware resources and involves a large number of trials. It can be implemented with the following characteristics: computations organized in eight VIRTEX 4 LX80 FPGAs, with pipelined implementation of large operations by combining the hardware resources of up to eight VIRTEX 4 LX80 FPGAs into a single structure; an access controller in the form of a controlling VIRTEX 4 FX60 FPGA; 64 MB of allocatable and distributed static memory (8 MB per VIRTEX 4 LX80 FPGA); a "ring" topology of inter-chip links; an inter-chip link width of 320 bits; and 28 bits of external LVDS pins.

The proposed utility model makes it possible to increase the performance of a computing system by implementing a "ring" topology of connections between virtual processing devices, by providing independent transmission and reception of signals over a complete graph (each with each), and by transferring words up to 320 bits long in one clock cycle between virtual devices located in neighboring FPGAs of the module.

Claims

1. A multiprocessor computing system module designed to perform parallel-pipelined computations, containing a block of macroprocessors based on programmable logic integrated circuits and distributed memory controllers, characterized in that an access controller implementing an interface with the control machine of the computing device is introduced into it; at least four additional macroprocessors are introduced into the block of macroprocessors, with a distributed memory controller for each macroprocessor; and a distributed static memory block is introduced for each macroprocessor; wherein the access controller is connected by bidirectional lines to the information inputs of the block of macroprocessors and to at least one distributed static memory block; the macroprocessors are connected in a ring by bidirectional lines, making independent information exchange possible for each pair of macroprocessors placed consecutively in the ring; each macroprocessor is connected to the corresponding distributed static memory block; and within two pairs of macroprocessors communication is implemented over bidirectional lines.

2. The multiprocessor computing system module designed to perform parallel-pipelined computations according to claim 1, characterized in that the macroprocessors are connected in a ring by 320 bidirectional lines.

3. The multiprocessor computing system module designed to perform parallel-pipelined computations according to claim 1, characterized in that each macroprocessor is connected to the corresponding distributed static memory block by 65 lines.

4. The multiprocessor computing system module designed to perform parallel-pipelined computations according to claim 1, characterized in that within the macroprocessor pairs 2-7 and 3-6 communication is implemented over 40 radial bidirectional lines.

5. A multiprocessor computing system module designed to perform parallel-pipelined computations, containing a block of macroprocessors based on programmable logic integrated circuits and distributed memory controllers, characterized in that an access controller implementing an interface with the control machine of the computing device is introduced into it; at least four additional macroprocessors are introduced into the block of macroprocessors, with a distributed memory controller for each macroprocessor; and a distributed static memory block is introduced for each macroprocessor; wherein the access controller is connected to the information inputs of the block of macroprocessors by 40 bidirectional lines and to at least one distributed static memory block; the macroprocessors are connected in a ring by 320 bidirectional lines, making independent information exchange possible for each pair of macroprocessors placed consecutively in the ring; each macroprocessor is connected to the corresponding distributed static memory block by 65 lines; and within the macroprocessor pairs 2-7 and 3-6 communication is implemented over 40 radial bidirectional lines.