RU2189630C1

RU2189630C1 - Method and device for filtering interprocessor requests in multiprocessor computer systems

Info

Publication number: RU2189630C1
Application number: RU2001131317/09A
Authority: RU
Inventors: Б.А. Бабаян; Ю.Х. Сахин; В.В. Тихорский; А.К. Ким; М.Л. Чудаков
Original assignee: Бабаян Борис Арташесович; Сахин Юлий Хананович; Тихорский Владимир Васильевич; Ким Александр Киирович; Чудаков Михаил Леонидович
Priority date: 2001-11-21
Filing date: 2001-11-21
Publication date: 2002-09-20

Abstract

FIELD: computer engineering. SUBSTANCE: when an interprocessor request for access to random-access memory and/or cache memory arrives from one of the processors, the system cache memories are first checked for possible copies of the requested data. Tag fragments read out of the filtering-information memory are compared with the fragment extracted from the request address tag. If the extracted fragment matches one or more of the fragments read out, it is concluded that copies of the requested data may be present in the corresponding system cache memories, and the cache memory accesses filtered in this way are then performed to make the final check for copies of the requested data. EFFECT: higher degree of interprocessor request filtering. 4 cl, 5 dwg

Description

The invention relates to computer engineering, in particular to methods and devices for filtering interprocessor requests, and can be used in the design of multiprocessor computer systems.

Keeping the data held in cache memories and in main memory consistent (coherent) is a well-known problem of multiprocessor computer systems; see "Parallel & Distributed Computing Handbook" (Albert Y. Zomaya, Ed., McGraw-Hill, 1996) [1]. In the widely used bus-based multiprocessors, this problem is usually solved by monitoring the requests that appear on the bus (bus snooping). This technique gives a fairly simple solution because every address placed on the bus is visible to all processors.

However, having to read the tags of every cache memory in the system on each main-memory access incurs significant overhead: it can reduce system performance or require additional hardware (more read ports on the tag memory, or a duplicate copy of the tag memory).

In more powerful switch-based multiprocessors, keeping the contents of cache and main memory consistent is much harder, because performing snoop operations directly would require broadcasting each message to all processors and then collecting responses from every processor in the system, which could negate the advantages of the switched interconnect. Such systems typically use special directories to keep cache and main memory consistent, and various directory-based coherence schemes have been proposed [1]. The main drawback of these solutions is the considerable amount of additional hardware they require. This has led to methods and hardware for request filtering, which reduce interprocessor traffic at the cost of a relatively small amount of additional hardware.

A known method for reducing interprocessor traffic in shared-memory multiprocessor systems uses special marker bits added to page table entries; these bits indicate whether data from the page may reside in the cache memory of any processor. The method is implemented in a known device containing a modified virtual-to-physical address translation unit located in the central processor, which includes the marker bits and the circuits that analyze them (see US Patent 6044446, cl. G 06 F 12/00, 1997).

This solution is very economical, but its filtering is very coarse: it gives a noticeable effect only when the system has no shared data or when shared data is strongly localized.
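To make the coarseness concrete, here is a deliberately simplified Python model of page-granular filtering as summarized above; it is not the implementation of US 6044446. It assumes 4 KiB pages and a single "may be cached" marker per page that is set on the first cached access and never cleared (both assumptions). Because one cached line marks the whole page, every later access to any other line of that page still has to be snooped.

```python
from __future__ import annotations

PAGE_SHIFT = 12  # assumed 4 KiB pages


class PageMarkerFilter:
    """Hypothetical model: one 'may be cached' marker per page, as in a page-table entry."""

    def __init__(self) -> None:
        self.marked_pages: set[int] = set()

    def must_snoop(self, addr: int) -> bool:
        # A marked page means "data from this page may be in some cache".
        return (addr >> PAGE_SHIFT) in self.marked_pages

    def record_cached_access(self, addr: int) -> None:
        # Any cached access marks the entire page.
        self.marked_pages.add(addr >> PAGE_SHIFT)


f = PageMarkerFilter()
f.record_cached_access(0x5000)  # one cache line of page 5 gets cached
print(f.must_snoop(0x5FC0))     # True: a different line of the same page is still snooped
print(f.must_snoop(0x6000))     # False: an untouched page is filtered out
```

With widely shared data, most pages end up marked and almost nothing is filtered, which is the limitation stated above.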

In its technical essence, the closest prior art to the present invention is a method for filtering interprocessor requests in multiprocessor computer systems in which, when one of the processors issues a RAM access request, a preliminary check is made of whether copies of the requested data may be present in the cache memories of the system's processor nodes. Only the processor groups for which such a possibility has been identified are accessed, and messages are sent to the remaining processor groups to update their filtering information (interest map), which, on a RAM access, makes it possible to determine with some degree of probability whether copies of the requested data are present in the system's cache memories. The corresponding closest prior-art device is a device for filtering interprocessor requests (a "snoop filter") in multiprocessor computer systems, containing a filtering unit that identifies the processors or processor groups whose cache memories may hold data belonging to a certain address range (for example, a memory page, as in the previous analogue), and a dedicated filtering-information memory for each processor or processor group (see US Patent 5966729, cl. G 06 F 12/12, 1997).

However, this known method and device require substantial hardware to achieve a sufficient degree of filtering, which complicates the system as a whole. The hardware cost arises for three reasons: the size of the filter memory grows with the amount of system RAM; the filter hardware is replicated, since there are as many filters as processor nodes (system nodes); and the filtering device has a complex structure.

In addition, updating the information in the memories of the filtering devices requires addresses to be transmitted regularly to all system nodes or node groups. This restricts the address switching subsystem to bus-type schemes (actual buses or broadcast replicators) and reduces system performance.

The technical result is a higher degree of interprocessor request filtering and higher system performance at minimal hardware cost.

To achieve this technical result, the method for filtering interprocessor requests in multiprocessor computer systems comprises a preliminary check of whether copies of the requested data may be present in the system's cache memories when one of the processors issues a RAM access request and/or a request to invalidate copies of data in the system's cache memories. According to the invention, the parts corresponding to the index and the tag of the system's cache memories are extracted from the request address. The first part (the index) is used as the address for accessing the filtering-information memory, which consists of segments, each corresponding to one of the system's cache memories. From the second part (the tag), a fragment is extracted that consists of several tag bits or some function of the tag bits. The tag fragments stored at that address are read from all segments of the filtering-information memory and compared with the fragment extracted from the request address tag. If there are no matches, it is concluded that there are no copies of the requested data in the system's cache memories. If the fragment extracted from the request address tag matches one or more of the fragments read from the filtering-information memory, it is concluded that copies of the requested data may be present in the cache memories for which matches were found, and filtered interprocessor requests are issued to those cache memories to make the final check for copies of the requested data. In addition, the filtering-information memory is updated: the tag fragment extracted from the request address is written into the cell of the segment corresponding to the cache memory associated with the requesting processor.
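The steps of the method can be sketched in Python as follows. This is an illustrative model, not the authors' implementation: the line size, index width and fragment width are assumed values, each cache is assumed to be direct-mapped (one line per index, so one stored fragment per index is enough), and skipping the requester's own segment is a choice of this sketch. Two fragment functions are shown, taking the low tag bits or XOR-folding the tag; the method allows "several tag bits or some function of the tag bits", and these are just two hypothetical examples.

```python
from __future__ import annotations

# Illustrative parameters (assumptions, not taken from the patent).
OFFSET_BITS = 6      # 64-byte cache lines
INDEX_BITS = 10      # 1024 lines per direct-mapped cache
FRAGMENT_BITS = 4    # width of the stored tag fragment


def split_address(addr: int) -> tuple[int, int]:
    """Return (index, tag) of a physical address."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return index, tag


def low_bits_fragment(tag: int) -> int:
    """Fragment = the low FRAGMENT_BITS bits of the tag."""
    return tag & ((1 << FRAGMENT_BITS) - 1)


def xor_fold_fragment(tag: int) -> int:
    """Alternative fragment: XOR of all FRAGMENT_BITS-wide slices of the tag."""
    mask = (1 << FRAGMENT_BITS) - 1
    result = 0
    while tag:
        result ^= tag & mask
        tag >>= FRAGMENT_BITS
    return result


class FilteringInformationMemory:
    """One segment per system cache; each segment stores one fragment per index."""

    def __init__(self, num_caches: int, fragment=low_bits_fragment) -> None:
        self.fragment = fragment
        # None = never written; caches are assumed empty at reset.
        self.segments = [[None] * (1 << INDEX_BITS) for _ in range(num_caches)]

    def filter_request(self, requester: int, addr: int) -> list[int]:
        """Return the caches that may hold a copy, then record the requester's fragment."""
        index, tag = split_address(addr)
        frag = self.fragment(tag)
        # Read every segment at the same index and compare with the request fragment.
        candidates = [
            cache
            for cache, segment in enumerate(self.segments)
            if cache != requester and segment[index] == frag
        ]
        # Update: the requester's cache will now hold this line.
        self.segments[requester][index] = frag
        return candidates


fim = FilteringInformationMemory(num_caches=4)
a = 0x00ABC040                                   # index 769, tag 0xAB, fragment 0xB
print(fim.filter_request(0, a))                  # []: no cache can hold it yet
print(fim.filter_request(2, a))                  # [0]
print(fim.filter_request(3, a + (0x10 << 16)))   # [0, 2]: tag 0xBB, same fragment
```

The last call is a false positive: caches 0 and 2 hold tag 0xAB, not 0xBB, but the 4-bit fragments coincide. Such a request is passed on for the final check, which is why the result is a possibility of copies rather than a certainty; wider fragments make such coincidences rarer at the cost of more filter memory.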

In addition, according to the invention, a device for filtering interprocessor requests in a multiprocessor computer system that includes M RAM sections, a system address and data switch, and K processor nodes, each including at least one processor and one cache memory, where M and K are integers, comprises an interprocessor request node and at least one filtering unit configured to store filtering information. The information input of the filtering unit and the information input of the interprocessor request node are connected to the address information output of the system address and data switch; the interprocessor exchange inputs/outputs of the interprocessor request node are connected by a group of bidirectional buses to the above-mentioned K processor nodes; and the filtering result output of the filtering unit is connected to the corresponding input of the interprocessor request node, whose control information output is connected to the corresponding control input of the system address and data switch.

The filtering unit may contain an input register, a requester number decoder, L tag fragment comparators, and a filtering-information memory consisting of L segments, each corresponding to one of the system's cache memories, where L is an integer. The decoder input is connected to the first output of the input register by the processor number bus, and the decoder outputs are connected to the write strobe inputs of the corresponding segments of the filtering-information memory, whose data inputs are connected to the second output of the input register by the tag fragment bus. The first input of each tag fragment comparator is also connected to this bus; its second input is connected to the output of the corresponding filtering-information memory segment, and its output is connected to the filtering result output of the filtering unit. The third output of the input register is connected by the index bus to the address input of each filtering-information memory segment, and the input of the register is the input of the filtering unit.

In addition, the interprocessor request node may contain K interprocessor request buffer memories, each corresponding to one of the K processor nodes of the multiprocessor system; an interprocessor request response collection unit; an interprocessor request return address decoder; a register block; and K N-input OR elements, each corresponding to one of the K interprocessor request buffer memories. The information input of the interprocessor request node is the input of the register block, whose first output is connected by the address bus to the first inputs of all interprocessor request buffer memories; their second inputs are connected by the interprocessor request return address bus to the second output of the register block and to the input of the return address decoder, whose output is connected to the positional line number input of the response collection unit. The N-bit input group of each buffer memory and the N inputs of its OR element are connected to those bits of the filtering result input of the interprocessor request node that correspond to the N processors of one of the processor nodes of the multiprocessor system. The direct output of each N-input OR element is connected to the third input of its buffer memory, and its inverted output is connected to one of the filtering result inputs of the response collection unit. The outputs of the buffer memories and the corresponding response inputs of the response collection unit form the corresponding interprocessor exchange inputs/outputs of the interprocessor request node, and the response-ready strobe output and the response code output of the response collection unit form the multi-bit control information output of the interprocessor request node.

The essence of the invention is that organizing the device as described above makes it possible to use different filtering schemes and to implement the method described above, which gives a higher degree of filtering and higher overall system performance at lower hardware cost. The size of the filtering unit's memory depends on the total size of the system's cache memories rather than on the size of RAM, as in the closest prior art, and its circuit consists of a minimal number of simple elements. Moreover, the filtering information is updated at the same time as RAM requests are serviced and requires no additional address transfers.
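The statement that filter memory scales with cache size rather than RAM size can be checked with simple arithmetic. Assuming one segment per cache (L = K·N when every processor node has N caches) and one stored fragment per cache index, the filtering-information memory holds L · 2^index_bits · fragment_bits bits. The parameter values below are illustrative assumptions, not figures from the patent.

```python
def filter_memory_bits(num_nodes: int, caches_per_node: int,
                       index_bits: int, fragment_bits: int) -> int:
    """Size of the filtering-information memory in bits.

    L = K * N segments, each holding one fragment per cache index.
    The size of system RAM does not appear anywhere in the formula.
    """
    segments = num_nodes * caches_per_node   # L = K * N
    entries_per_segment = 1 << index_bits    # one entry per cache index
    return segments * entries_per_segment * fragment_bits


bits = filter_memory_bits(num_nodes=4, caches_per_node=4,
                          index_bits=12, fragment_bits=4)
print(bits, "bits =", bits // 8 // 1024, "KiB")   # 262144 bits = 32 KiB
```

Doubling the system RAM leaves this figure unchanged; only more caches, more sets per cache, or wider fragments increase it.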

Comparison of the proposed technical solutions with the closest analogue shows that they meet the criterion of "novelty", and the absence of their distinguishing features in the known analogues shows that they meet the criterion of "inventive step". Preliminary modeling indicates that they are industrially applicable.

Fig. 1 shows a functional block diagram of the proposed interprocessor request filtering device as part of a multiprocessor computer system; Fig. 2 is a schematic block diagram of the filtering unit; Fig. 3 is a schematic block diagram of the interprocessor request node; Fig. 4 is a schematic block diagram of the interprocessor request response collection unit; Fig. 5 is a flowchart of the request servicing algorithm with data coherence support and preliminary filtering of interprocessor requests.

The device for filtering interprocessor requests in a multiprocessor computer system (Fig. 1) contains interprocessor request node 1 and at least one filtering unit 2 configured to store filtering information. Information inputs 3 and 4 of filtering unit 2 and interprocessor request node 1, respectively, are connected to the address information output of system address and data switch 5. Switch 5 is connected by the first and second bus groups 6-1...6-K and 7-1...7-K, respectively, to K processor nodes 8-1...8-K, each including at least one processor 9-1...9-N and at least one cache memory 10-1...10-N, and to M RAM sections 11-1...11-M, where M, N and K are integers. Interprocessor exchange inputs/outputs 12-1...12-K of interprocessor request node 1 are connected by a group of bidirectional buses 13-1...13-K to the K processor nodes 8-1...8-K, and filtering result output 14 of filtering unit 2 is connected to the corresponding input 15 of interprocessor request node 1, whose control information output 16 is connected to the corresponding control input of system address and data switch 5. The inputs and outputs of interprocessor request node 1 and filtering unit 2 are multi-bit.

Filtering unit 2 (Fig. 2) contains input register 17, requester number decoder 18, L tag fragment comparators 19-1...19-L, and filtering-information memory 20 consisting of L segments 21-1...21-L, each corresponding to one of the system's cache memories, where L = K·N. The input of decoder 18 is connected to the first output of input register 17 by processor number bus 22, and the outputs of decoder 18 are connected to write strobe inputs 23-1...23-L of the corresponding segments 21-1...21-L of filtering-information memory 20, whose data inputs are connected to the second output of input register 17 by tag fragment bus 24. The first input of each tag fragment comparator 19-1...19-L is also connected to bus 24; its second input is connected to the output of the corresponding segment 21-1...21-L of memory 20, and its output is connected to filtering result output 14 of filtering unit 2, either directly or through an output register (shown dotted). The third output of input register 17 is connected by index bus 25 to the address input of each segment 21-1...21-L of memory 20, and the input of the register is input 3 of filtering unit 2.
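The data path of filtering unit 2 can be sketched at register level in Python. The class and method names are hypothetical; the reference numerals in the comments map each step to Fig. 2. Two behaviours are assumptions of this sketch: the comparison uses the segment contents from before the write in the same cycle, and memory 20 resets to all zeros, which can only produce false positives (extra snoops), never missed copies. All L comparators operate, so the requester's own segment also contributes a match bit; how that bit is treated downstream is not described in this section.

```python
class FilteringUnit:
    """Register-level sketch of filtering unit 2 (Fig. 2); names are hypothetical."""

    def __init__(self, num_segments: int, index_bits: int) -> None:
        self.num_segments = num_segments
        # Memory 20: segments 21-1..21-L, each addressed by the index (bus 25).
        # Reset contents of 0 are an assumption; they only cause false positives.
        self.memory = [[0] * (1 << index_bits) for _ in range(num_segments)]

    def decode(self, processor_number: int) -> int:
        """Decoder 18: one-hot write strobe for inputs 23-1..23-L."""
        return 1 << processor_number

    def cycle(self, processor_number: int, fragment: int, index: int) -> int:
        """One request: fields of input register 17 in, L-bit match vector (output 14) out."""
        # All segments are read in parallel at the same index.
        stored = [segment[index] for segment in self.memory]
        # Comparators 19-1..19-L: bit i is set if segment i holds the same fragment.
        match = 0
        for i, value in enumerate(stored):
            if value == fragment:
                match |= 1 << i
        # Write strobe: only the requester's segment stores the new fragment.
        strobe = self.decode(processor_number)
        for i in range(self.num_segments):
            if (strobe >> i) & 1:
                self.memory[i][index] = fragment
        return match


fu = FilteringUnit(num_segments=4, index_bits=4)
print(bin(fu.cycle(1, fragment=5, index=3)))  # 0b0
print(bin(fu.cycle(2, fragment=5, index=3)))  # 0b10
print(bin(fu.cycle(0, fragment=5, index=3)))  # 0b110
```

In hardware the read, compare and write happen within one access; the sequential loop here only fixes the assumed read-before-write order.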

Interprocessor request node 1 (Fig. 3) contains K interprocessor-request buffer memories 26-1...26-K, each corresponding to one of the K processor nodes 8-1...8-K of the multiprocessor system; a block 27 for collecting responses to interprocessor requests; a decoder 28 of the interprocessor-request return address; a register block 29; and K N-input OR elements 30-1...30-K, each corresponding to one of the buffer memories 26-1...26-K.

The information input 4 of node 1 is the input of register block 29. The first output of register block 29 is connected by address bus 31 to the first inputs of all buffer memories 26-1...26-K. Their second inputs are connected by return-address bus 32 to the second output of register block 29 and to the input of return-address decoder 28, whose output is connected to the positional line-number input 33 of response-collection block 27. The N-bit input group of each buffer memory 26-1...26-K and the N inputs of its OR element 30-1...30-K are connected to those bits of the filtering-result input 15 that correspond to the N cache memories 10-1...10-N of one processor node 8-1...8-K. The direct output of each OR element 30-1...30-K is connected to the third input of the corresponding buffer memory 26-1...26-K, and its inverse output to one of the filtering-result inputs 34-1...34-K of block 27. The outputs of buffer memories 26-1...26-K, together with the corresponding response inputs 35-1...35-K of block 27, form the interprocessor-exchange inputs/outputs 12-1...12-K of node 1. The response-ready strobe output 36 and the response-code output 37 of block 27 together form the multi-bit control-information output 16 of node 1. Return-address decoder 28 and register block 29 can be implemented with standard circuits.

Block 27 for collecting responses to interprocessor requests (Fig. 4) contains I groups 38-1...38-I of response-collection elements, the number of each group corresponding to an interprocessor-request return address, and K return-address decoders 39-1...39-K, each corresponding to one of the K processor nodes of the multiprocessor system. Each group 38-1...38-I contains: K ready RS flip-flops 40-1...40-K, with the AND elements 41-1...41-K and OR elements 42-1...42-K that form the set signal of the corresponding flip-flop; K response-code RS flip-flops 43-1...43-K, with the AND elements 44-1...44-K that form the set signal of the corresponding flip-flop; a K-input AND element 45, which forms the line-ready signal; and a K-input OR element 46, which forms the line response code. The outputs of elements 45 of all groups 38-1...38-I form the multi-bit response-ready strobe output 36 of block 27, and the outputs of elements 46 of all groups form the multi-bit response-code output 37 of block 27.

Synchronization and power circuits, which can be implemented in a standard manner, are not shown in Figs. 1-4.

Fig. 5 shows an example of a request-servicing algorithm that maintains data coherence and filters interprocessor requests.

Keeping the data in the cache memories of the processor nodes coherent with the main memory (RAM) of the multiprocessor system means that most processor accesses to RAM must be accompanied by interprocessor requests, i.e., by accesses to the cache memories of other processors to check whether they hold copies of the data being accessed. If such copies exist, then, depending on the type of memory access (write/read) and the state of the copies found ("clean"/"modified", "exclusive"/"shared"), actions are taken to keep the data in the system consistent. For example, a copy that is found may be invalidated, or its state may be changed from "exclusive" to "shared". The read from main memory may also be cancelled and the data read instead from one of the system cache memories. In every case, servicing a request to RAM and/or to the cache memories of other processors begins with checking all cache memories of the system for copies of the data. This phase is the most expensive: it requires sending the address to every cache memory in the system and accessing each of them, which either degrades system performance or calls for directory schemes with a substantial hardware cost. Moreover, most of these checks come back negative: copies of the accessed data are held in only a few of the system's cache memories, or in none at all.

To reduce the cost of checking the system cache memories for copies of data, the present invention introduces an additional filtering phase: a preliminary check of whether copies of the accessed data can be present in the system cache memories at all.

The method for filtering interprocessor requests is implemented in the proposed device and is carried out in the following sequence.

When one of the processors 9-1...9-L, for example processor 9-1, issues a request to access RAM sections 11-1...11-M and/or a request to invalidate copies of data in cache memories 10-2...10-L of the multiprocessor system, a preliminary check is made of whether copies of the accessed data can be present in cache memories 10-2...10-L. (Data copies are data previously read from RAM and placed in one or more cache memories for temporary storage.)

The request address is split into the parts corresponding to the index and the tag of cache memories 10-1...10-L. The index part is used as the address for accessing the filtering-information memory 20, which consists of segments 21-1...21-L, each corresponding to one of the system cache memories. From the tag part, a fragment is extracted that consists of several tag bits or of some function of the tag bits. The simplest fragment is a few arbitrarily chosen tag bits. A modulo folding of the tag bits can also be used, for example modulo 5 for a four-bit fragment, as can other functions of the tag value. What matters is that the fragment be much smaller than the tag itself while still indicating, as accurately as possible, whether it could belong to a given tag.
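The address split and the two fragment variants described above can be sketched in Python as follows. The cache geometry (64-byte lines, 1024 sets), the chosen tag-bit positions, and the reading of "modulo folding" as the residue of the tag modulo a small number are all assumptions for illustration; the patent leaves them open.

```python
# Hypothetical cache geometry; the patent does not specify these widths.
LINE_OFFSET_BITS = 6   # 64-byte cache lines (assumption)
INDEX_BITS = 10        # 1024 sets per cache (assumption)


def split_address(addr: int) -> tuple:
    """Split a request address into (tag, index) as a set-indexed cache would."""
    index = (addr >> LINE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (LINE_OFFSET_BITS + INDEX_BITS)
    return tag, index


def fragment_by_bits(tag: int, positions=(0, 3, 7, 12)) -> int:
    """Fragment built from a few arbitrarily chosen tag bits (positions are hypothetical)."""
    frag = 0
    for i, pos in enumerate(positions):
        frag |= ((tag >> pos) & 1) << i
    return frag


def fragment_by_modulo(tag: int, modulus: int = 5) -> int:
    """Fragment as the residue of the tag modulo a small number (one reading of 'modulo folding')."""
    return tag % modulus


tag, index = split_address(0x12345678)
# Equal tags always give equal fragments, so a real copy is never filtered out;
# different tags may collide, which only costs an unnecessary cache probe.
print(hex(tag), index, fragment_by_bits(tag), fragment_by_modulo(tag))  # 0x1234 345 8 0
```

Either function keeps the essential property the text asks for: the fragment is a deterministic function of the tag, so a mismatch proves the tags differ, while a match only suggests they may be equal.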

The tag fragments stored in all segments 21-2...21-L of the filtering-information memory 20 are read out and compared with the fragment extracted from the tag of the request address.

If there is no match, it is concluded that cache memories 10-2...10-L of the system hold no copies of the requested data.

If the fragment extracted from the request-address tag matches one or more of the fragments read from filtering-information memory 20, it is concluded that copies of the requested data may be present in those system cache memories for which a match was obtained.

These cache memories are then accessed to make the final check for copies of the requested data, and the original request is serviced further according to the results of these checks.

In addition, filtering-information memory 20 is updated: the tag fragment extracted from the request address is written into the cell of segment 21-1, which corresponds to cache memory 10-1 of the requesting processor 9-1.
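The check-and-update steps above can be modelled as a small class. This is a sketch assuming each segment cell holds a single fragment per index (as for a direct-mapped cache); the class and method names are hypothetical. A stale entry, for example one left behind after an invalidation, can only cause an unnecessary probe, so as long as every cache fill is recorded the filter never hides a real copy.

```python
from typing import List, Optional


class FilterMemory:
    """Hypothetical model of filtering-information memory 20: one segment per cache."""

    def __init__(self, num_caches: int, num_sets: int):
        # Each segment stores one tag fragment per index; None means never written.
        self.segments: List[List[Optional[int]]] = [
            [None] * num_sets for _ in range(num_caches)
        ]

    def lookup(self, requester: int, index: int, fragment: int) -> List[int]:
        """Return the caches that MAY hold a copy; an empty list means none do."""
        return [
            cache
            for cache, segment in enumerate(self.segments)
            if cache != requester and segment[index] == fragment
        ]

    def update(self, requester: int, index: int, fragment: int) -> None:
        """Record that the requester's cache now holds a line with this fragment."""
        self.segments[requester][index] = fragment


fm = FilterMemory(num_caches=4, num_sets=1024)
fm.update(requester=2, index=345, fragment=8)          # cache 2 filled this line earlier
print(fm.lookup(requester=0, index=345, fragment=8))   # [2]: only cache 2 is probed
print(fm.lookup(requester=0, index=345, fragment=9))   # []: no interprocessor request needed
fm.update(requester=0, index=345, fragment=8)          # requester's segment is updated
```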

Besides tag fragments, segments 21-1...21-L of filtering-information memory 20 may contain status bits that record the state of the data copy in the corresponding cache memory. Analyzing these status bits enables additional filtering, i.e., in some cases interprocessor requests can be cancelled even when the tag fragments match. For example, when servicing a read request, interprocessor requests to cache memories can be cancelled if the status bits show that the copies of the requested data in those caches are "clean", i.e., identical to the data in RAM.
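The status-bit refinement can be sketched as a second filtering stage applied to the caches whose fragments matched. The two-state encoding (clean/modified) and the function name are assumptions made for illustration; a protocol that must also downgrade an exclusive clean copy to shared on a read would have to keep that probe.

```python
from enum import Enum
from typing import Dict, List


class CopyState(Enum):
    CLEAN = "clean"        # copy identical to the data in RAM
    MODIFIED = "modified"  # copy newer than RAM (hypothetical two-state encoding)


def caches_to_probe(is_read: bool, matches: Dict[int, CopyState]) -> List[int]:
    """Second filtering stage using status bits stored next to each fragment.

    `matches` maps cache number -> recorded state for caches whose fragment matched.
    Following the example in the text, a read needs no probe of caches whose copy
    is clean; any other access still probes every matching cache.
    """
    if is_read:
        return sorted(c for c, state in matches.items() if state is not CopyState.CLEAN)
    return sorted(matches)


print(caches_to_probe(True, {1: CopyState.CLEAN, 3: CopyState.MODIFIED}))   # [3]
print(caches_to_probe(False, {1: CopyState.CLEAN, 3: CopyState.MODIFIED}))  # [1, 3]
```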

Requests from processors 9-1...9-L to access RAM sections 11-1...11-M and/or cache memories 10-1...10-L contain an address, an operation code and, for write operations, data.

The requests arrive at the system address-and-data switch 5 over buses 6-1...6-K. The control unit of the switch selects a request for servicing and places it in the buffer memory for awaiting responses to interprocessor requests, which is part of the switch (depending on the implementation, this buffer memory may instead be part of the controllers of RAM sections 11-1...11-M). The location (address) of the request in this buffer memory serves as the return address of the interprocessor request.
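The idea that a request's slot in the waiting buffer doubles as its return address can be illustrated with a hypothetical buffer class. The class name and first-free-slot allocation policy are assumptions; the buffer size would presumably match the number I of response-collection groups in block 27, since each group number corresponds to one return address.

```python
from typing import Any, List, Optional


class PendingRequestBuffer:
    """Hypothetical model of the buffer awaiting responses to interprocessor requests.

    The slot number in which a request is stored is its return address.
    """

    def __init__(self, size: int):
        self.slots: List[Optional[Any]] = [None] * size

    def allocate(self, request: Any) -> Optional[int]:
        """Store the request in a free slot and return that slot as its return address."""
        for return_address, slot in enumerate(self.slots):
            if slot is None:
                self.slots[return_address] = request
                return return_address
        return None  # buffer full: the switch must hold the request back

    def release(self, return_address: int) -> Any:
        """Free the slot once all responses for this return address have been collected."""
        request = self.slots[return_address]
        self.slots[return_address] = None
        return request


buf = PendingRequestBuffer(size=2)
ra = buf.allocate({"addr": 0x12345678, "op": "read"})  # ra == 0 travels with the requests
```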

The address and operation code of the original request, together with the interprocessor-request return address assigned to it, are sent to interprocessor request node 1 over bus 4. Part of the address bits and the number of the requesting processor are sent over the same bus to filtering unit 2.

The part of the address received in input register 17 of filtering unit 2 is divided into a tag fragment and an index. The tag fragment is fed over bus 24 to the data input of each segment 21-1...21-L of filtering-information memory 20 and to the first input of each tag-fragment comparator 19-1...19-L, whose second input is connected to the data output of the corresponding segment 21-1...21-L. The index is fed over bus 25 to the address inputs of segments 21-1...21-L. The number of the requesting processor is fed over bus 22 to decoder 18, whose direct outputs are connected to the write-strobe inputs 23-1...23-L of segments 21-1...21-L. As a result, tag fragments are read from all segments 21-1...21-L except the one corresponding to the requesting processor, and the fragments read are compared with the fragment extracted from the request being serviced. The comparison results from the outputs of comparators 19-1...19-L go to the multi-bit filtering-result output 14 of filtering unit 2, which is connected to the filtering-result input 15 of interprocessor request node 1. Depending on the implementation, output 14 of filtering unit 2 and/or input 15 of node 1 may be provided with registers to satisfy timing and physical constraints. At the same time, the tag fragment from input register 17 is written into the segment of memory 20 that corresponds to the requesting processor (segment 21-1 in this example), thereby updating filtering-information memory 20.
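One operation of filtering unit 2 can be modelled at the level of its result word: a one-hot write strobe from decoder 18 selects the requester's segment for writing, and every other segment is read and compared. The function name is hypothetical, and masking the requester's own comparator bit to 0 is an assumption about how the hardware treats that segment.

```python
from typing import List, Optional


def filter_cycle(segments: List[List[Optional[int]]], requester: int,
                 index: int, fragment: int) -> int:
    """One operation of filtering unit 2; returns the result word on output 14.

    Bit i of the result models comparator 19-(i+1). The requester's segment
    receives a write strobe from decoder 18 instead of being compared.
    """
    write_strobes = 1 << requester           # one-hot output of decoder 18
    result = 0
    for i, segment in enumerate(segments):
        if (write_strobes >> i) & 1:
            segment[index] = fragment        # update of the requester's segment
        elif segment[index] == fragment:     # comparator 19-(i+1) reports a match
            result |= 1 << i
    return result


segs = [[None] * 16 for _ in range(4)]
segs[2][5] = 8
print(bin(filter_cycle(segs, requester=0, index=5, fragment=8)))  # 0b100: probe cache 2 only
```

Because the written segment and the compared segments are always distinct, the read and the write can proceed in the same cycle, which is why the text can describe them as simultaneous.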

Interprocessor request node 1 forms the interprocessor requests. Each interprocessor request corresponds to one of the processor nodes 8-1...8-K of the multiprocessor system. It consists of the address, the operation code and the interprocessor-request return address, received from system address-and-data switch 5 into input register block 29 of node 1, together with the field of filtering results for that processor node, which is part of the multi-bit filtering-result input 15 that node 1 receives from filtering unit 2. The filtering results indicate which cache memories 10-1...10-N of that processor node may hold copies of the requested data. Accordingly, when the interprocessor request arrives at the processor node, it is cancelled for those cache memories 10-1...10-N whose bit in the filtering-result field is 0. At the same time, OR elements 30-1...30-K form a generalized filtering result for each processor node (i.e., for each interprocessor request). If all filtering-result bits for a processor node are 0, i.e., filtering unit 2 found no fragment match for it, the generalized filtering result is also 0 and the corresponding interprocessor request is cancelled entirely.
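How node 1 turns the filtering result into per-node requests can be sketched as follows. The function names are hypothetical, and the bit layout (node k's caches occupying bits k*N through k*N+N-1 of input 15) is an assumption; the patent only states that each node receives the N bits that correspond to its caches.

```python
from typing import List, Tuple


def per_node_filter_fields(result_word: int, K: int, N: int) -> List[Tuple[int, bool]]:
    """Split the K*N-bit filtering result (input 15) into one N-bit field per node.

    For each node k returns (field, generalized), where `generalized` models the
    direct output of OR element 30-k; when it is False the whole interprocessor
    request to that node is cancelled.
    """
    mask = (1 << N) - 1
    fields = []
    for k in range(K):
        field = (result_word >> (k * N)) & mask
        fields.append((field, field != 0))
    return fields


def caches_to_access(field: int, N: int) -> List[int]:
    """Caches 10-1...10-N (numbered from 0 here) whose filtering bit is 1."""
    return [n for n in range(N) if (field >> n) & 1]


print(per_node_filter_fields(0b000101000, K=3, N=3))  # [(0, False), (5, True), (0, False)]
print(caches_to_access(0b101, N=3))                    # [0, 2]
```

In this example only the request to the second node is sent, and within that node only its first and third caches are accessed.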

The interprocessor requests formed in this way are sent to the interprocessor-exchange inputs/outputs 12-1...12-K through the corresponding buffer memories 26-1...26-K, which can hold requests temporarily when they cannot be transmitted at once (because the corresponding input/output is busy).

From inputs/outputs 12-1...12-K, the interprocessor requests are sent over buses 13-1...13-K to the corresponding processor nodes, where only those cache memories 10-1...10-N whose bit in the request's filtering-result field is "1" are accessed.

Based on the results of the cache accesses in processor nodes 8-1...8-K, responses to the interprocessor requests are formed and the other necessary actions are taken (copies of data in the caches are invalidated and/or read out, and the status bits of the data copies in the caches are updated). These actions are standard in coherent request servicing and are therefore not described in detail here.

Responses to the interprocessor requests arrive over buses 13-1...13-K, through inputs/outputs 12-1...12-K of node 1, at inputs 35-1...35-K of response-collection block 27.

In parallel with forming the interprocessor requests, a group of response-collection elements in block 27 is initialized. To do this, the interprocessor-request return address from input register block 29 is decoded by decoder 28, and the decoded (positional) return address from the output of decoder 28 is applied to input 33 of block 27. Each bit of the positional return address on input 33 corresponds to one of the groups 38-1...38-I of block 27. At the same time, the generalized filtering results for each processor node (each interprocessor request) are applied from the inverse outputs of N-input OR elements 30-1...30-K to inputs 34-1...34-K of block 27.

The ready flip-flops 40-1...40-K of a group are set to "1" when responses to the interprocessor requests belonging to that group arrive from processor nodes 8-1...8-K, or, at line initialization, for those processor nodes whose generalized filtering result is negative. The required set signals are formed by decoders 39-1...39-K and elements 41-1...41-K and 42-1...42-K.

Each response-code flip-flop 43-1...43-K of a group is set to "1" if the corresponding processor node returns, for the interprocessor request whose return address matches the group number, a response with code "1", meaning that the RAM access must be cancelled (the data will be read from one of that node's cache memories). The required set signals are formed by decoders 39-1...39-K and elements 44-1...44-K. The response code for each interprocessor request is the logical OR of the outputs of all flip-flops 43-1...43-K of the corresponding response-waiting line and is formed by OR element 46.

An interprocessor request is considered complete when all flip-flops 40-1...40-K of the corresponding response-waiting group are set to "1". The output of AND element 45 then becomes "1", after which all flip-flops 40-1...40-K and 43-1...43-K of that group are reset to "0". The outputs of AND elements 45 of all groups 38-1...38-I form the multi-bit ready-strobe output 36, and the outputs of OR elements 46 form the multi-bit response-code output 37 of block 27. From outputs 36 and 37, the control information is passed to output 16 of interprocessor request node 1 and then to system address-and-data switch 5.
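The behaviour of one response-collection group described above can be sketched as a behavioural (not gate-level) model. The class and method names are hypothetical; in the hardware the outputs of elements 45 and 46 are continuous levels, whereas here they are sampled by an explicit poll that also performs the reset.

```python
from typing import List, Optional, Tuple


class ResponseGroup:
    """Behavioural model of one group 38-i of block 27 (one return address)."""

    def __init__(self, K: int):
        self.ready = [False] * K  # ready flip-flops 40-1...40-K
        self.code = [False] * K   # response-code flip-flops 43-1...43-K

    def initialize(self, generalized: List[bool]) -> None:
        """Nodes whose request was cancelled by the filter count as already answered."""
        for k, any_match in enumerate(generalized):
            if not any_match:     # inverse output of OR element 30-k on input 34-k
                self.ready[k] = True

    def response(self, k: int, cancel_ram_read: bool) -> None:
        """Record the response from processor node k."""
        self.ready[k] = True
        if cancel_ram_read:       # response code "1": data will come from a cache
            self.code[k] = True

    def poll(self) -> Tuple[bool, Optional[bool]]:
        """Return (ready strobe, response code); reset the group once it is complete."""
        if not all(self.ready):   # AND element 45 not yet at "1"
            return False, None
        code = any(self.code)     # OR element 46
        self.ready = [False] * len(self.ready)
        self.code = [False] * len(self.code)
        return True, code


g = ResponseGroup(K=3)
g.initialize([True, False, True])   # node 1 had no fragment match, so it is ready at once
g.response(0, cancel_ram_read=False)
g.response(2, cancel_ram_read=True)
print(g.poll())                     # (True, True): cancel the RAM read
```

Treating filtered-out nodes as already answered is what lets a request complete without waiting for nodes that were never asked; if every node is filtered out, the group is ready immediately with response code 0 and the RAM access proceeds.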

Thus, the proposed technical solutions achieve a higher degree of filtering than the prototype and improve overall system performance at lower hardware cost.

Claims

1. A method for filtering interprocessor requests in multiprocessor computing systems, in which, upon receipt of an interprocessor request from one of the processors for access to the system main memory and/or the system cache memories, the presence of copies of the requested data in the system cache memories is checked in advance: the parts corresponding to the index and the tag of the system cache memories are extracted from the request address; the first part is used as the address for accessing the filtering information memory, which consists of segments each corresponding to one of the system cache memories; from the second part a fragment is extracted that consists of several tag bits or of a modulo convolution function of the tag bits; the tag fragments stored in all segments of the filtering information memory are read out and compared with the fragment extracted from the request address tag; if there are no matches, it is concluded that the system cache memories hold no copies of the requested data; if the fragment extracted from the request address tag matches one or more fragments read from the filtering information memory, it is concluded that copies of the requested data may be present in those system cache memories for which matches were obtained, and the accesses to these cache memories, thus filtered, are performed to check finally for copies of the requested data; in addition, the filtering information memory is updated by writing the tag fragment extracted from the request address into the cell of the segment corresponding to the cache memory associated with the requesting processor.
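The method of claim 1 can be illustrated with a small software sketch. It assumes direct-mapped caches (one filtering cell per cache index), 64-byte lines, 10 index bits and a 4-bit fragment; these constants, the names `split_address`, `tag_fragment` and `FilterMemory`, and the use of XOR folding (a modulo-2 sum of tag slices) as the "modulo convolution" of the tag bits are all illustrative assumptions rather than details taken from the patent.

```python
from __future__ import annotations

OFFSET_BITS = 6    # assumed 64-byte cache lines
INDEX_BITS = 10    # assumed 1024 cache sets (direct-mapped)
FRAGMENT_BITS = 4  # assumed width of the stored tag fragment


def split_address(addr: int) -> tuple[int, int]:
    """Split a request address into (index, tag) as the system caches would."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return index, tag


def tag_fragment(tag: int) -> int:
    """XOR-fold the tag into FRAGMENT_BITS bits (one possible modulo convolution)."""
    mask = (1 << FRAGMENT_BITS) - 1
    frag = 0
    while tag:
        frag ^= tag & mask
        tag >>= FRAGMENT_BITS
    return frag


class FilterMemory:
    """Filtering information memory: one segment per cache, one cell per index."""

    def __init__(self, num_caches: int) -> None:
        # None marks a cell never written (an assumed valid-bit convention).
        self.segments: list[list[int | None]] = [
            [None] * (1 << INDEX_BITS) for _ in range(num_caches)
        ]

    def filter(self, addr: int) -> list[int]:
        """Return the caches that MAY hold the line; all others can be skipped."""
        index, tag = split_address(addr)
        frag = tag_fragment(tag)
        return [c for c, seg in enumerate(self.segments) if seg[index] == frag]

    def update(self, addr: int, requester_cache: int) -> None:
        """Record that the requester's cache now holds the line at this index."""
        index, tag = split_address(addr)
        self.segments[requester_cache][index] = tag_fragment(tag)


if __name__ == "__main__":
    fm = FilterMemory(num_caches=4)
    a = 0x12345680                 # index 0x15A, tag 0x1234, fragment 0x4
    fm.update(a, requester_cache=1)
    print(fm.filter(a))            # [1]: cache 1 may hold the line
    print(fm.filter(0x00045680))   # [1]: tag 0x4 folds to the same fragment (false positive)
    print(fm.filter(0x12355680))   # []: tag 0x1235 folds to 0x5, no cache is probed
```

Because the fragment is shorter than the tag, a match only says a copy *may* exist, and the filtered access to that cache makes the final check. Under the direct-mapped assumption, each new fill at an index overwrites the same filtering cell, so a mismatch reliably excludes a cache.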

2. A device for filtering interprocessor requests in a multiprocessor computing system that includes a system address and data switch connected by first and second groups of buses to K processor nodes, each containing at least one processor and one cache memory, and to M main-memory sections, respectively, where M and K are integers, characterized in that it contains an interprocessor request node and at least one filtering unit configured to store filtering information; the information input of the filtering unit and the information input of the interprocessor request node are connected to the address-information output of the system address and data switch; the interprocessor-exchange inputs/outputs of the interprocessor request node are connected by a group of bidirectional buses to the K processor nodes of the system; and the filtering-results output of the filtering unit is connected to the corresponding input of the interprocessor request node, whose control-information output is connected to the corresponding control input of the system address and data switch.

3. The device according to claim 2, characterized in that the filtering unit contains an input register, a requester-number decoder, L tag-fragment comparators, and a filtering information memory consisting of L segments, each corresponding to one of the system cache memories, where L is an integer; the decoder input is connected by the processor-number bus to the first output of the input register, and the decoder outputs are connected to the write-strobe inputs of the corresponding filtering-information memory segments; the information inputs of these segments are connected by the tag-fragment bus to the second output of the input register, and the same bus is connected to the first input of each tag-fragment comparator, whose second input is connected to the output of the corresponding filtering-information memory segment and whose output is connected to the filtering-results output of the filtering unit; the third output of the input register is connected by the index bus to the address input of each filtering-information memory segment; and the input of the input register is the input of the filtering unit.
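The data paths of the filtering unit in claim 3 can be mimicked cycle by cycle in software: the input register supplies a processor number, a tag fragment and an index; all L segments are read at that index and compared in parallel; and the decoder's one-hot strobe writes the fragment into the requester's segment. This is a hypothetical model, not the patent's circuit. It assumes that the processor number directly selects the segment (one cache per processor), that the comparison happens before the write within a cycle, and that unwritten cells never match; `InputRegister` and `FilteringUnit` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InputRegister:
    """Fields of the filtering unit's input register (bit widths not specified)."""
    proc_num: int  # first output: processor-number bus to the requester-number decoder
    fragment: int  # second output: tag-fragment bus to segment data inputs and comparators
    index: int     # third output: index bus to the address input of every segment


class FilteringUnit:
    def __init__(self, num_segments: int, index_bits: int) -> None:
        self.num_segments = num_segments
        # None marks a cell that has never been written (assumed valid-bit behaviour).
        self.segments: list[list[int | None]] = [
            [None] * (1 << index_bits) for _ in range(num_segments)
        ]

    def decode(self, proc_num: int) -> list[bool]:
        # Requester-number decoder: one-hot write strobes, one per segment.
        return [s == proc_num for s in range(self.num_segments)]

    def cycle(self, reg: InputRegister) -> int:
        """Return the L-bit filtering result (bit s set: segment s matched),
        then write the fragment into the requester's segment."""
        result = 0
        for s, segment in enumerate(self.segments):
            # Comparator s: stored fragment vs. fragment on the tag-fragment bus.
            if segment[reg.index] == reg.fragment:
                result |= 1 << s
        for s, strobe in enumerate(self.decode(reg.proc_num)):
            if strobe:
                self.segments[s][reg.index] = reg.fragment
        return result


if __name__ == "__main__":
    fu = FilteringUnit(num_segments=4, index_bits=10)
    print(fu.cycle(InputRegister(proc_num=2, fragment=5, index=0x0A)))  # 0: nothing stored yet
    print(fu.cycle(InputRegister(proc_num=0, fragment=5, index=0x0A)))  # 4: segment 2 matches
    print(fu.cycle(InputRegister(proc_num=1, fragment=5, index=0x0A)))  # 5: segments 0 and 2
```

Returning the comparator outputs as one integer mirrors the L-bit filtering-results output, which the interprocessor request node then splits into per-node groups.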

4. The device according to claim 2, characterized in that the interprocessor request node contains K interprocessor-request buffer memories, each corresponding to one of the K processor nodes of the multiprocessor system, a unit for collecting responses to interprocessor requests, an interprocessor-request return-address decoder, a register block, and K N-input OR elements, each OR element corresponding to one of the K interprocessor-request buffer memories; the information input of the interprocessor request node is the input of the register block, whose first output is connected by the address bus to the first inputs of all interprocessor-request buffer memories; the second inputs of these buffer memories are connected by the interprocessor-request return-address bus to the second output of the register block and to the input of the return-address decoder, whose output is connected to the row-position-number input of the response collection unit; the N-bit input group of each buffer memory and the N inputs of the corresponding OR element are connected to those bits of the filtering-results input of the interprocessor request node that correspond to the N processors of one of the processor nodes of the multiprocessor system; the direct output of each N-input OR element is connected to the third input of the corresponding interprocessor-request buffer memory, and its inverse output is connected to one of the filtering-results inputs of the response collection unit; the outputs of the interprocessor-request buffer memories and the corresponding response inputs of the response collection unit form the corresponding interprocessor-exchange inputs and outputs of the interprocessor request node; and the response-readiness strobe output and the response-code output of the response collection unit form the multi-bit control-information output of the interprocessor request node.
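How claim 4 distributes one request can be sketched as follows. The K·N-bit filtering result is cut into one N-bit group per processor node. Each group's OR decides whether the request is written into that node's buffer memory (direct output), and the inverted OR is reported to the response collection unit. The bit layout (bit k·N + p for processor p of node k), the class name and the return value are assumptions made for illustration. So is the reading that the inverse output lets the response collection unit treat filtered-out nodes as already answered, which seems plausible but is not stated in the claim.

```python
from __future__ import annotations

from collections import deque


class InterprocessorRequestNode:
    """Hypothetical sketch of request distribution in claim 4 (K nodes, N processors each)."""

    def __init__(self, k_nodes: int, n_procs: int) -> None:
        self.k_nodes = k_nodes
        self.n_procs = n_procs
        # One interprocessor-request buffer memory per processor node.
        self.buffers: list[deque[tuple[int, int, int]]] = [deque() for _ in range(k_nodes)]

    def accept(self, address: int, return_address: int, filter_bits: int) -> list[bool]:
        """Distribute one request using a K*N-bit filtering result.

        Bit k*N + p of `filter_bits` is assumed to belong to processor p of node k.
        Returns the inverse OR outputs, one per node, as sent to the response
        collection unit (True: node k was filtered out and is not probed).
        """
        group_mask = (1 << self.n_procs) - 1
        filtered_out = []
        for k in range(self.k_nodes):
            node_bits = (filter_bits >> (k * self.n_procs)) & group_mask
            hit = node_bits != 0  # N-input OR element for node k (direct output)
            if hit:
                # The direct output enables writing into buffer k; the stored N-bit
                # group tells the node which of its processors' caches to probe.
                self.buffers[k].append((address, return_address, node_bits))
            filtered_out.append(not hit)  # inverse output
        return filtered_out


if __name__ == "__main__":
    node = InterprocessorRequestNode(k_nodes=3, n_procs=2)
    print(node.accept(0x1000, 5, 0b000100))  # [True, False, True]: only node 1 is probed
    print(list(node.buffers[1]))             # [(4096, 5, 1)]
```

Only nodes with at least one matching segment receive the request, which is where the reduction in interprocessor traffic comes from.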