CN115633098B - Storage management method and device of many-core system and integrated circuit - Google Patents

Publication number
CN115633098B
Authority
CN
China
Prior art keywords
message
internal standard
module
protocol
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211533104.XA
Other languages
Chinese (zh)
Other versions
CN115633098A (en
Inventor
张倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongshan Microelectronics Technology Co ltd
Original Assignee
Beijing Hongshan Microelectronics Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hongshan Microelectronics Technology Co ltd filed Critical Beijing Hongshan Microelectronics Technology Co ltd
Priority to CN202211533104.XA priority Critical patent/CN115633098B/en
Publication of CN115633098A publication Critical patent/CN115633098A/en
Application granted granted Critical
Publication of CN115633098B publication Critical patent/CN115633098B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/08 Protocols for interworking; Protocol conversion
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The device comprises four modules, namely a protocol adaptation module, a matrix calculation module, a storage controller module and an interface controller module, wherein the protocol adaptation module is used for converting messages between different internal protocols and internal standard protocols; the matrix calculation module is used for switching the internal standard message from the source port to the destination port according to the port routing mode; the storage controller module comprises an uplink sub-module for uplink processing of message data and a downlink sub-module for downlink processing of the message data; the interface controller module is used for realizing data conversion between the target external memory and the storage controller module. The device can meet the requirements of a high-performance and multifunctional storage management device in a many-core system, and can reduce the complexity of the realization of the whole many-core system.

Description

Storage management method and device of many-core system and integrated circuit
Technical Field
The present application relates to the field of integrated circuit design technologies, and in particular, to a storage management method and apparatus for a many-core system, and an integrated circuit.
Background
In the more advanced processor systems currently on the market, both CPUs (central processing units) and GPUs (graphics processing units) generally support memory management devices. As demand for computing power and integration grows, the mainstream architecture is becoming a CPU/GPU chip that contains a large number of independent cores and tightly coupled memory management devices, interconnected through a high-speed Network On Chip (NOC) infrastructure.
A modern many-core architecture places high demands on the storage management device: for example, it must adapt to multiple controllers, provide multiple interfaces with variable internal widths, and keep latency as short and as stable as possible. Existing storage management devices cannot fully meet these requirements and suffer from various functional or design deficiencies.
Disclosure of Invention
In view of this, embodiments of the present application provide a storage management method and apparatus for a many-core system, and an integrated circuit, which can effectively address the problems that existing storage management apparatuses apply to only a narrow range of scenarios and do not meet the corresponding requirements.
In a first aspect, an embodiment of the present application provides a storage management apparatus for a many-core system, including:
the protocol adaptation module is used for connecting different internal protocol buses so as to convert the target message between the corresponding internal protocol and the internal standard protocol to obtain the internal standard message or the message data sent to the corresponding internal protocol bus;
the matrix calculation module is used for switching the internal standard message from a source port to a destination port in a port routing mode according to the read-write request;
the storage controller module comprises an uplink sub-module and a downlink sub-module, and the uplink sub-module is used for identifying and mapping the internal standard message from the matrix calculation module and sending the internal standard message to a target interface controller; the downlink sub-module is used for converting message data returned from the target interface controller into the internal standard message, and sending the internal standard message to the matrix calculation module after demapping and identification resolution processing;
and the interface controller module is used for connecting external memories with different interface types so as to send the data read from the target external memory to the target interface controller or store the data acquired from the target interface controller into the target external memory.
In some embodiments, the internal standard protocol defines a first transmission channel and a second transmission channel according to a bus type, where the first transmission channel is used to transmit a delivered request, and the second transmission channel is used to transmit a returned response message;
the protocol adaptation module is used for converting the target message between the corresponding internal protocol and the internal standard protocol to obtain the internal standard message, and comprises:
the protocol adaptation module is used for analyzing a target message sent by a corresponding internal protocol bus to obtain corresponding data and address contents, combining channel data of the data and the address contents to obtain a message body with a uniform format, then adding a message header with a preset format to obtain an internal standard message, sending the internal standard message to a type message queue through the first transmission channel, and waiting to send the internal standard message to the matrix calculation module.
In some embodiments, the protocol adaptation module is configured to convert the target packet between the corresponding internal protocol and the internal standard protocol to obtain the packet data of the corresponding internal protocol bus, and includes:
the protocol adaptation module is further configured to send the internal standard packet from the matrix computation module to a response message queue through the second transmission channel, and then remove the packet header from the internal standard packet and perform format conversion on the obtained corresponding packet body according to the corresponding internal protocol, so as to obtain a required response message and send the required response message through the corresponding internal protocol bus.
In some embodiments, the message header of the preset format comprises three components, wherein the first component is used for describing a unique identifier of the request or response message; the second component is used for describing the type of the request or response message; the third component is for describing a processing priority of the request or response message.
In some embodiments, the read-write request includes information of a source port and a destination port of the target message; the matrix calculation module is configured to switch the internal standard message from a source port to a destination port in a port routing mode according to the read-write request, and includes:
the matrix calculation module is used for acquiring a global address mapping table according to preset configuration information through a routing unit, determining routing resources and paths from the source port to the destination port according to the global address mapping table, and sending the internal standard message based on the routing resources and paths; the global address mapping table comprises a physical address mapping table or a shared virtual address mapping table.
In some embodiments, the uplink sub-module comprises a first queue unit, an identification unit and a mapping unit;
the first queuing unit is used for queuing the internal standard message from the matrix calculation module according to a first preset rule, the identification unit is used for recording an identifier of the dequeued message and storing the identifier in an identifier table, and the mapping unit is used for converting the internal standard message into an interface time sequence corresponding to a target interface controller and sending the interface time sequence to the target interface controller in a flow control message form;
the downlink sub-module comprises a demapping unit, a de-identification unit and a second queue unit;
the de-mapping unit is used for converting message data returned from the target interface controller into the internal standard message, the de-identification unit is used for searching an identifier of an enqueue message and aligning messages belonging to the same destination port, and the second queue unit is used for queuing the internal standard message output by the de-mapping unit according to a second preset rule so as to send the internal standard message to the matrix calculation module.
In some embodiments, the uplink sub-module further comprises one or more combinations of a storage unit, an encoding unit and a scrambling unit;
the storage unit, the coding unit and the scrambling unit are respectively used for carrying out calculation processing, coding processing and interference adding processing on the corresponding message data in the mapping unit according to the corresponding configuration information;
the downlink sub-module also comprises one or more combinations of a storage unit, a decoding unit and a descrambling unit;
the storage unit, the decoding unit and the descrambling unit are respectively used for carrying out calculation processing, decoding processing and filtering interference processing on the corresponding message data in the demapping unit according to the corresponding configuration information.
In a second aspect, an embodiment of the present application provides a storage management method for a many-core system, including:
converting the target message between the corresponding internal protocol and the internal standard protocol through a protocol adaptation module to obtain the internal standard message or message data sent to the corresponding internal protocol bus;
switching the internal standard message from a source port to a destination port in a port routing mode through a matrix calculation module according to a read-write request;
the internal standard message from the matrix calculation module is processed by identification and mapping through an uplink sub-module in a storage controller module and is sent to a target interface controller; converting message data returned from the target interface controller into the internal standard message through a downlink sub-module in the storage controller module, and sending the internal standard message to the matrix calculation module after demapping and identification resolution processing;
and sending the data read from the target external memory to the target interface controller through the interface controller module, or storing the data acquired from the target interface controller into the target external memory.
In a third aspect, an embodiment of the present application provides an integrated circuit, where the integrated circuit includes multiple processors and multiple memories, and the integrated circuit employs the storage management apparatus of the many-core system to implement data storage management between the multiple processors and the multiple memories.
In a fourth aspect, the present application provides a readable storage medium storing a computer program, which when executed on a processor implements the functions of the modules in the storage management device of the many-core system.
The embodiment of the application has the following beneficial effects:
the storage management device of the many-core system provided by the embodiment of the application designs an architecture which mainly comprises four modules, namely a protocol adaptation module, a matrix calculation module, a storage controller module and an interface controller module, wherein the protocol adaptation module is mainly used for realizing the conversion of messages between different internal protocols and internal standard protocols; the matrix calculation module is used as a core switching module and is mainly used for switching the internal standard message from a source port to a destination port according to a port routing mode; the storage controller module comprises an uplink sub-module and a downlink sub-module which are respectively used for correspondingly processing the uplink message data and the downlink message data; the interface controller module is mainly used for realizing data conversion between the target external memory and the storage controller module so as to meet the requirements of protocols and circuits of the storage interface. The storage management device can adapt to various types of controllers and memories of different interfaces or protocols through the protocol adaptation module and the interface controller, and data handover is carried out through the matrix calculation module so as to achieve short delay and small fluctuation.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; those skilled in the art can obtain other related drawings based on these drawings without inventive effort.
FIG. 1 shows an overall framework diagram of a storage management device of a many-core system of an embodiment of the present application;
FIG. 2 is a diagram illustrating an application of a first transmission channel of a protocol adaptation module in a storage management device of a many-core system according to an embodiment of the present application;
FIG. 3 is a diagram illustrating an application of a second transmission channel of a protocol adaptation module in a storage management device of a many-core system according to an embodiment of the present application;
FIG. 4 is a diagram illustrating an application of a matrix computation module in a storage management device of a many-core system according to an embodiment of the present application;
FIG. 5 is a flow diagram illustrating an uplink sub-module of a storage management device of the many-core system according to an embodiment of the present application;
FIG. 6 is a flow diagram illustrating a downlink sub-module of the storage management device of the many-core system according to an embodiment of the present application;
FIG. 7 shows a flowchart of a storage management method of a many-core system according to an embodiment of the present application.
Description of the main element symbols:
110-a protocol adaptation module; 120-a matrix calculation module; 130-a storage controller module; 140-interface controller module.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The components of the embodiments of the present application, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Hereinafter, the terms "including", "having", and their derivatives, as used in various embodiments of the present application, are intended only to indicate specific features, numerals, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the presence or addition of one or more other features, numerals, steps, operations, elements, components, or combinations of the foregoing. Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one element from another, and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present application belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments and features of the embodiments described below can be combined with each other without conflict.
Under existing many-core architectures, the storage management device cannot fully meet the above requirements and may have various functional deficiencies or design defects. The functional deficiencies may include, but are not limited to, one or more of the following: no support for a control interaction flow that separates address, control, and data, with a simple RAW (raw data) parallel data interface that easily causes deadlock; no support for ordering or QoS (quality of service) functions between multiple internal interfaces; no support for a Chiplet mode, so that the expansion requirements of the processor system cannot be met; no support for fast encryption and decryption, so that system security requirements cannot be met; no support for multiple different kinds of external interfaces; and no support for latency hiding, so that efficiency is limited by bandwidth and controller architecture and cannot be improved. Therefore, the present application provides a new storage management device that can meet the high-performance, multi-functional storage management requirements of existing many-core (CPU/GPU) systems. The storage management device of the many-core system is described below with reference to specific embodiments.
FIG. 1 shows an architecture diagram of a storage management device of a many-core system according to an embodiment of the present application.
Exemplarily, the storage management device of the many-core system includes four modules: a protocol adaptation module 110, a matrix calculation module (Matrix module for short) 120, a storage controller module 130, and an interface controller module 140. The protocol adaptation module 110 connects different internal protocol buses and adapts the protocols of the main payload packets; the Matrix module 120 is mainly responsible for routing and switching message data between different ports; the storage controller module 130 is mainly responsible for aggregating message data and performing the corresponding data processing; and the interface controller module 140 connects external memories of different interface types and is mainly responsible for data conversion at the corresponding memory interfaces. Each of the four modules is responsible for its own functions, and together they can meet the various storage management requirements of the many-core system.
In this embodiment, the protocol adaptation module 110 is mainly used for protocol adaptation; specifically, it converts a target message between the different internal protocols and the internal standard protocol to obtain either an internal standard message or message data to be sent to the corresponding internal protocol bus. It is understood that the protocol adaptation module 110 is located at the input/output side of the storage management device, is responsible for interacting with multiple internal protocol buses, and uniformly converts messages of the different internal protocols into internal standard messages for processing inside the storage management device. For example, the internal protocol buses may include, but are not limited to, any combination of AXI3, AXI4, AHB, CXS, HBI, Interlaken, RAW, and the like, and may be configured according to actual requirements, so that the storage management apparatus can interface with devices of various protocols and interface types.
The internal standard protocol, also called the HighLink protocol, is specially designed for communication in an SOC (system on chip) that connects a plurality of CPUs and integrates a plurality of coprocessors, and features low latency and high bandwidth. In this embodiment, the internal standard protocol defines a first transmission channel (denoted as the A channel) and a second transmission channel (denoted as the D channel) according to the bus type. The first transmission channel is mainly used for transmitting issued requests, such as a request to perform an operation on a specified address range or to access or cache data; the second transmission channel is mainly used for transmitting returned responses/messages, such as a data response or acknowledgement message sent to the original requester.
In addition, the internal standard protocol further provides support for multiple functions by adding a header (OverHead) in a preset format, thereby reducing the number of ports and wires of the Matrix module 120, and reducing the design difficulty of the protocol adaptation module 110 and the device. In this embodiment, the design of the OverHead includes three components, wherein the first component is used to describe a unique identifier of a corresponding request or response message; the second component is used for describing the type of the request or response message; the third component is used to describe the processing priority of the request or response message.
For example, in one embodiment, the format of the OverHead may be as shown in Table 1 below, where the first component is described using 8 bits, the second component using 5 bits, and the third component using 2 bits. It is understood that the number of bits used for each component is not limited to these values; they are only an example and can be adjusted according to actual requirements.
TABLE 1 OverHead format
Field | Meaning | Description
ID[7:0] | ID of the message or request | An identifier of the data segment, used for sorting and operation matching
Type[4:0] | Type of the message or request | Provides the type of the message, including: 1. read/write requests; 2. multiple Transactions requests; 3. atomic operations; 4. Cache Hint messages; 5. Cache transfer operations; 6. Ready/Valid messages
Priority[1:0] | Priority of the message or request | Provides the priority of the message; messages are ordered by type and priority to avoid deadlock
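As a minimal sketch of the OverHead layout in Table 1, the following Python fragment packs the three fields into a single 15-bit word and unpacks them again. Only the field widths (8, 5, and 2 bits) come from the embodiment; the bit ordering (ID in the most significant bits, Priority in the least significant bits) and the function names pack_overhead and unpack_overhead are assumptions made purely for illustration.

```python
from typing import Tuple

# Field widths taken from Table 1; the ordering of fields inside the word is assumed.
ID_BITS, TYPE_BITS, PRIO_BITS = 8, 5, 2


def pack_overhead(msg_id: int, msg_type: int, priority: int) -> int:
    """Pack ID[7:0], Type[4:0] and Priority[1:0] into one 15-bit integer."""
    for value, bits, name in ((msg_id, ID_BITS, "ID"),
                              (msg_type, TYPE_BITS, "Type"),
                              (priority, PRIO_BITS, "Priority")):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (msg_id << (TYPE_BITS + PRIO_BITS)) | (msg_type << PRIO_BITS) | priority


def unpack_overhead(word: int) -> Tuple[int, int, int]:
    """Recover (ID, Type, Priority) from a packed OverHead word."""
    priority = word & ((1 << PRIO_BITS) - 1)
    msg_type = (word >> PRIO_BITS) & ((1 << TYPE_BITS) - 1)
    msg_id = (word >> (TYPE_BITS + PRIO_BITS)) & ((1 << ID_BITS) - 1)
    return msg_id, msg_type, priority
```

Because the widths are held in constants, the same sketch would still apply if the number of bits per component were adjusted, as the embodiment allows.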
Based on the internal standard protocol, in an embodiment, the protocol adaptation module 110 converts the target message between the corresponding internal protocol and the internal standard protocol to obtain the internal standard message as follows: the protocol adaptation module 110 parses a target message sent by the corresponding internal protocol bus to obtain the corresponding data and address content, merges the channel data of the data and address content into a message body with a uniform format, adds a message header in the preset format to obtain an internal standard message, sends the internal standard message to a Type message queue (Type queue) through the first transmission channel (i.e., the A channel), and waits to send it to the Matrix module 120. It is understood that conversion into the internal standard protocol essentially normalizes the data of the various other bus protocols for subsequent uniform processing.
Similarly, the protocol adaptation module 110 converts the target message between the corresponding internal protocol and the internal standard protocol to obtain the message data for the corresponding internal protocol bus as follows: the protocol adaptation module 110 sends the internal standard message from the Matrix module 120 to a response message queue through the second transmission channel (i.e., the D channel), removes the message header in the preset format from the internal standard message, and converts the format of the resulting message body according to the corresponding internal protocol, so as to obtain the required response message and send it through the corresponding internal protocol bus. It can be understood that the conversions of a message between each internal protocol and the internal standard protocol are essentially a forward process and its reverse.
For further understanding, the overall block diagram and operation flow of the protocol adaptation module 110 are as follows, taking an AXI4 protocol bus as an example here:
As shown in FIG. 2, when a request needs to be processed, the message or command sent over the AXI4 protocol is first parsed to obtain the corresponding data and addresses (such as the WADDR, WDATA, and RADDR signals shown in FIG. 2). A normalization operation is then performed: the channel data of the obtained data and addresses are merged into a message body with a uniform format, and a corresponding OverHead is added according to the format in Table 1. In the OverHead, the unique ID is set as the identifier of the request operation, different protocol types are mapped to different Type values, the QoS information carried by the AXI4 protocol is converted, and so on. Finally, the combined message header and message body form the converted internal standard message, which is placed in a Type queue according to its Type to wait for processing. When the dequeue decision is completed, the converted internal standard message is sent to the Matrix interface so that the Matrix module 120 can perform the next processing step.
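The normalization step described above can be sketched in Python for an AXI4 write request. This is an illustrative sketch only: the class names AxiWrite and InternalMessage, the Type code TYPE_READ_WRITE, the body layout (an 8-byte big-endian address followed by the write data), and the mapping of the 4-bit AXI4 QoS value to the 2-bit Priority field by keeping its two upper bits are all assumptions that the embodiment does not specify.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

TYPE_READ_WRITE = 0  # hypothetical Type code; the source gives no numeric encodings


@dataclass
class AxiWrite:
    awid: int      # transaction ID from the write address channel
    awaddr: int    # write address
    wdata: bytes   # write data
    awqos: int     # 4-bit AXI4 QoS value


@dataclass
class InternalMessage:
    msg_id: int    # OverHead ID[7:0]
    msg_type: int  # OverHead Type[4:0]
    priority: int  # OverHead Priority[1:0]
    body: bytes    # uniformly formatted message body


def normalize(req: AxiWrite) -> InternalMessage:
    """Merge the address and data channels into one body and attach an OverHead."""
    body = req.awaddr.to_bytes(8, "big") + req.wdata   # assumed body layout
    priority = (req.awqos >> 2) & 0x3                  # assumed QoS-to-Priority mapping
    return InternalMessage(req.awid & 0xFF, TYPE_READ_WRITE, priority, body)


# One queue per Type value; messages wait here for the dequeue decision.
type_queues = defaultdict(deque)


def enqueue(msg: InternalMessage) -> None:
    """Place the converted message in the Type queue that matches its Type."""
    type_queues[msg.msg_type].append(msg)
```

A request such as AxiWrite(awid=0x12, awaddr=0x1000, wdata=b"\xde\xad", awqos=0b1100) would, under these assumptions, become an internal standard message with Priority 3 queued under TYPE_READ_WRITE.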
The Type queue uses a two-stage decision to determine the processing order of messages with different priorities. Specifically, the first-stage priority satisfies: Ready/Valid message > request; the second-stage priority follows the Priority field in the message header: 3 > 2 > 1 > 0. Optionally, each Type queue also supports an aging function and a priority flip function: once the message at the head of a queue has aged past its timeout, the priority is automatically flipped and the timed-out message is processed first, thereby avoiding deadlock.
As shown in FIG. 3, for a returned response/message, the protocol adaptation module 110 receives the internal standard message from the Matrix module 120 through the Matrix interface and sends it through the D channel to the response message queue to wait for processing. The response message queue queues messages according to the Type of the response message and the QoS information, using the same priority mechanism (and aging function) as the A channel; after the two-stage priority decision, the message is handed to the next-stage module to complete protocol conversion. Specifically, the OverHead of the D-channel internal standard message is removed, and the mapping from Type to the AXI4 protocol and from Priority to QoS information is completed according to the message header. The remaining uniformly formatted message body, whose merged contents include data and addresses, then undergoes format conversion and channel separation according to the protocol format of the AXI4 interface, yielding the required response message (e.g., the WRESP and RDATA messages shown in FIG. 3) suitable for transmission over the AXI4 interface. Finally, the response message is sent out through the AXI4 interface.
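The two-stage decision with aging can be sketched as a small arbiter over the heads of the Type queues. The sketch below is an assumption-laden illustration, not the circuit of the embodiment: the names Entry and TypeQueueArbiter, the use of an integer timestamp, the tie-break by arrival time, and the rule that the oldest timed-out head wins are all hypothetical choices.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    name: str             # label used only for illustration
    is_ready_valid: bool  # True for a Ready/Valid message, False for a request
    priority: int         # Priority field of the header, 0..3
    enq_time: int         # time at which the entry was queued


class TypeQueueArbiter:
    def __init__(self, aging_timeout: int):
        self.aging_timeout = aging_timeout
        self.queues = {}  # Type key -> deque of Entry

    def push(self, type_key, entry: Entry) -> None:
        self.queues.setdefault(type_key, deque()).append(entry)

    def pop_next(self, now: int):
        """Select the next message among the queue heads, or None if all are empty."""
        heads = [(key, q[0]) for key, q in self.queues.items() if q]
        if not heads:
            return None
        # Aging: a head that has waited past the timeout overrides normal priority.
        aged = [(key, e) for key, e in heads if now - e.enq_time >= self.aging_timeout]
        if aged:
            key, _ = min(aged, key=lambda item: item[1].enq_time)
        else:
            # Stage 1: Ready/Valid before request. Stage 2: higher Priority first.
            # Assumed tie-break: the earlier arrival wins.
            key, _ = max(heads, key=lambda item: (item[1].is_ready_valid,
                                                  item[1].priority,
                                                  -item[1].enq_time))
        return self.queues[key].popleft()
```

In this sketch a low-priority request cannot be starved indefinitely: once it has waited for aging_timeout time units at the head of its queue, it is selected ahead of newer Ready/Valid or high-priority messages.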
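The reverse conversion on the D channel can be sketched in the same style. The fragment below is only a hedged illustration: the Type codes TYPE_WRITE_RESP and TYPE_READ_RESP, the InternalMessage class, the dictionary used to represent the outgoing AXI4-side message, the one-byte status assumed for a write response, and the mapping of the 2-bit Priority back to the upper bits of a 4-bit QoS value are all assumptions not stated in the embodiment.

```python
from dataclasses import dataclass

TYPE_WRITE_RESP = 1  # hypothetical Type code for a write response
TYPE_READ_RESP = 2   # hypothetical Type code for a read data response


@dataclass
class InternalMessage:
    msg_id: int    # OverHead ID[7:0]
    msg_type: int  # OverHead Type[4:0]
    priority: int  # OverHead Priority[1:0]
    body: bytes    # uniformly formatted message body


def to_axi4_response(msg: InternalMessage) -> dict:
    """Strip the OverHead and split the body into an AXI4-side response message."""
    qos = (msg.priority & 0x3) << 2  # assumed Priority-to-QoS mapping
    if msg.msg_type == TYPE_WRITE_RESP:
        # Assumed body layout: a single status byte.
        return {"channel": "WRESP", "id": msg.msg_id, "qos": qos, "resp": msg.body[0]}
    if msg.msg_type == TYPE_READ_RESP:
        # Assumed body layout: the read data itself.
        return {"channel": "RDATA", "id": msg.msg_id, "qos": qos, "data": msg.body}
    raise ValueError(f"Type {msg.msg_type} has no AXI4 response mapping in this sketch")
```

The channel names WRESP and RDATA follow the labels used in FIG. 3; everything else in the returned dictionary is illustrative.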
The Matrix module 120 serves as the core switching module for message data and is configured to switch an internal standard message from a source port to a destination port in a port routing mode according to a read-write request, where the read-write request usually includes the source port and destination port information of the message data to be processed.
The internal standard protocol defines two roles, Master and Slave. Because the numbers of Master and Slave ports are usually unequal, with far more Master-type ports than Slave-type ports as shown in FIG. 4, the Matrix module 120 adopts a multi-stage flexible cross-routing mode to realize data interaction between different ports.
In one embodiment, the switching, by a Matrix module, of an internal standard packet from a source port to a destination port in a port routing mode according to a read-write request includes:
acquiring, through the routing unit, a global address mapping table according to preset configuration information, determining routing resources and paths from the source port to the destination port according to the global address mapping table, and sending the internal standard message based on the routing resources and paths. The global address mapping table may be a physical address mapping table of each external memory or a shared virtual address mapping table. For example, if a physical address is looked up, the route exit can be found directly; if a shared virtual address is looked up, the virtual address translation table (TLB) must be searched first, and the route exit is then determined from the search result.
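The two lookup cases above (direct physical lookup versus TLB translation followed by lookup) can be illustrated with a short sketch. The address ranges, port names, page size and TLB contents below are hypothetical values chosen only for the example; the description does not specify them.

```python
from typing import Optional

PAGE_SIZE = 4096  # assumed page size of the shared virtual address space

# Hypothetical physical address map: (base, limit, route exit).
PHYS_MAP = [
    (0x0000_0000, 0x3FFF_FFFF, "DDR0"),
    (0x4000_0000, 0x7FFF_FFFF, "HBM0"),
]

# Hypothetical TLB contents: virtual page number -> physical page number.
TLB = {0x10: 0x40000}


def lookup_physical(addr: int) -> Optional[str]:
    """Return the route exit whose address range contains addr, if any."""
    for base, limit, exit_port in PHYS_MAP:
        if base <= addr <= limit:
            return exit_port
    return None


def route_exit(addr: int, is_virtual: bool) -> Optional[str]:
    """Physical addresses map directly; shared virtual addresses go through the TLB first."""
    if is_virtual:
        vpn, offset = divmod(addr, PAGE_SIZE)
        ppn = TLB.get(vpn)
        if ppn is None:
            return None  # TLB miss; miss handling is outside this sketch
        addr = ppn * PAGE_SIZE + offset
    return lookup_physical(addr)
```

With these assumed tables, a physical address in the first range resolves straight to `DDR0`, while a virtual address in page `0x10` is first translated and then resolves to `HBM0`.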
It can be understood that the number of connections between the routing unit in the Matrix module 120 and the next level of cross-routing nodes is configurable, and the system address mapping and port mapping are determined at design time, so the number and connection mode of the cross-routing nodes at each level are dynamically generated through configuration, thereby optimizing routing resources and routing paths as far as possible. As a preferred scheme, message data passes through at most three levels of cross-routing nodes from an input port of the Matrix to an output port of the Matrix, so as to provide low-latency, high-bandwidth data traffic.
It can be appreciated that, since a conventional many-core system contains many modules such as CPUs/GPUs, coprocessors and DMA engines, data interaction between the multiple ports is handled by the Matrix module 120 to provide low-latency, high-bandwidth data traffic.
The storage controller module 130 mainly performs functions such as alignment, transaction conversion, out-of-order processing and Qos selection, thereby ensuring that each storage operation can be sent to the corresponding storage interface. In one embodiment, the storage controller module 130 includes an uplink sub-module and a downlink sub-module. The uplink sub-module is configured to perform identification and mapping on an internal standard message from the Matrix module 120 and send it to a target interface controller; the downlink sub-module is configured to convert the message data returned from the target interface controller into an internal standard message and send it to the Matrix module 120 after demapping and de-identification processing.
In an embodiment, as shown in fig. 5, the uplink sub-module includes a first queue unit, an identification (Tagging) unit and a mapping (Mapping) unit. The first queue unit is configured to queue the internal standard messages from the Matrix module 120 according to a first preset rule. The Tagging unit is configured to record the identifier of each dequeued message and store it in an identifier table (Tagging table), which is later used for the downlink alignment operation and the Reorder (reordering) function. The Mapping unit is configured to convert the internal standard message into the interface timing corresponding to the target interface controller and send it to the target interface controller in the form of a flow control (FlowControl) message.
In an optional scheme, the first preset rule in the first queue unit may queue messages separately according to the triple of source port, Type and Qos information, such as the source port queue, Type queue and Qos queue shown in fig. 5. The dequeue decision principle of the queue unit is: (1) the Qos queue dequeues in order from high to low; (2) the Type queue dequeues in the order Ready/Valid message > request; (3) the source port queue dequeues in the order Cache > SRAM > CPU > DMA > others. If the Aging function is supported, the priority of a message can be flipped after its Aging timeout expires so that it is dequeued preferentially, thereby avoiding deadlock.
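The three-level dequeue decision with Aging can be sketched as a sort key. This is a hedged illustration, not the patented circuit: the `Entry` class, the timeout of 100 cycles, the convention that a larger Qos value means higher priority, and the tie-break on arrival time are all assumptions made for the example.

```python
from dataclasses import dataclass

SOURCE_RANK = {"Cache": 0, "SRAM": 1, "CPU": 2, "DMA": 3}  # any other source ranks last
TYPE_RANK = {"ready_valid": 0, "request": 1}               # Ready/Valid message > request
AGING_TIMEOUT = 100                                        # assumed timeout, in cycles


@dataclass
class Entry:
    source: str
    msg_type: str
    qos: int          # assumption: larger value means higher priority
    enqueued_at: int  # cycle at which the message entered the queue


def dequeue_key(entry: Entry, now: int) -> tuple:
    """Smaller key dequeues first: aged entries, then Qos, then Type, then source port."""
    aged = (now - entry.enqueued_at) >= AGING_TIMEOUT
    return (
        0 if aged else 1,                                  # priority flip after Aging timeout
        -entry.qos,                                        # (1) Qos from high to low
        TYPE_RANK.get(entry.msg_type, len(TYPE_RANK)),     # (2) Type order
        SOURCE_RANK.get(entry.source, len(SOURCE_RANK)),   # (3) source port order
        entry.enqueued_at,                                 # assumed tie-break: oldest first
    )


def pick_next(entries: list, now: int) -> Entry:
    """Select the entry that should be dequeued at cycle `now`."""
    return min(entries, key=lambda e: dequeue_key(e, now))
```

Placing the Aging flag first in the key means a low-priority message that has waited long enough overtakes everything else, which is one simple way to realize the deadlock-avoidance behaviour described above.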
As another optional scheme, the Tagging unit may also have a storage unit attached alongside it (side-mounted). For example, if a message indicates that computation is required on the data it carries, the computation may be completed in the storage controller module 130 before the result is passed to the next stage; the storage unit may perform operations including but not limited to accumulation and bit computation, which may be set according to actual requirements. In addition, the Mapping unit may have side-mounted units such as an encoding unit and/or a scrambling unit, so as to provide a simple ECC (check) function for data, such as a parity check, and to encode data and/or add interference according to the corresponding configuration information. For example, when some data needs to be encrypted and stored, the encoding unit may perform processing such as Hamming code encoding; a Hamming code can correct 1-bit errors, which provides a certain read-write gain and improves the data integrity and reliability of a high-speed memory interface.
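As an illustration of the 1-bit error correction mentioned above, the following sketch implements the classic Hamming(7,4) code on a 4-bit value. The description does not say which Hamming variant or word width the encoding unit uses, so the (7,4) layout and the function names here are assumptions for the example only.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (bit i holds position i+1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # parity over positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # parity over positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # parity over positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(code: int) -> int:
    """Correct up to one flipped bit and return the 4 data bits."""
    bits = [(code >> i) & 1 for i in range(7)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)   # 1-based position of the error, 0 if none
    if syndrome:
        bits[syndrome - 1] ^= 1
    data = [bits[2], bits[4], bits[5], bits[6]]
    return sum(b << i for i, b in enumerate(data))
```

The syndrome directly gives the position of a single flipped bit, so the decoder on the downlink side can repair it without a retransmission; two simultaneous bit errors are beyond what this code can correct.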
In an embodiment, as shown in fig. 6, the downlink sub-module includes a demapping (Unmapping) unit, a de-identification (Untagging) unit and a second queue unit. The Unmapping unit is configured to convert the message data returned from the target interface controller into an internal standard message. The Untagging unit is configured to look up the identifier (ID) and type of each enqueued message, so that messages belonging to the same destination port are aligned and reordered according to the ID and type and spliced into a complete operation. The second queue unit is configured to queue the internal standard messages output by the demapping unit according to a second preset rule for sending to the Matrix module 120.
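The interplay between the uplink Tagging table and the downlink Untagging unit can be sketched as a small reorder buffer. The class and method names are hypothetical, and the sketch assumes that responses for one destination port must be released in the order in which their requests were tagged; the description only states that messages are aligned and reordered by ID and type.

```python
from collections import defaultdict


class ReorderBuffer:
    """Releases responses per destination port in the order their tags were recorded."""

    def __init__(self):
        self.expected = defaultdict(list)  # destination port -> tags in issue order
        self.pending = defaultdict(dict)   # destination port -> {tag: payload}

    def record(self, dest_port: str, tag: int) -> None:
        """Uplink side: the Tagging unit stores the tag of a dequeued message."""
        self.expected[dest_port].append(tag)

    def accept(self, dest_port: str, tag: int, payload) -> list:
        """Downlink side: buffer a returned message and release any in-order run."""
        self.pending[dest_port][tag] = payload
        order = self.expected[dest_port]
        released = []
        while order and order[0] in self.pending[dest_port]:
            released.append(self.pending[dest_port].pop(order.pop(0)))
        return released
```

If tags 1 and 2 are recorded for a port and the memory returns tag 2 first, `accept` holds it back and releases both payloads together once tag 1 arrives.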
In an optional scheme, the second preset rule in the second queue unit may queue messages separately according to the triple of destination port, Type and Qos information, such as the destination port queue, Type queue and Qos queue shown in fig. 6. The dequeue decision principle of the queue unit is: (1) the Qos queue dequeues in order from high to low; (2) the Type queue dequeues in the order Ready/Valid message > request; (3) the destination port queue dequeues in the order Cache > SRAM > CPU > DMA > others. If the Aging function is supported, the priority of a message can be flipped after its Aging timeout expires so that it is dequeued preferentially. It can be understood that the dequeue decision principle of the second queue unit is basically the same as that of the first queue unit; the difference is that the uplink sub-module is based on the source port, while the downlink sub-module is based on the destination port.
As another optional scheme, the Unmapping unit may have side-mounted units such as a decoding unit and/or a descrambling unit, where the decoding unit corresponds to the encoding unit and the descrambling unit corresponds to the scrambling unit. For example, if Hamming code encoding or similar processing was applied, the corresponding decoding is performed here; if interference was added, the descrambling unit filters it out. In addition, the Tagging unit may also have a side-mounted storage unit or the like; if a message indicates that its data requires computation, the data is passed to the next stage after the computation is completed.
It can be understood that placing the storage unit in the storage controller module 130, that is, completing computations that frequently read and write memory close to the storage controller, effectively reduces the instructions executed by the CPU/GPU, the refresh operations of the cache (Cache) and the read and write operations of the SRAM (static random access memory). This improves the execution efficiency of the processor cores, reduces Cache misses, and brings the benefits of lower delay and higher operating bandwidth.
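The idea of completing a read-modify-write next to the memory, instead of moving the data to a processor core and back, can be illustrated with a toy model. The class `NearMemoryUnit`, its operation names and the word width are hypothetical; the description only names accumulation and bit computation as examples of supported operations.

```python
import operator

# Operations assumed for illustration only.
OPS = {
    "add": operator.add,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}


class NearMemoryUnit:
    """Toy model of a compute-capable unit sitting beside the storage controller."""

    def __init__(self, width_bits: int = 32):
        self.mem = {}                        # address -> stored word
        self.mask = (1 << width_bits) - 1    # results wrap to the word width

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value & self.mask

    def update(self, addr: int, op: str, operand: int) -> int:
        """Apply op to the stored word in place and return the new value."""
        result = OPS[op](self.mem.get(addr, 0), operand) & self.mask
        self.mem[addr] = result
        return result
```

A single `update` message replaces the read, compute and write-back sequence that a core would otherwise issue, which is where the reduction in core instructions and Cache traffic described above would come from.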
The interface controller module 140 consists of interface modules for different external memory types and is mainly responsible for converting data from the storage controller into the protocols and circuit requirements of the different external memory interfaces. Specifically, the interface controller module 140 is used to send data read from the target external memory to the target interface controller, or to store data obtained from the target interface controller in the target external memory. For example, the types of external memory may include, but are not limited to, HBM (high bandwidth memory, commonly used as video memory on graphics cards), DDR (double data rate synchronous dynamic random access memory), GDDR (graphics DDR, the DDR variant used on graphics cards), and the like. Correspondingly, the supported internal storage interfaces may include, but are not limited to, SRAM, Cache, and the like, which may be set according to actual requirements and are not limited herein.
By designing an architecture composed of four modules, namely the protocol adaptation module 110, the Matrix module 120, the storage controller module 130 and the interface controller module 140, the storage management device of the many-core system in the embodiments of the present application can adapt to controllers and memories with different interfaces or protocols while keeping delay as short as possible and fluctuation small. In addition, functions such as storage, encoding and decoding, and fast encryption and decryption can be provided. The device supports a control interaction process in which address, control and data are separated, which avoids deadlock and improves timing. It also supports configurable integrated storage-and-computation logic, including addition, multiplication and logical bit operations, so that it can meet the requirements for a high-performance, multifunctional storage management device in a many-core system and reduce the complexity of the whole many-core system.
The embodiments of the present application further provide a storage management method for a many-core system, which is applied to a storage management device of the many-core system. The device includes a protocol adaptation module 110, a Matrix module 120 (also referred to as the matrix calculation module), a storage controller module 130 and an interface controller module 140, where the protocol adaptation module 110 is used to connect different internal protocol buses, and the interface controller module 140 is used to connect external memories of different interface types.
As shown in fig. 7, the storage management method of the many-core system exemplarily includes:
Step S110: the protocol adaptation module 110 converts a target message between the corresponding internal protocol and the internal standard protocol to obtain an internal standard message or message data to be sent to the corresponding internal protocol bus.
Step S120: the Matrix module 120 switches the internal standard message from the source port to the destination port in a port routing mode according to the read-write request.
Step S130: the uplink sub-module in the storage controller module 130 performs identification and mapping on the internal standard message from the Matrix module 120 and sends it to a target interface controller; the downlink sub-module in the storage controller module 130 converts the message data returned from the target interface controller into an internal standard message and sends it to the Matrix module 120 after demapping and de-identification processing.
Step S140: the interface controller module 140 sends the data read from the target external memory to the target interface controller, or stores the data acquired from the target interface controller in the target external memory.
It is to be understood that the method steps of the present embodiment correspond to the functions of the modules in the apparatus of the above embodiment, and the options in the above embodiment are also applicable to the present embodiment, so that the description is not repeated here.
The present application further provides an integrated circuit, which may be a System On Chip (SOC) or the like, exemplarily including a plurality of processors and a plurality of memories, wherein the integrated circuit employs the storage management device of the many-core system in the above embodiments to implement data storage management between the plurality of processors and the plurality of memories.
The processor may be an integrated circuit chip having signal processing capability. The processor may be a general-purpose processor, including at least one of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU) and a Network Processor (NP), or may be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The memory may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The present application also provides a readable storage medium storing a computer program that, when executed on a processor, implements the functions of the respective modules in the storage management device of the many-core system of the above embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above description covers only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto; any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application.

Claims (8)

1. A storage management apparatus of a many-core system, comprising:
the protocol adaptation module is used for connecting different internal protocol buses so as to convert the target message between the corresponding internal protocol and the internal standard protocol to obtain the internal standard message or the message data sent to the corresponding internal protocol bus;
the matrix calculation module is used for switching the internal standard message from a source port to a destination port according to a port routing mode according to the read-write request;
the storage controller module comprises an uplink sub-module and a downlink sub-module, and the uplink sub-module is used for identifying and mapping the internal standard message from the matrix calculation module and sending the internal standard message to a target interface controller; the downlink sub-module is used for converting message data returned from the target interface controller into the internal standard message, and sending the internal standard message to the matrix calculation module after demapping and de-identification processing;
the interface controller module is used for connecting external memories with different interface types so as to send data read from a target external memory to the target interface controller or store data acquired from the target interface controller into the target external memory;
the internal standard protocol defines a first transmission channel and a second transmission channel according to the bus type, wherein the first transmission channel is used for transmitting a transmitted request, and the second transmission channel is used for transmitting a returned response message;
the protocol adaptation module is used for converting the target message between the corresponding internal protocol and the internal standard protocol to obtain the internal standard message, and comprises:
the protocol adaptation module is used for analyzing a target message sent by a corresponding internal protocol bus to obtain corresponding data and address contents, merging channel data of the data and the address contents to obtain a message body with a uniform format, adding a message header with a preset format to obtain an internal standard message, sending the internal standard message to a type message queue through the first transmission channel, and waiting to send the internal standard message to the matrix calculation module;
the message header in the preset format comprises three components, wherein the first component is used for describing a unique identifier of the request or response message; a second component for describing the type of the request or response message; the third component is for describing a processing priority of the request or response message.
2. The storage management device of a many-core system according to claim 1, wherein the protocol adaptation module is configured to convert the target packet between the corresponding internal protocol and the internal standard protocol to obtain the packet data of the corresponding internal protocol bus, and comprises:
the protocol adaptation module is also used for sending the internal standard message from the matrix calculation module into a response message queue through the second transmission channel, then removing the message header from the internal standard message and carrying out format conversion on the obtained corresponding message body according to the corresponding internal protocol so as to obtain the required response message and sending the required response message out through the corresponding internal protocol bus.
3. The storage management device of many-core system of claim 1, wherein the read-write request contains information of the source port and the destination port of the target packet; the matrix calculation module is configured to switch the internal standard packet from a source port to a destination port according to a port routing mode according to the read-write request, and includes:
the matrix calculation module is used for acquiring a global address mapping table according to preset configuration information through a routing unit, determining routing resources and paths from the source port to the destination port according to the global address mapping table, and further sending the internal standard message based on the routing resources and paths; the global address mapping table comprises a physical address mapping table or a shared virtual address mapping table.
4. The storage management device of the many-core system of claim 1, wherein the uplink submodule comprises a first queue unit, an identification unit and a mapping unit;
the first queuing unit is used for queuing the internal standard message from the matrix calculation module according to a first preset rule, the identification unit is used for recording an identifier of the dequeued message and storing the identifier in an identifier table, and the mapping unit is used for converting the internal standard message into an interface time sequence corresponding to a target interface controller and sending the interface time sequence to the target interface controller in a flow control message form;
the downlink sub-module comprises a demapping unit, a de-identification unit and a second queue unit;
the de-mapping unit is used for converting message data returned from the target interface controller into the internal standard message, the de-identification unit is used for searching an identifier of an enqueue message and aligning messages belonging to the same target port, and the second queue unit is used for queuing the internal standard message output by the de-mapping unit according to a second preset rule so as to send the internal standard message to the matrix calculation module.
5. The storage management device of many-core system of claim 4, wherein the uplink sub-module further comprises one or more combinations of a storage unit, an encoding unit, and a scrambling unit;
the storage unit, the coding unit and the scrambling unit are respectively used for carrying out calculation processing, coding processing and interference adding processing on the corresponding message data in the mapping unit according to the corresponding configuration information;
the downlink sub-module also comprises one or more combinations of a storage unit, a decoding unit and a descrambling unit;
the storage unit, the decoding unit and the descrambling unit are respectively used for carrying out calculation processing, decoding processing and filtering interference processing on the corresponding message data in the demapping unit according to the corresponding configuration information.
6. A storage management method for a many-core system, comprising:
converting the target message between the corresponding internal protocol and the internal standard protocol through a protocol adaptation module to obtain the internal standard message or message data sent to the corresponding internal protocol bus;
switching the internal standard message from a source port to a destination port according to a port routing mode through a matrix calculation module according to a read-write request;
the internal standard message from the matrix calculation module is processed by identification and mapping through an uplink sub-module in a storage controller module and is sent to a target interface controller; converting message data returned from the target interface controller into the internal standard message through a downlink sub-module in the storage controller module, and sending the internal standard message to the matrix calculation module after demapping and de-identification processing;
sending data read from a target external memory to the target interface controller through an interface controller module, or storing data acquired from the target interface controller into the target external memory;
the internal standard protocol defines a first transmission channel and a second transmission channel according to the bus type, wherein the first transmission channel is used for transmitting a transmitted request, and the second transmission channel is used for transmitting a returned response message;
the converting the target message between the corresponding internal protocol and the internal standard protocol through the protocol adaptation module to obtain the internal standard message includes:
analyzing a target message sent by a corresponding internal protocol bus to obtain corresponding data and address contents, merging channel data of the data and the address contents to obtain a message body with a uniform format, adding a message header with a preset format to obtain an internal standard message, sending the internal standard message to a type message queue through the first transmission channel, and waiting for sending the internal standard message to the matrix calculation module;
the message header of the preset format comprises three components, wherein the first component is used for describing a unique identifier of the request or response message; a second component for describing the type of the request or response message; the third component is used to describe the processing priority of the request or response message.
7. An integrated circuit comprising a plurality of processors, a plurality of memories, wherein the integrated circuit employs the storage management apparatus of a many-core system as claimed in any of claims 1 to 5 to implement data storage management between the plurality of processors and the plurality of memories.
8. A readable storage medium, characterized in that it stores a computer program that, when executed on a processor, implements the functionality of the various modules in a storage management device of a many-core system according to any of claims 1-5.
CN202211533104.XA 2022-12-02 2022-12-02 Storage management method and device of many-core system and integrated circuit Active CN115633098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211533104.XA CN115633098B (en) 2022-12-02 2022-12-02 Storage management method and device of many-core system and integrated circuit

Publications (2)

Publication Number Publication Date
CN115633098A CN115633098A (en) 2023-01-20
CN115633098B true CN115633098B (en) 2023-03-31

Family

ID=84910304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211533104.XA Active CN115633098B (en) 2022-12-02 2022-12-02 Storage management method and device of many-core system and integrated circuit

Country Status (1)

Country Link
CN (1) CN115633098B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103441573A (en) * 2013-08-01 2013-12-11 国家电网公司 Network processor based on standard IEC61850
CN104317770A (en) * 2014-10-28 2015-01-28 天津大学 Data storage structure and data access method for multiple core processing system
WO2016165421A1 (en) * 2015-09-21 2016-10-20 中兴通讯股份有限公司 Method and apparatus for converting different interface protocol messages
CN110380970A (en) * 2019-07-22 2019-10-25 北京邮电大学 A kind of self-adapting data message forwarding method and device suitable for heterogeneous network
WO2020119430A1 (en) * 2018-12-14 2020-06-18 深圳壹账通智能科技有限公司 Protocol interface test method, device, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant