CN113868109B - Method, apparatus, device and readable medium for evaluating performance of multiprocessor interconnection

Publication number
CN113868109B
Authority
CN
China
Prior art keywords
data
delay
transmission path
transmission
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111158110.7A
Other languages
Chinese (zh)
Other versions
CN113868109A (en
Inventor
邹晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202111158110.7A
Publication of CN113868109A
Application granted
Publication of CN113868109B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447Performance evaluation by modeling

Abstract

The invention provides a method, an apparatus, a device and a readable medium for evaluating the performance of multiprocessor interconnection access. The method comprises the following steps: modeling multiprocessor interconnection access sharing a Cache consistency protocol, acquiring transmission path information of each access request, and quantifying the length of the path based on the transmission path information; acquiring delay data of each transmission stage in the transmission path according to process delay model data of each processing component in the transmission path; and evaluating the system performance according to the acquired delay data of each transmission stage in the transmission path. With this scheme, the delay and bandwidth of local and remote access of a multiprocessor under a specific consistency protocol can be calculated, and the access delay and bandwidth between nodes of the system can be quantitatively evaluated at the start of co-chip design.

Description

Method, apparatus, device and readable medium for evaluating performance of multiprocessor interconnection
Technical Field
The present invention relates to the field of computers, and more particularly, to a method, apparatus, device, and readable medium for performance evaluation of multiprocessor interconnect access.
Background
The shared-memory multiprocessor is an important structure in computer architecture. Its processors are interconnected through dedicated point-to-point consistency protocol interfaces, a complex topology and an interconnection network, which together provide consistent (coherent) interconnection of the processors and sharing of the global memory. Depending on the number of processor interfaces and on whether those interfaces support consistency protocol forwarding, a shared-memory multiprocessor system can be built either by connecting the processors directly or through a co-chip acting as an agent. For example, in IBM's Power minicomputers, the Power processor has enough direct-connection ports that a 16-way system can be built by connecting the processors directly. An Intel Xeon processor usually has only 3 processor interconnection interfaces (QPI or UPI). These interfaces support multi-hop consistency protocol forwarding, and at most an 8-way system can be built from Xeon processors alone. A 16-way system is therefore built with a processor co-chip as an intermediate agent that provides interface expansion and message forwarding, and large-scale multiprocessor systems with more than 16 ways can also be built on the co-chip. The processor co-chip mainly processes and forwards Cache consistency protocol messages among multiple processors according to the consistency protocol of the processors. It needs a processor interface on the side facing the processors and an interconnection network interface on the side facing other processors. In the initial stage of processor and co-chip design, the chip has not yet been implemented in hardware, so the delay and bandwidth between nodes of the multiprocessor system cannot be tested.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a method, apparatus, device and readable medium for evaluating the performance of multiprocessor interconnection access. The technical solution of the present invention makes it possible to calculate the delay and bandwidth of local and remote access of a multiprocessor under a specific consistency protocol, and to quantitatively evaluate the access delay and bandwidth between nodes of the system at the start of co-chip design.
In view of the foregoing, an aspect of an embodiment of the present invention provides a method for performance evaluation of multiprocessor interconnect accesses, comprising the steps of:
Modeling multiprocessor interconnection access sharing a Cache consistency protocol, acquiring transmission path information of each access request, and quantifying the length of a path based on the transmission path information;
Acquiring delay data of each transmission stage in the transmission path according to process delay model data of each processing component in the transmission path;
And evaluating the system performance according to the acquired delay data of each transmission stage in the transmission path.
According to one embodiment of the invention, system performance includes total delay data and bandwidth data for remote access.
According to one embodiment of the present invention, evaluating system performance based on acquired delay data for each transmission stage in a transmission path includes:
delay data of each stage in the transmission path information of each access request is added to obtain total delay data of each access request.
According to one embodiment of the present invention, evaluating system performance based on acquired delay data for each transmission stage in a transmission path includes:
obtaining the maximum number of requests and the data packet size for access between processors;
Bandwidth data is calculated using the following formula:
bandwidth = maximum number of requests × packet size / (total delay + maximum number of requests − 1).
In another aspect of an embodiment of the present invention, there is provided an apparatus for evaluating performance of multiprocessor interconnect access, the apparatus including:
the modeling module is configured to model the multiprocessor interconnection accesses sharing the Cache consistency protocol, acquire the transmission path information of each access request and quantify the length of the path based on the transmission path information;
The acquisition module is configured to acquire delay data of each transmission stage in the transmission path according to the process delay model data of each processing component in the transmission path;
and the evaluation module is configured to evaluate the system performance according to the acquired delay data of each transmission stage in the transmission path.
According to one embodiment of the invention, system performance includes total delay data and bandwidth data for remote access.
According to one embodiment of the invention, the evaluation module is further configured to:
delay data of each stage in the transmission path information of each access request is added to obtain total delay data of each access request.
According to one embodiment of the invention, the evaluation module is further configured to:
obtaining the maximum number of requests and the data packet size for access between processors;
Bandwidth data is calculated using the following formula:
bandwidth = maximum number of requests × packet size / (total delay + maximum number of requests − 1).
In another aspect of the embodiments of the present invention, there is also provided a computer apparatus including:
At least one processor; and
And a memory storing computer instructions executable on the processor, the instructions when executed by the processor performing the steps of any of the methods described above.
In another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the methods described above.
The invention has the following beneficial technical effects. The method for evaluating the performance of multiprocessor interconnection access provided by the embodiments of the invention models the multiprocessor interconnection access sharing the Cache consistency protocol, obtains the transmission path information of each access request, and quantifies the length of the path based on the transmission path information; acquires delay data of each transmission stage in the transmission path according to process delay model data of each processing component in the transmission path; and evaluates the system performance according to the acquired delay data of each transmission stage in the transmission path. With this technical scheme, the delay and bandwidth of local and remote access of the multiprocessor under a specific consistency protocol can be calculated, and the access delay and bandwidth between nodes of the system can be quantitatively evaluated at the start of co-chip design.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart diagram of a method of performance evaluation of multiprocessor interconnect accesses, according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an apparatus for performance evaluation of multiprocessor interconnect accesses, according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer device according to one embodiment of the invention;
Fig. 4 is a schematic diagram of a computer-readable storage medium according to one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
With the above object in view, in a first aspect, an embodiment of a method for performance evaluation of multiprocessor interconnect accesses is presented. Fig. 1 shows a schematic flow chart of the method.
As shown in fig. 1, the method may include the steps of:
S1, modeling multiprocessor interconnection access sharing a Cache consistency protocol, acquiring transmission path information of each access request, and quantifying the length of a path based on the transmission path information.
By modeling and analyzing the Cache consistency protocol shared by the multiprocessor, path information for the various memory access transmission processes can be obtained. For example, for a directory remote access request under a given consistency protocol, the path of that access can be derived from the protocol transition (hop) process. By analyzing the protocol transitions and the transmission interface paths, the lengths of the various access transmission paths are quantified in units of single-Flit transmission delay, where one Flit unit is the time taken to transmit and process one Flit message data packet.
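The path quantification in S1 can be pictured with a minimal sketch. The `Hop` type, the node names and the per-hop Flit counts below are hypothetical illustrations and do not come from the patent; only the idea of expressing a path length as a sum of single-Flit delay units does.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hop:
    """One protocol hop on an access path (all field values are hypothetical)."""
    source: str
    target: str
    flits: int  # length of this hop in single-Flit delay units


def path_length_in_flits(path: List[Hop]) -> int:
    """Quantify a transmission path as the sum of its hops, in Flit units."""
    return sum(hop.flits for hop in path)


# Hypothetical directory remote access path; the Flit counts are made up.
remote_read_path = [
    Hop("requesting CPU", "local co-chip", 2),
    Hop("local co-chip", "remote co-chip", 3),
    Hop("remote co-chip", "home CPU", 2),
]

print(path_length_in_flits(remote_read_path))
```

Keeping the path as an explicit list of hops makes it straightforward to swap in a different protocol flow and compare path lengths.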
S2, delay data of each transmission stage in the transmission path are obtained according to process delay model data of each processing component in the transmission path.
Each stage of the memory access process is assigned a value based on the actual delay data of the specific processor interface, the implementation process delay model data adopted by the co-chip, the co-chip frequency, and so on. For example, in one specific co-chip implementation, a delay is assumed for each module of the chip by combining the operating frequency with the process delay model. The assumed data are obtained by calculation from the design and a reference model, as shown in the following table:
Table 1. Delay of each processing component
Processing component                            Delay
Physical layer                                  15 ns
Link layer                                      15 ns
Distributing device                             5 ns
Remote protocol processing                      15 ns
On-chip switching                               9 ns
Network interface                               15 ns
Network interface link layer                    25 ns
Native protocol processing                      15 ns
Accessing CPU memory                            60 ns
Collaborative chip directory Cache hit latency  5 ns
Monitoring a node delay                         50 ns
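As a rough illustration of S2, the values of Table 1 can be held in a lookup table and summed over the components that a message traverses in one transmission stage. The component delays are taken from Table 1; the `outbound_stage` sequence is a hypothetical traversal chosen for illustration, since the patent does not list which components make up each stage.

```python
# Per-component delays from Table 1, in nanoseconds.
COMPONENT_DELAY_NS = {
    "Physical layer": 15,
    "Link layer": 15,
    "Distributing device": 5,
    "Remote protocol processing": 15,
    "On-chip switching": 9,
    "Network interface": 15,
    "Network interface link layer": 25,
    "Native protocol processing": 15,
    "Accessing CPU memory": 60,
    "Collaborative chip directory Cache hit latency": 5,
    "Monitoring a node delay": 50,
}


def stage_delay_ns(components):
    """Delay of one transmission stage: the sum of its component delays."""
    return sum(COMPONENT_DELAY_NS[name] for name in components)


# Hypothetical stage: a request entering the co-chip from a CPU and leaving
# through the network interface. The ordering is an assumption.
outbound_stage = [
    "Physical layer",
    "Link layer",
    "Distributing device",
    "Remote protocol processing",
    "On-chip switching",
    "Network interface",
    "Network interface link layer",
]

print(stage_delay_ns(outbound_stage), "ns")
```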
S3, evaluating the system performance according to the acquired delay data of each transmission stage in the transmission path.
The total path delay and the local and remote average delays of the various memory access transactions are calculated by accumulating path delays. Bandwidth is the amount of data obtained divided by the time taken, that is, bandwidth = maximum number of requests × packet size / (total delay + maximum number of requests − 1). For example, under the delay assumptions for the co-chip, with a maximum number of requests of 1024, a packet size of 64 bytes and a total delay of 725 ns, the remote bandwidth of a single CPU through a single co-chip is approximately 1024 × 64 bytes / (725 + 1023) ns ≈ 37 GB/s.
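The worked example above can be checked with a few lines of Python. This is a sketch under one stated assumption: because the formula adds the request count directly to a delay in nanoseconds, each additional outstanding request is treated as costing one nanosecond. The function name is illustrative only.

```python
def remote_bandwidth_gb_per_s(max_requests: int,
                              packet_bytes: int,
                              total_delay_ns: float) -> float:
    """bandwidth = max_requests * packet_size / (total_delay + max_requests - 1).

    Assumption: the "max_requests - 1" term is in nanoseconds, as implied by
    the patent adding 1023 directly to 725 ns. Bytes per ns equals GB/s.
    """
    total_time_ns = total_delay_ns + (max_requests - 1)
    return max_requests * packet_bytes / total_time_ns


bandwidth = remote_bandwidth_gb_per_s(max_requests=1024,
                                      packet_bytes=64,
                                      total_delay_ns=725)
print(f"{bandwidth:.2f} GB/s")  # about 37.49, which the text quotes as 37 GB/s
```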
The processor co-chip is an interconnection chip with a multi-layer structure designed around the Cache consistency protocol among multiple processors. A complete Cache consistency protocol generally comprises several layers of sub-protocols: it must include a protocol layer, a link layer and a physical layer, as well as a transmission layer, a routing layer and the like for transmitting and forwarding protocol messages. The architecture of a processor co-chip generally includes: a physical layer (responsible for interconnection with a processor); a link layer (responsible for packaging and streaming messages and forwarding them to the physical layer, or receiving them from the physical layer); a dispatch module (responsible for dispatching interface messages to, or receiving them from, the protocol processing modules); a protocol processing engine (typically divided into multiple protocol processing pipelines according to remote and local agents); on-chip switching (routing and switching of messages between the protocol processing modules and the network interfaces); and a network interface (the interconnection interface between system nodes, through which the co-chip interconnects with other co-chips).
The invention mainly aims to solve the problem that there is no effective method for evaluating multiprocessor performance in the initial stage of processor co-chip design. For a multiprocessor system, especially one with a co-chip, the access delay and bandwidth between nodes of the system can be quantitatively evaluated at the start of co-chip design, and the evaluation result can serve as an important basis for judging the feasibility of the design scheme.
According to the technical scheme, the delay and the bandwidth of local and remote access of the multiprocessor under the specific consistency protocol can be measured and calculated, and the access delay and the bandwidth between the nodes of the system can be quantitatively evaluated at the beginning of the collaborative chip design.
In a preferred embodiment of the invention, system performance includes total delay data and bandwidth data for remote access.
In a preferred embodiment of the present invention, evaluating the system performance based on the acquired delay data for each transmission stage in the transmission path comprises:
Delay data of each stage in the transmission path information of each access request is added to obtain total delay data of each access request. The total path delay and the local and remote average delays of the various memory access transactions are calculated by accumulating path delays. For example, under the delay assumptions for the co-chip, the total delay for a remote access is 725 ns, which is the sum of the following delays: READCLEAN to RDEX: 135 ns; RDEX to READCLEAN: 200 ns; compdata_uc to PGER: 170 ns; PGER to compdata_uc: 135 ns; compack: 85 ns.
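The accumulation in this example can be reproduced directly. The stage labels and delays below are the five values quoted in the text; the list structure and variable names are only an illustrative choice.

```python
# Stage delays of the remote access example, in nanoseconds (from the text).
REMOTE_ACCESS_STAGES_NS = [
    ("READCLEAN to RDEX", 135),
    ("RDEX to READCLEAN", 200),
    ("compdata_uc to PGER", 170),
    ("PGER to compdata_uc", 135),
    ("compack", 85),
]

# Total delay of the access request is the sum over all stages.
total_remote_delay_ns = sum(delay for _, delay in REMOTE_ACCESS_STAGES_NS)

for stage, delay in REMOTE_ACCESS_STAGES_NS:
    print(f"{stage:22s} {delay:4d} ns")
print(f"{'total':22s} {total_remote_delay_ns:4d} ns")  # total is 725 ns
```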
In a preferred embodiment of the present invention, evaluating the system performance based on the acquired delay data for each transmission stage in the transmission path comprises:
obtaining the maximum number of requests and the data packet size for access between processors;
Bandwidth data is calculated using the following formula:
Bandwidth = maximum number of requests × packet size / (total delay + maximum number of requests − 1). That is, bandwidth is the amount of data obtained divided by the time taken. For example, under the delay assumptions for the co-chip, with a maximum number of requests of 1024, a packet size of 64 bytes and a total delay of 725 ns, the remote bandwidth of a single CPU through a single co-chip is approximately 1024 × 64 bytes / (725 + 1023) ns ≈ 37 GB/s.
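To see how the formula behaves as the number of outstanding requests changes, the following sketch sweeps that parameter while holding the packet size (64 bytes) and total delay (725 ns) from the example fixed. The request counts in the sweep are arbitrary illustrative values, and the one-nanosecond cost per additional request is an assumption read off the formula's units.

```python
def bandwidth_gb_per_s(max_requests: int,
                       packet_bytes: int = 64,
                       total_delay_ns: float = 725.0) -> float:
    """Bandwidth per the patent's formula; bytes per ns equals GB/s."""
    return max_requests * packet_bytes / (total_delay_ns + max_requests - 1)


# Arbitrary sweep values; only 1024 appears in the patent's example.
for n in (1, 16, 128, 1024, 8192):
    print(f"{n:5d} requests -> {bandwidth_gb_per_s(n):6.2f} GB/s")
```

Under this formula the result grows with the number of requests and approaches the packet size per time unit (64 GB/s here) without reaching it, because the fixed 725 ns delay is amortized over more requests.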
According to the technical scheme, the delay and the bandwidth of local and remote access of the multiprocessor under the specific consistency protocol can be measured and calculated, and the access delay and the bandwidth between the nodes of the system can be quantitatively evaluated at the beginning of the collaborative chip design.
It should be noted that, it will be understood by those skilled in the art that all or part of the procedures in implementing the methods of the above embodiments may be implemented by a computer program to instruct related hardware, and the above program may be stored in a computer readable storage medium, and the program may include the procedures of the embodiments of the above methods when executed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (Random Access Memory, RAM), or the like. The computer program embodiments described above may achieve the same or similar effects as any of the method embodiments described above.
Furthermore, the method disclosed according to the embodiment of the present invention may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. When executed by the CPU, the computer program performs the functions defined above in the methods disclosed in the embodiments of the present invention.
With the above object in view, in a second aspect, an apparatus for evaluating performance of multiprocessor interconnection accesses is provided, as shown in fig. 2, where an apparatus 200 includes:
The modeling module 201, the modeling module 201 is configured to model the multiprocessor interconnection access sharing the Cache consistency protocol, obtain the transmission path information of each access request, and quantify the length of the path based on the transmission path information;
an acquisition module 202, the acquisition module 202 being configured to acquire delay data for each transmission stage in the transmission path according to the process delay model data for each processing component in the transmission path;
And an evaluation module 203, wherein the evaluation module 203 is configured to evaluate the system performance according to the acquired delay data of each transmission stage in the transmission path.
In a preferred embodiment of the invention, system performance includes total delay data and bandwidth data for remote access.
In a preferred embodiment of the invention, the evaluation module is further configured to:
delay data of each stage in the transmission path information of each access request is added to obtain total delay data of each access request.
In a preferred embodiment of the invention, the evaluation module is further configured to:
obtaining the maximum number of requests and the data packet size for access between processors;
Bandwidth data is calculated using the following formula:
bandwidth = maximum number of requests × packet size / (total delay + maximum number of requests − 1).
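The division of the apparatus into a modeling module, an acquisition module and an evaluation module might be sketched as three small classes. This is only an illustrative decomposition: the class and method names, the "remote read" request and its two-stage path are hypothetical, while the component delays reuse values from Table 1.

```python
from typing import Dict, List

Path = List[List[str]]  # a path is a list of stages; a stage lists components


class ModelingModule:
    """Holds the modeled transmission path of each access request type."""
    def __init__(self, paths: Dict[str, Path]):
        self._paths = paths

    def transmission_path(self, request: str) -> Path:
        return self._paths[request]


class AcquisitionModule:
    """Maps each stage of a path to a delay using per-component delays."""
    def __init__(self, component_delay_ns: Dict[str, float]):
        self._delay = component_delay_ns

    def stage_delays(self, path: Path) -> List[float]:
        return [sum(self._delay[c] for c in stage) for stage in path]


class EvaluationModule:
    """Derives total delay and bandwidth from the stage delays."""
    def total_delay_ns(self, stage_delays: List[float]) -> float:
        return sum(stage_delays)

    def bandwidth_gb_per_s(self, total_delay_ns: float,
                           max_requests: int, packet_bytes: int) -> float:
        return max_requests * packet_bytes / (total_delay_ns + max_requests - 1)


# Hypothetical request type with a made-up two-stage path.
modeling = ModelingModule(
    {"remote read": [["Physical layer", "Link layer"], ["Accessing CPU memory"]]}
)
acquisition = AcquisitionModule(
    {"Physical layer": 15, "Link layer": 15, "Accessing CPU memory": 60}
)
evaluation = EvaluationModule()

delays = acquisition.stage_delays(modeling.transmission_path("remote read"))
total = evaluation.total_delay_ns(delays)
print(delays, total, evaluation.bandwidth_gb_per_s(total, 1024, 64))
```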
Based on the above object, a third aspect of the embodiments of the present invention proposes a computer device. Fig. 3 is a schematic diagram of an embodiment of a computer device provided by the present invention. As shown in fig. 3, an embodiment of the present invention includes the following means: at least one processor 21; and a memory 22, the memory 22 storing computer instructions 23 executable on the processor, the instructions when executed by the processor performing the method of:
Modeling multiprocessor interconnection access sharing a Cache consistency protocol, acquiring transmission path information of each access request, and quantifying the length of a path based on the transmission path information;
Acquiring delay data of each transmission stage in the transmission path according to process delay model data of each processing component in the transmission path;
And evaluating the system performance according to the acquired delay data of each transmission stage in the transmission path.
In a preferred embodiment of the invention, system performance includes total delay data and bandwidth data for remote access.
In a preferred embodiment of the present invention, evaluating the system performance based on the acquired delay data for each transmission stage in the transmission path comprises:
delay data of each stage in the transmission path information of each access request is added to obtain total delay data of each access request.
In a preferred embodiment of the present invention, evaluating the system performance based on the acquired delay data for each transmission stage in the transmission path comprises:
obtaining the maximum number of requests and the data packet size for access between processors;
Bandwidth data is calculated using the following formula:
bandwidth = maximum number of requests × packet size / (total delay + maximum number of requests − 1).
Based on the above object, a fourth aspect of the embodiments of the present invention proposes a computer-readable storage medium. Fig. 4 is a schematic diagram of an embodiment of a computer-readable storage medium provided by the present invention. As shown in fig. 4, the computer-readable storage medium S31 stores a computer program S32 that, when executed by a processor, performs the following method:
Modeling multiprocessor interconnection access sharing a Cache consistency protocol, acquiring transmission path information of each access request, and quantifying the length of a path based on the transmission path information;
Acquiring delay data of each transmission stage in the transmission path according to process delay model data of each processing component in the transmission path;
And evaluating the system performance according to the acquired delay data of each transmission stage in the transmission path.
In a preferred embodiment of the invention, system performance includes total delay data and bandwidth data for remote access.
In a preferred embodiment of the present invention, evaluating the system performance based on the acquired delay data for each transmission stage in the transmission path comprises:
delay data of each stage in the transmission path information of each access request is added to obtain total delay data of each access request.
In a preferred embodiment of the present invention, evaluating the system performance based on the acquired delay data for each transmission stage in the transmission path comprises:
obtaining the maximum number of requests and the data packet size for access between processors;
Bandwidth data is calculated using the following formula:
bandwidth = maximum number of requests × packet size / (total delay + maximum number of requests − 1).
Furthermore, the method disclosed according to the embodiment of the present invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. The above-described functions defined in the methods disclosed in the embodiments of the present invention are performed when the computer program is executed by a processor.
Furthermore, the above-described method steps and system units may also be implemented using a controller and a computer-readable storage medium storing a computer program for causing the controller to implement the above-described steps or unit functions.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one location to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general purpose or special purpose computer or a general purpose or special purpose processor. Further, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The foregoing embodiment of the present invention has been disclosed with reference to the number of embodiments for the purpose of description only, and does not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (10)

1. A method for evaluating the performance of multiprocessor interconnect access, comprising the steps of:
modeling multiprocessor interconnect access under a shared Cache coherence protocol, obtaining transmission path information of each access request, and quantifying the length of each path based on the transmission path information, wherein obtaining the transmission path information of each access request and quantifying the length of the path based on the transmission path information comprises: obtaining path information of the various memory access transmission processes by modeling and analyzing the multiprocessor shared Cache coherence protocol; for directory-based remote access requests under the Cache coherence protocol, obtaining the path of each access request according to the protocol jump process; and quantifying the length of each memory access transmission path in units of a single Flit transmission delay, wherein the single Flit transmission delay refers to the time taken to transmit and process one Flit message data packet;
acquiring delay data of each transmission stage in the transmission path according to the process delay model data of each processing component in the transmission path, wherein acquiring the delay data of each transmission stage in the transmission path according to the process delay model data of each processing component in the transmission path comprises: assigning values to each stage of the memory access process based on actual delay data of a specific processor interface, process delay model data of the cooperative chip implementation, and the cooperative chip frequency; dividing the memory access process according to the modules of the chip; and making delay assumptions by combining the operating frequency and the process delay model, wherein the assumed data is calculated comprehensively from the design and a reference model; and
evaluating the system performance according to the acquired delay data of each transmission stage in the transmission path.
2. The method of claim 1, wherein the system performance comprises total delay data and bandwidth data of remote access.
3. The method of claim 2, wherein evaluating system performance based on the acquired delay data for each transmission stage in the transmission path comprises:
adding the delay data of each stage in the transmission path information of each access request to obtain total delay data of each access request.
4. The method of claim 2, wherein evaluating system performance based on the acquired delay data for each transmission stage in the transmission path comprises:
obtaining the maximum number of requests and the data packet size for access between processors; and
calculating bandwidth data using the following formula:
bandwidth = (maximum number of requests × packet size) / (total delay + maximum number of requests − 1).
5. An apparatus for evaluating the performance of multiprocessor interconnect access, the apparatus comprising:
a modeling module configured to model multiprocessor interconnect access under a shared Cache coherence protocol, acquire transmission path information of each access request, and quantify the length of each path based on the transmission path information, wherein the modeling module is further configured to acquire path information of the various memory access transmission processes by modeling and analyzing the multiprocessor shared Cache coherence protocol, to acquire, for directory-based remote access requests under the Cache coherence protocol, the path of each access request according to the protocol jump process, and to quantify the length of each memory access transmission path in units of a single Flit transmission delay, wherein the single Flit transmission delay refers to the time taken to transmit and process one Flit message data packet;
an acquisition module configured to acquire delay data of each transmission stage in the transmission path according to process delay model data of each processing component in the transmission path, wherein the acquisition module is further configured to assign values to each stage of the memory access process based on actual delay data of a specific processor interface, process delay model data of the cooperative chip implementation, and the cooperative chip frequency, to divide the memory access process according to the modules of the chip, and to make delay assumptions by combining the operating frequency and the process delay model, wherein the assumed data is calculated comprehensively from the design and a reference model; and
an evaluation module configured to evaluate the system performance according to the acquired delay data of each transmission stage in the transmission path.
6. The apparatus of claim 5, wherein the system performance comprises total delay data and bandwidth data of remote access.
7. The apparatus of claim 6, wherein the evaluation module is further configured to:
add the delay data of each stage in the transmission path information of each access request to obtain total delay data of each access request.
8. The apparatus of claim 6, wherein the evaluation module is further configured to:
obtain the maximum number of requests and the data packet size for access between processors; and
calculate bandwidth data using the following formula:
bandwidth = (maximum number of requests × packet size) / (total delay + maximum number of requests − 1).
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, wherein the instructions, when executed by the processor, perform the steps of the method of any one of claims 1-4.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-4.
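The evaluation steps of claims 3 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the stage names, the delay values, and the function names `total_delay` and `bandwidth` are hypothetical, and the sketch assumes that all delays are expressed as counts of single Flit transmission delays, so that the "− 1" term in the bandwidth formula has the same unit as the total delay.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    """One transmission stage on an access path (hypothetical example type)."""
    name: str
    delay: float  # assumed unit: number of single Flit transmission delays


def total_delay(path: Sequence[Stage]) -> float:
    """Claim 3: add the delay data of every stage on the transmission path."""
    return sum(stage.delay for stage in path)


def bandwidth(max_requests: int, packet_size: float, delay: float) -> float:
    """Claim 4: max_requests * packet_size / (delay + max_requests - 1)."""
    if max_requests < 1:
        raise ValueError("max_requests must be at least 1")
    denominator = delay + max_requests - 1
    if denominator <= 0:
        raise ValueError("total delay must be positive")
    return max_requests * packet_size / denominator


# Illustrative values only; they are not taken from the patent.
EXAMPLE_PATH = [
    Stage("request_interface", 4),
    Stage("directory_lookup", 6),
    Stage("interconnect_link", 10),
    Stage("data_response", 12),
]

if __name__ == "__main__":
    delay = total_delay(EXAMPLE_PATH)
    print("total delay (Flit delays):", delay)
    # 16 outstanding requests of 64 bytes each, an assumed configuration.
    print("bandwidth (bytes per Flit delay):", bandwidth(16, 64, delay))
```

One possible reading of the denominator is that requests are pipelined: the first request costs the full path delay and each further request completes one unit later, which is why the result approaches one packet per unit delay as the maximum number of requests grows.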
CN202111158110.7A 2021-09-30 2021-09-30 Method, apparatus, device and readable medium for evaluating performance of multiprocessor interconnection Active CN113868109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111158110.7A CN113868109B (en) 2021-09-30 2021-09-30 Method, apparatus, device and readable medium for evaluating performance of multiprocessor interconnection

Publications (2)

Publication Number Publication Date
CN113868109A CN113868109A (en) 2021-12-31
CN113868109B true CN113868109B (en) 2024-04-19

Family

ID=79000909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111158110.7A Active CN113868109B (en) 2021-09-30 2021-09-30 Method, apparatus, device and readable medium for evaluating performance of multiprocessor interconnection

Country Status (1)

Country Link
CN (1) CN113868109B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102474468A (en) * 2010-01-25 2012-05-23 松下电器产业株式会社 Semiconductor system, relay apparatus, and chip circuit
KR20120063900A (en) * 2010-12-08 2012-06-18 삼성전자주식회사 Latency management system and method for a multi-processor system
CN102591759A (en) * 2011-12-29 2012-07-18 中国科学技术大学苏州研究院 Clock precision parallel simulation system for on-chip multi-core processor
CN102693213A (en) * 2012-05-16 2012-09-26 南京航空航天大学 System-level transmission delay model building method applied to network on chip
CN110022261A (en) * 2019-05-20 2019-07-16 北京邮电大学 Multi-path transmission method and apparatus based on SCTP-CMT transport protocol
CN111464374A (en) * 2020-02-21 2020-07-28 中国电子技术标准化研究院 Network delay control method, equipment and device
CN111884885A (en) * 2020-07-31 2020-11-03 中国工商银行股份有限公司 Access information determination method, device, system, electronic device and medium
CN112436601A (en) * 2020-10-31 2021-03-02 高鹤庭 Information flow processing method of intelligent substation, computer equipment and storage medium
CN112751718A (en) * 2021-01-28 2021-05-04 深圳市晨北科技有限公司 Bandwidth adjusting method and device, terminal and storage medium
CN113395216A (en) * 2020-03-11 2021-09-14 辉达公司 Techniques to transfer data between hardware devices
CN113438707A (en) * 2020-03-23 2021-09-24 诺基亚通信公司 Apparatus, method and computer program for routing data in a dual-connection or multi-connection configuration

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9930133B2 (en) * 2014-10-23 2018-03-27 Netapp, Inc. System and method for managing application performance
US10367722B2 (en) * 2017-02-27 2019-07-30 International Business Machines Corporation Optimizing performance of computer networks
CN109981765B (en) * 2019-03-18 2023-03-24 北京百度网讯科技有限公司 Method and apparatus for determining access path of content distribution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
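Research on buffer management and high-speed interconnection technology for MPSoC on-chip interconnection networks; 尹亚明 (Yin Yaming); China Doctoral Dissertations Database; full text *
Research on key technologies for performance optimization of microprocessor on-chip memory systems; 刘松鹤 (Liu Songhe); China Doctoral Dissertations Database; full text *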

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant