CN110609768A - Method and device for measuring xGMI2 bandwidth between two paths of CPUs - Google Patents

Method and device for measuring xGMI2 bandwidth between two paths of CPUs

Info

Publication number
CN110609768A
Authority
CN
China
Prior art keywords
cpu
xgmi2
bandwidth
memory amount
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910781412.6A
Other languages
Chinese (zh)
Inventor
邱奕欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Wave Intelligent Technology Co Ltd
Original Assignee
Suzhou Wave Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Wave Intelligent Technology Co Ltd
Priority to CN201910781412.6A
Publication of CN110609768A
Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/221 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test buses, lines or interfaces, e.g. stuck-at or open line faults
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for measuring xGMI2 bandwidth between two CPUs, comprising the following steps: designating a master CPU and a slave CPU, and specifying the number of threads according to the CPU used; reading the total memory amount of the master CPU and of the slave CPU, and calculating the memory amount suited to the xGMI2 bandwidth measurement; and establishing an xGMI2 channel between the master CPU and the slave CPU to complete the throughput test. The invention also discloses a device for measuring xGMI2 bandwidth between two CPUs. By binding CPUs, applying scripting techniques, and using the NUMA principle, the invention runs the processors of a two-socket or multi-socket system in sequence in a single pass to perform throughput access tests on remote memory across xGMI2, and uses the test results to effectively verify the actual xGMI2 bandwidth.

Description

Method and device for measuring xGMI2 bandwidth between two paths of CPUs
Technical Field
The invention relates to the technical field of server testing, and in particular to a method and a device for measuring xGMI2 bandwidth between two CPUs.
Background
In a server architecture with two or more CPUs, taking the current AMD dual-CPU architecture (ROME platform) as an example, the CPU cores do not share a single path to memory. When different processors need to exchange data, the data passes through memory over the system bus. As the number of cores grows, such exchanges become routine, the speed between CPU and memory cannot keep up with the processing speed of the CPU, and the more cores there are, the more overall performance drops.
To solve this access efficiency problem, the industry uses non-uniform memory access (NUMA) technology. NUMA divides the CPUs and memory into different nodes (each CPU has its own memory), and the CPU nodes communicate with each other through an interconnect between the CPUs, such as the inter-chip global memory interface (xGMI) of AMD CPUs. In the prior art, a high-speed-signal test instrument analyzes the signal attenuation on the link after the two CPUs communicate with each other. This involves no actual data transfer; it only verifies that xGMI2 complies with the AMD specification at the electrical-signal level, and it does not reflect the actual performance of a memory copy.
Disclosure of Invention
The invention aims to provide a method and a device for measuring xGMI2 bandwidth between two CPUs, which use NUMA (non-uniform memory access) to measure whether the xGMI2 bandwidth reaches the speed given in the AMD specification, and thereby effectively verify the actual xGMI2 bandwidth.
To achieve this purpose, the invention adopts the following technical solutions:
A first aspect of the invention provides a method for measuring xGMI2 bandwidth between two CPUs, comprising the following steps:
designating a master CPU and a slave CPU, and specifying the number of threads according to the CPU used;
reading the total memory amount of the master CPU and of the slave CPU, and calculating the memory amount suited to the xGMI2 bandwidth measurement;
and establishing an xGMI2 channel between the master CPU and the slave CPU to complete the throughput test.
With reference to the first aspect, in a first possible implementation of the first aspect, specifying the number of threads according to the CPU used specifically includes:
starting and binding the specified number of threads on the master CPU, and accessing the specified amount of memory on the slave CPU;
starting and binding the specified number of threads on the slave CPU, and accessing the specified amount of memory on the master CPU.
With reference to the first aspect, in a second possible implementation of the first aspect, reading the total memory amount of the master CPU and of the slave CPU and calculating the memory amount suited to the xGMI2 bandwidth measurement specifically includes:
reading the total memory amount of the master CPU using a built-in Linux command, and calculating the master CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2;
reading the total memory amount of the slave CPU using a built-in Linux command, and calculating the slave CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2.
With reference to the first aspect, in a third possible implementation of the first aspect, establishing an xGMI2 channel between the master CPU and the slave CPU to complete the throughput test specifically includes:
obtaining the memory throughput test results in both directions;
and verifying the measured throughput figures against the xGMI2 bandwidth and speed.
A second aspect of the invention provides a device for measuring xGMI2 bandwidth between two CPUs, comprising:
a thread binding module, which designates a master CPU and a slave CPU and specifies the number of threads according to the CPU used;
a memory amount calculation module, which reads the total memory amount of the master CPU and of the slave CPU and calculates the memory amount suited to the xGMI2 bandwidth measurement;
and a bandwidth test module, which establishes an xGMI2 channel between the master CPU and the slave CPU to complete the throughput test.
With reference to the second aspect, in a first possible implementation of the second aspect, the thread binding module includes:
a master CPU binding unit, which starts and binds the specified number of threads on the master CPU and accesses the specified amount of memory on the slave CPU;
and a slave CPU binding unit, which starts and binds the specified number of threads on the slave CPU and accesses the specified amount of memory on the master CPU.
With reference to the second aspect, in a second possible implementation of the second aspect, the memory amount calculation module includes:
a master CPU memory amount calculation unit, which reads the total memory amount of the master CPU using a built-in Linux command and calculates the master CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2;
and a slave CPU memory amount calculation unit, which reads the total memory amount of the slave CPU using a built-in Linux command and calculates the slave CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2.
With reference to the second aspect, in a third possible implementation of the second aspect, the bandwidth test module includes:
a test result reading unit, which obtains the memory throughput test results in both directions;
and a test result comparison unit, which verifies the measured throughput figures against the xGMI2 bandwidth and speed.
The device for measuring xGMI2 bandwidth between two CPUs according to the second aspect of the invention can implement the method of the first aspect and each of its implementations, and achieves the same effects.
The effects described in this summary are only those of the embodiments, not all the effects of the invention. The above technical solutions have the following advantages or beneficial effects:
By binding CPUs, applying scripting techniques, and using the NUMA principle, the invention runs the processors of a two-socket or multi-socket system in sequence in a single pass, performs throughput access tests on remote memory across xGMI2, and uses the test results to effectively verify the actual xGMI2 bandwidth. Compared with the usual approach of running programs one at a time, the scripted test method can also run on multi-socket processors and produce test results under uniform test conditions. Because the xGMI2 bandwidth is verified with actual xGMI2 data transfers, the xGMI2 link can be confirmed not to be a bottleneck in hardware designs with more than one processor.
Drawings
FIG. 1 is a flow chart of a first embodiment of the method of the present invention;
FIG. 2 is a flow chart of a second embodiment of the method of the present invention;
FIG. 3 is a schematic diagram of memory access using NUMA;
FIG. 4 is a flow chart of a third embodiment of the method of the present invention;
FIG. 5 is a flow chart of a fourth embodiment of the method of the present invention;
FIG. 6 is a schematic diagram of a first embodiment of the device of the present invention;
FIG. 7 is a schematic diagram of a second embodiment of the device of the present invention;
FIG. 8 is a schematic diagram of a third embodiment of the device of the present invention;
FIG. 9 is a schematic diagram of a fourth embodiment of the device of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
As shown in FIG. 1, a method for measuring xGMI2 bandwidth between two CPUs includes the following steps:
S1, designating a master CPU and a slave CPU, and specifying the number of threads according to the CPU used.
S2, reading the total memory amount of the master CPU and of the slave CPU, and calculating the memory amount suited to the xGMI2 bandwidth measurement.
S3, establishing an xGMI2 channel between the master CPU and the slave CPU, and completing the throughput test.
As shown in FIG. 2, specifying the number of threads according to the CPU used in step S1 specifically includes:
S11, starting and binding the specified number of threads on the master CPU, and accessing the specified amount of memory on the slave CPU.
S12, starting and binding the specified number of threads on the slave CPU, and accessing the specified amount of memory on the master CPU.
As shown in FIG. 3, the CPU cores and threads that an ordinary application needs in order to run are managed by the operating system, which schedules the application's processes to run on different cores in turn. Once a process or thread is bound to a specific CPU core, it always runs on that core and is not scheduled onto other cores by the operating system.
In an operating system (such as Linux), the scheduler automatically decides, according to the work of each process, whether a CPU accesses a local memory bank or a remote memory bank (the memory bank of another CPU). In principle, as long as the local memory banks have free space, the operating system does not schedule the CPU to access remote memory; this is a performance optimization. The so-called one-time/automatic approach therefore consists of designating, by binding, the master CPU, the slave CPU, and the corresponding remote memory to be accessed.
Based on this principle, commands are used to make the master CPU access the slave CPU's memory, and the figure reported by the memory access test is the memory throughput. Binding the master CPU's threads so that they access the slave CPU's memory thus prevents the operating system from arranging for each of the two CPUs to access its own memory.
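The binding described above can be sketched as a command line. The following Python sketch is illustrative only and is not taken from the patent: it assumes the numactl utility (its --cpunodebind and --membind options) is used for the binding, that the lmbench bw_mem benchmark is the workload, that each CPU socket appears as a single NUMA node, and that node 0 is the master and node 1 the slave. The function names, the ./bw_mem path, and the example thread count and size are hypothetical. Note that bw_mem's -P option starts parallel processes rather than threads.

```python
def build_remote_access_cmd(cpu_node, mem_node, threads, size_mb,
                            op="rd", bw_mem="./bw_mem"):
    """Build an argv list that runs bw_mem on cpu_node using memory of mem_node.

    Assumptions (not from the patent): numactl does the binding, and the
    bw_mem binary lives at the hypothetical path given by `bw_mem`.
    """
    if cpu_node == mem_node:
        raise ValueError("CPU node and memory node must differ to cross the inter-CPU link")
    if threads < 1 or size_mb < 1:
        raise ValueError("threads and size_mb must be positive")
    return [
        "numactl",
        f"--cpunodebind={cpu_node}",  # run only on CPUs of this NUMA node
        f"--membind={mem_node}",      # allocate only from this NUMA node
        bw_mem, "-P", str(threads), f"{size_mb}m", op,
    ]


def both_directions(master_node, slave_node, threads, size_mb):
    """Return two commands: master accesses slave memory, then the reverse."""
    return [
        build_remote_access_cmd(master_node, slave_node, threads, size_mb),
        build_remote_access_cmd(slave_node, master_node, threads, size_mb),
    ]


if __name__ == "__main__":
    # Hypothetical values: 64 workers and 1024 MB per run.
    for cmd in both_directions(0, 1, threads=64, size_mb=1024):
        print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps it ready for subprocess.run without shell quoting concerns.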
As shown in FIG. 4, reading the total memory amount of the master CPU and of the slave CPU and calculating the memory amount suited to the xGMI2 bandwidth measurement in step S2 specifically includes:
S21, reading the total memory amount of the master CPU using a built-in Linux command, and calculating the master CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2.
S22, reading the total memory amount of the slave CPU using a built-in Linux command, and calculating the slave CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2.
The /proc/meminfo file stores information about memory usage and can be viewed with cat.
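The calculation in steps S21 and S22 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's script: it assumes the total is taken from the MemTotal line, which /proc/meminfo reports in kB, and it also accepts the per-node form "Node N MemTotal: ... kB" found in /sys/devices/system/node/nodeN/meminfo. The function names and the rounding down to whole megabytes are hypothetical choices.

```python
def parse_mem_total_kb(meminfo_text):
    """Return the MemTotal value in kB from meminfo-style text.

    Works for '/proc/meminfo' lines ('MemTotal: 123 kB') and for per-node
    lines ('Node 0 MemTotal: 123 kB').
    """
    for line in meminfo_text.splitlines():
        tokens = line.split()
        if "MemTotal:" in tokens:
            return int(tokens[tokens.index("MemTotal:") + 1])
    raise ValueError("MemTotal not found")


def measurement_size_mb(total_kb, percent=80, divisor=2):
    """Memory amount = total memory amount x 80% / 2, rounded down to MB.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return total_kb * percent // 100 // divisor // 1024


if __name__ == "__main__":
    with open("/proc/meminfo") as f:  # Linux only
        total_kb = parse_mem_total_kb(f.read())
    print(measurement_size_mb(total_kb), "MB")
```

For example, a total of 16384000 kB gives 16384000 × 80% / 2 = 6553600 kB, that is 6400 MB.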
As shown in FIG. 5, establishing an xGMI2 channel between the master CPU and the slave CPU and completing the throughput test in step S3 specifically includes:
S31, obtaining the memory throughput test results in both directions.
S32, verifying the measured throughput figures against the xGMI2 bandwidth and speed.
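The comparison in step S32 can be sketched as a simple ratio check. The patent does not state a pass criterion, so the expected bandwidth figure and the 80% threshold below are placeholders that a tester would replace with values derived from the platform specification; the function name and the direction labels are likewise hypothetical.

```python
def verify_throughput(measured_mb_s, expected_mb_s, min_ratio=0.8):
    """Compare measured throughput per direction with an expected figure.

    measured_mb_s: dict mapping a direction label to throughput in MB/s.
    expected_mb_s: expected link bandwidth in MB/s (supplied by the tester).
    min_ratio:     hypothetical pass threshold, not taken from the patent.
    Returns a dict mapping each direction to (ratio, passed).
    """
    if expected_mb_s <= 0:
        raise ValueError("expected_mb_s must be positive")
    results = {}
    for direction, value in measured_mb_s.items():
        ratio = value / expected_mb_s
        results[direction] = (ratio, ratio >= min_ratio)
    return results


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    measured = {"master->slave": 9000.0, "slave->master": 7000.0}
    for direction, (ratio, ok) in verify_throughput(measured, 10000.0).items():
        print(f"{direction}: {ratio:.0%} of expected, {'PASS' if ok else 'FAIL'}")
```

Reporting the two directions separately keeps an asymmetric result visible instead of hiding it in an average.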
In one embodiment, the method for measuring xGMI2 bandwidth between two CPUs comprises the following steps:
1. Install a suitable version of the lmbench test tool on the server under test.
2. Under the lmbench path, create a test script xGMI2test.
3. Edit the test script so that, using NUMA, each CPU accesses the memory of the other CPUs.
   Thread count = the number of all threads in a single processor (including hyper-threads).
   Memory size = total memory size × 80% / 2 (the total memory size is available in /proc/meminfo).
4. Add execute permission to the script: chmod +x mtest.
5. Execute the test: /xgmi2test.
6. Obtain the lmbench memory throughput test result (the default unit is MB/s).
7. Verify the throughput figures obtained from the test against the actual xGMI2 bandwidth and speed.
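Collecting the lmbench result from the steps above can be sketched as follows. This sketch assumes the benchmark prints a final line with two numeric columns, the size in MB followed by the throughput in MB/s, which is how bw_mem output commonly appears; the exact format should be checked against the installed lmbench version. The function name is hypothetical.

```python
def parse_bw_mem_output(text):
    """Return (size_mb, throughput_mb_s) from two-column benchmark output.

    Scans from the last line backwards and returns the first line that
    consists of exactly two numbers. The two-column format is an assumption.
    """
    for line in reversed(text.strip().splitlines()):
        fields = line.split()
        if len(fields) == 2:
            try:
                return float(fields[0]), float(fields[1])
            except ValueError:
                continue  # two words but not numbers, keep looking
    raise ValueError("no 'size throughput' line found")


if __name__ == "__main__":
    # Illustrative text only, not a real measurement.
    size_mb, mb_s = parse_bw_mem_output("1024.00 12345.67\n")
    print(f"{size_mb} MB block, {mb_s} MB/s")
```

If the tool writes its result to standard error rather than standard output, the caller needs to capture that stream before passing the text to the parser.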
As shown in FIG. 6, a device for measuring xGMI2 bandwidth between two CPUs includes:
a thread binding module 11, which designates a master CPU and a slave CPU and specifies the number of threads according to the CPU used;
a memory amount calculation module 12, which reads the total memory amount of the master CPU and of the slave CPU and calculates the memory amount suited to the xGMI2 bandwidth measurement;
and a bandwidth test module 13, which establishes an xGMI2 channel between the master CPU and the slave CPU to complete the throughput test.
As shown in FIG. 7, the thread binding module includes:
a master CPU binding unit 111, which starts and binds the specified number of threads on the master CPU and accesses the specified amount of memory on the slave CPU;
a slave CPU binding unit 112, which starts and binds the specified number of threads on the slave CPU and accesses the specified amount of memory on the master CPU.
As shown in FIG. 8, the memory amount calculation module includes:
a master CPU memory amount calculation unit 121, which reads the total memory amount of the master CPU using a built-in Linux command and calculates the master CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2;
a slave CPU memory amount calculation unit 122, which reads the total memory amount of the slave CPU using a built-in Linux command and calculates the slave CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2.
As shown in FIG. 9, the bandwidth test module includes:
a test result reading unit 131, which obtains the memory throughput test results in both directions;
a test result comparison unit 132, which verifies the measured throughput figures against the xGMI2 bandwidth and speed.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the invention. Those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the invention.

Claims (8)

1. A method for measuring xGMI2 bandwidth between two CPUs, characterized by comprising the following steps:
designating a master CPU and a slave CPU, and specifying the number of threads according to the CPU used;
reading the total memory amount of the master CPU and of the slave CPU, and calculating the memory amount suited to the xGMI2 bandwidth measurement;
and establishing an xGMI2 channel between the master CPU and the slave CPU to complete the throughput test.
2. The method for measuring xGMI2 bandwidth between two CPUs as claimed in claim 1, wherein specifying the number of threads according to the CPU used specifically comprises:
starting and binding the specified number of threads on the master CPU, and accessing the specified amount of memory on the slave CPU;
starting and binding the specified number of threads on the slave CPU, and accessing the specified amount of memory on the master CPU.
3. The method for measuring xGMI2 bandwidth between two CPUs as claimed in claim 2, wherein reading the total memory amount of the master CPU and of the slave CPU and calculating the memory amount suited to the xGMI2 bandwidth measurement specifically comprises:
reading the total memory amount of the master CPU using a built-in Linux command, and calculating the master CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2;
reading the total memory amount of the slave CPU using a built-in Linux command, and calculating the slave CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2.
4. The method for measuring xGMI2 bandwidth between two CPUs as claimed in claim 3, wherein establishing an xGMI2 channel between the master CPU and the slave CPU to complete the throughput test specifically comprises:
obtaining the memory throughput test results in both directions;
and verifying the measured throughput figures against the xGMI2 bandwidth and speed.
5. A device for measuring xGMI2 bandwidth between two CPUs, characterized by comprising:
a thread binding module, which designates a master CPU and a slave CPU and specifies the number of threads according to the CPU used;
a memory amount calculation module, which reads the total memory amount of the master CPU and of the slave CPU and calculates the memory amount suited to the xGMI2 bandwidth measurement;
and a bandwidth test module, which establishes an xGMI2 channel between the master CPU and the slave CPU to complete the throughput test.
6. The device for measuring xGMI2 bandwidth between two CPUs of claim 5, wherein the thread binding module comprises:
a master CPU binding unit, which starts and binds the specified number of threads on the master CPU and accesses the specified amount of memory on the slave CPU;
and a slave CPU binding unit, which starts and binds the specified number of threads on the slave CPU and accesses the specified amount of memory on the master CPU.
7. The device for measuring xGMI2 bandwidth between two CPUs of claim 5, wherein the memory amount calculation module comprises:
a master CPU memory amount calculation unit, which reads the total memory amount of the master CPU using a built-in Linux command and calculates the master CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2;
and a slave CPU memory amount calculation unit, which reads the total memory amount of the slave CPU using a built-in Linux command and calculates the slave CPU memory amount suited to the xGMI2 bandwidth measurement as: memory amount = total memory amount × 80% / 2.
8. The device for measuring xGMI2 bandwidth between two CPUs of claim 5, wherein the bandwidth test module comprises:
a test result reading unit, which obtains the memory throughput test results in both directions;
and a test result comparison unit, which verifies the measured throughput figures against the xGMI2 bandwidth and speed.
CN201910781412.6A 2019-08-23 2019-08-23 Method and device for measuring xGMI2 bandwidth between two paths of CPUs Withdrawn CN110609768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910781412.6A CN110609768A (en) 2019-08-23 2019-08-23 Method and device for measuring xGMI2 bandwidth between two paths of CPUs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910781412.6A CN110609768A (en) 2019-08-23 2019-08-23 Method and device for measuring xGMI2 bandwidth between two paths of CPUs

Publications (1)

Publication Number Publication Date
CN110609768A (en) 2019-12-24

Family

ID=68889870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910781412.6A Withdrawn CN110609768A (en) 2019-08-23 2019-08-23 Method and device for measuring xGMI2 bandwidth between two paths of CPUs

Country Status (1)

Country Link
CN (1) CN110609768A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306775A (en) * 2020-11-19 2021-02-02 山东云海国创云计算装备产业创新中心有限公司 Method, device, equipment and medium for testing communication link between two-way CPUs (central processing unit)
CN112306775B (en) * 2020-11-19 2023-03-14 山东云海国创云计算装备产业创新中心有限公司 Method, device, equipment and medium for testing communication link between two-way CPUs (central processing unit)

Similar Documents

Publication Publication Date Title
US6539500B1 (en) System and method for tracing
US8327039B2 (en) Integrated DMA processor and PCI express switch for a hardware-based functional verification system
US9600618B2 (en) Implementing system irritator accelerator FPGA unit (AFU) residing behind a coherent attached processors interface (CAPI) unit
US9335947B2 (en) Inter-processor memory
JP6326705B2 (en) Test, verification and debug architecture program and method
CN101616174A (en) A kind of storage system IO handles the method that the path dynamic tracking realizes the optimization system performance
US8954644B2 (en) Apparatus and method for controlling memory
US20160299859A1 (en) Apparatus and method for external access to core resources of a processor, semiconductor systems development tool comprising the apparatus, and computer program product and non-transitory computer-readable storage medium associated with the method
CN104850480A (en) Method and device for testing performance of hard disk of high-density storage server
JP2014532861A (en) Programmable test equipment
CN103198001A (en) Storage system capable of self-testing peripheral component interface express (PCIE) interface and test method
CN110609768A (en) Method and device for measuring xGMI2 bandwidth between two paths of CPUs
CN109992458A (en) A kind of UPI bandwidth detection method, apparatus, equipment and readable storage medium storing program for executing
WO2013148439A1 (en) Hardware managed allocation and deallocation evaluation circuit
US7373557B1 (en) Performance monitor for data processing systems
CN110191010B (en) Pressure testing method of server
US10949330B2 (en) Binary instrumentation to trace graphics processor code
CN114666103B (en) Trusted measurement device, equipment, system and trusted identity authentication method
CN113098730B (en) Server testing method and equipment
CN111095228A (en) First boot with one memory channel
US20160077942A1 (en) Storage system and test method for testing pci express interface
US7000148B2 (en) Program-controlled unit
CN111695314A (en) Multi-core chip simulation test method and device
US7197677B1 (en) System and method to asynchronously test RAMs
US8090991B2 (en) Information processing apparatus, method, and computer program product for self-diagnosis for utilizing multiple diagnostic devices, each having exclusive access to a resource

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20191224