CN105117310A - Linux system-based memory read-write bandwidth optimization test method - Google Patents
- Publication number: CN105117310A
- Application number: CN201510457917.9A
- Authority: CN (China)
- Prior art keywords: test, memory, run, test method, bandwidth optimization
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Debugging And Monitoring (AREA)
Abstract
The invention discloses a Linux-based method for optimized testing of memory read/write bandwidth, in the field of memory optimization testing. The method first optimizes the test environment and collects system information, then sets the optimal test parameters and the number of test iterations, compiles the test source code, and runs the test scripts; finally it collects the test data and keeps the best result. The invention makes full use of system resources, reduces the overhead of cross-node memory access during computation, lets the CPU and memory work in a coordinated way, automates test tuning, greatly reduces the time cost of testing, and helps ensure reliable test data.
Description
Technical field
The invention relates to memory optimization testing, and specifically to a Linux-based method for optimized testing of memory read/write bandwidth.
Background
Board-level design is the most important and most complex part of server design. To ensure the integrity of board-level power delivery and signals, the signal integrity (SI) test phase follows a complete verification process; however, because computers are so complex, even the SI department cannot cover everything. The system verification department must therefore ensure that the product performs reliably and stably at the system level, and safeguard the quality of the whole machine as far as possible. As systems mature, their structure becomes more complex, and the programs used for system optimization are wrapped around the system core. How to get beneath this complex exterior and ensure the reliability of the product's hardware at the basic system level has become an important topic and research direction in test and verification work.
With the rapid development and adoption of big data, cloud computing, and high-performance computing, the demand for processing massive amounts of data is growing quickly. Typically, when the CPU receives an instruction, it first looks for the relevant data in the level-1 cache (L1 cache). The L1 cache runs at the same frequency as the CPU, but its capacity is small, so it cannot hit every time. On a miss the CPU looks in the level-2 cache (L2 cache); likewise, if the data is not there either, the search continues to the L3 cache, main memory, and disk. Because the amount of data the system handles is very large, almost every operation goes through memory, which makes memory the busiest component in the whole system.
As a result, memory performance and utilization determine, to a considerable extent, the performance of the whole computer system. Earlier memory bandwidth tests incurred a large amount of cross-node memory access and achieved low CPU utilization. Verifying the memory bandwidth of each product more accurately has therefore become an increasingly important task for test departments.
Summary of the invention
To address current needs and the shortcomings of the prior art, the invention provides a Linux-based method for optimized testing of memory read/write bandwidth.
The technical solution adopted is as follows: the method first optimizes the test environment and collects system information, then sets the optimal test parameters and the number of test iterations, compiles the test source code, and runs the test scripts; finally it collects the test data and keeps the best result.
Preferably, collecting system information comprises copying info.sh to the root directory and running it to obtain basic information such as the system's CPU, memory, BIOS, sockets, CPU nodes, and DIMM count. This information is saved automatically in the system directory for the test programs to use.
Preferably, the run_stream.sh script is run to detect the optimal N value for the system configuration.
Preferably, run_stream.max is run. Using the system information collected earlier, this script further tunes the services running on the system, repeats the test several times, and stores the best result in a file.
Compared with the prior art, the method has the following benefits. By analyzing the system under test and shielding the test from unrelated system services, it makes the fullest use of system resources, minimizes the overhead of cross-node memory access during computation, and lets the CPU and memory work in a coordinated way. Test tuning is automated and the test itself is semi-automated; the procedure is simple to carry out, greatly reduces the time cost of testing, and helps ensure accurate and reliable test data.
Description of the drawings
Figure 1 is an implementation diagram of the memory read/write bandwidth optimization test method.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the Linux-based memory read/write bandwidth optimization test method is described in further detail below with reference to a specific embodiment and the accompanying drawing.
The method tests memory bandwidth by continuously looping four basic operations (copy, scale, add, and triad) over arrays whose size is defined by the user. Because the user-defined arrays are generally very large, every step of the computation goes through memory and places heavy demands on the data traffic between the CPU and memory, which is what a memory bandwidth test requires.
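As a rough illustration of what "very large" means in practice: the comments in the STREAM source recommend that each array be at least four times the size of the available cache, so that the data cannot stay cache-resident. The sketch below turns a cache size in MB into a minimum element count. The function name `min_stream_n` is hypothetical, and 8-byte (double) elements are an assumption.

```shell
#!/bin/bash
# Hypothetical helper: smallest STREAM array length N whose footprint is
# at least 4x the cache size, assuming 8-byte (double) elements.
min_stream_n() {
    local cache_mb=$1
    # cache size in bytes, times 4, divided by 8 bytes per element
    echo $(( cache_mb * 1024 * 1024 * 4 / 8 ))
}

# Example: a 30 MB last-level cache
min_stream_n 30
```

Any N at or above this value keeps each of the three arrays well outside the cache; the patent's scripts search far larger values of N.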
The four basic operations are expressed in C as follows:
Copy: c[j] = a[j]; (data copy)
Scale: b[j] = scalar * c[j]; (scaling)
Add: c[j] = a[j] + b[j]; (addition)
Triad: a[j] = b[j] + scalar * c[j]; (compound operation)
Copy reads a value from one memory location and writes it to another. Scale reads a value from memory, multiplies it by a scalar, and writes the result to another location. Add reads two values from memory, adds them, and writes the result to another location. Triad reads two values from memory, applies the compound multiply-add, and stores the result in another location.
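The read and write counts above determine how much data each kernel moves per loop iteration: Copy and Scale touch two arrays, Add and Triad touch three. A minimal sketch of the resulting bandwidth arithmetic follows; it assumes 8-byte elements and reports MB/s as bytes moved divided by elapsed time divided by 10^6. The function name `stream_mbps` and the sample numbers are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: bandwidth in MB/s for one STREAM kernel.
#   $1 = kernel name, $2 = array length N, $3 = elapsed seconds
# Assumes 8-byte elements; Copy and Scale touch two arrays per iteration,
# Add and Triad touch three.
stream_mbps() {
    local arrays
    case $1 in
        Copy|Scale) arrays=2 ;;
        Add|Triad)  arrays=3 ;;
        *) echo "unknown kernel: $1" >&2; return 1 ;;
    esac
    awk -v a="$arrays" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", a * 8 * n / t / 1e6 }'
}

# Example: Triad over 80,000,000 elements in half a second
stream_mbps Triad 80000000 0.5
```

Because Add and Triad move 1.5 times as many bytes as Copy and Scale for the same N, their times are not directly comparable; only the derived rates are.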
Based on the Linux platform and built around the four looped basic operations (copy, scale, add, and triad), the invention automates test tuning, saves test time, and helps ensure test accuracy.
Embodiment: The Linux-based memory read/write bandwidth optimization test method of this embodiment first optimizes the test environment and collects system information, then sets the optimal test parameters and the number of test iterations, compiles the test source code, and runs the test scripts; finally it collects the test data and keeps the best result.
Figure 1 is an implementation diagram of the method. As shown in Figure 1, the procedure is: prepare and optimize the test environment; collect system information; set the optimal test parameters and the number of test iterations; compile the test source code; run the script so that the test loops several times; and finally collect the test data and keep the best result.
To carry out the test, first prepare the test environment. For systems that support RDIMM, populate all memory channels and test with 1, 2, and 3 DIMMs per channel in turn. Software requirements: Intel compiler 13; Red Hat 6.2 or later.
The optimized BIOS settings are listed in the following table:
The optimized OS settings are as follows:
For RHEL 6, leave those services enabled and disable other services.
Collect system information:
Copy info.sh to the root directory; it collects basic information such as system memory, CPU, and thread count:
Basic code:
#!/bin/sh
# time synchronize
# ntpdate 133.133.133.1
# sntp -P no -r 133.133.133.1   # set the SNTP server address
# get processor model name
CPUMODE=`grep "model name" /proc/cpuinfo | sort -u | tr -s ' ' | awk 'BEGIN{FS=":"}{print $2}'`
echo $CPUMODE
# get processor frequency (scaling_max_freq is in kHz, so divide to get GHz)
CPUFREQ=`awk '{printf "%.2f", $1/1000000}' /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq`
echo "Frequency: ${CPUFREQ} GHz"
# get cache size (reported in KB, converted to MB)
CACHESIZE=`grep 'cache size' /proc/cpuinfo | sort -u | awk '{print $4/1024}'`
echo "Cache size: ${CACHESIZE} MB"
# get number of sockets
NUMSOCK=${1:-`grep 'physical id' /proc/cpuinfo | sort -u | wc -l`}
echo "$NUMSOCK sockets"
# get id of each socket
socketidlist=`grep "physical id" /proc/cpuinfo | sort -u | awk '{(es=="")?es=$4:es=es" "$4} END{print es}'`
echo "IDs of Sockets: $socketidlist"
# get number of logical cores
NUMPROC=${1:-`grep -c '^processor' /proc/cpuinfo`}
echo " $NUMPROC logical cores in total"
# get number of physical cores per socket
NUMPHYSCORE=${1:-`grep 'cpu cores' /proc/cpuinfo | sort -u | awk '{print $4}'`}
# TOTALCORES=${1:-`grep 'core id' /proc/cpuinfo | wc -l`}
# NUMPHYSCORE=`expr $TOTALCORES / $NUMSOCK`
echo " $NUMPHYSCORE physical cores per socket"
# get number of logical cores per socket
firstsocket=`echo $socketidlist | cut -d ' ' -f 1`
NUMLOGICORE=${1:-`awk '/physical id\t: '$firstsocket'/' /proc/cpuinfo | wc -l`}
# or
# NUMLOGICORE=${1:-`expr $NUMPROC / $NUMSOCK`}
echo " $NUMLOGICORE logical cores per socket"
# get number of numa nodes
NUMNODE=${1:-`numactl --hardware | awk '/available:/{print $2}'`}
# socketnum=$NUMSOCK
# while (($socketnum > ${2:-0})); do
#   corenum=`awk 'BEGIN{print '$socketnum'*'$NUMLOGICORE'}' /dev/null`
#   cpuset=`grep -E "processor|physical id|core id|cpuid" /proc/cpuinfo | grep -A2 "processor" | awk 'BEGIN{FS="\n";RS="--\n"}{print $2,$3,$1}' | awk '{print $4,$8,a[$4$8]++,$11}' | sort -r | awk 'NR<='$corenum'{(es=="")?es=$4:es=es","$4} END{print es}'`
#   echo $corenum
#   echo $cpuset
#   socketnum=`expr $socketnum / 2`
# done
# dump BIOS (type 0), baseboard (type 2), and memory device (type 17) tables
sudo /usr/sbin/dmidecode -t 0,2,17 > dmiinfo
# get baseboard info
BOARD=`grep -i -A3 "base board" dmiinfo | awk 'BEGIN{FS=":"} /Manufacturer|Product Name/{(es=="")?es=$2:es=es","$2} END{print es}'`
echo MotherBoard: $BOARD
# get bios version
BIOS=`grep -i -A3 "bios info" dmiinfo | awk 'BEGIN{FS=":"} /Version|Release Date/{(es=="")?es=$2:es=es","$2} END{print es}'`
echo "BIOS:" $BIOS
# get memory info
DIMMNUM=`grep -i -A16 "memory device$" dmiinfo | grep -c 'Size: [0-9]'`
DIMMSIZE=`grep -i -A16 "memory device$" dmiinfo | grep -m 1 'Size: [0-9]' | awk '{print $2/1024}'`
DIMMTYPE=`grep -i -A16 "memory device$" dmiinfo | grep -m 1 'Type:' | awk 'BEGIN{FS=":"}{print $2}'`
DIMMSPEED=`grep -m 1 'Speed' dmiinfo | awk '{print $2}'`
DIMMPART=`grep -m 1 'Part Number: [[:alnum:]]' dmiinfo | awk '{print $3}'`
rm -f dmiinfo
echo Memory: ${DIMMNUM}x${DIMMSIZE}GB ${DIMMTYPE}-${DIMMSPEED}MHz, ${DIMMPART}
# get OS info
OS=`[ -e /etc/issue ] && head -n 1 /etc/issue`
OS=`[ -e /etc/issue ] && cat /etc/issue | sed '/^$/d' | head -n 1`
echo OS: $OS
KERNEL=`uname -rm`
echo " Kernel: $KERNEL"
# note: TURBO, HT, and NUMA are not set anywhere in this listing
MACHINECONF="$CPUMODE,${CPUFREQ}GHz,${CACHESIZE}MB,${DIMMNUM}x${DIMMSIZE}GB ${DIMMTYPE}-${DIMMSPEED},$TURBO,$HT,$NUMA"
echo $MACHINECONF
Running info.sh collects information about the system's CPU, memory, BIOS, sockets, CPU nodes, DIMM count, and so on. This information is saved automatically in the system directory for the test programs to use.
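The socket and core counts gathered by info.sh come from simple text matching on /proc/cpuinfo. The sketch below shows the same two counts as small functions that take an optional file argument, so the logic can be tried on a saved copy of /proc/cpuinfo from another machine. The function names are hypothetical and are not part of the patent's scripts.

```shell
#!/bin/bash
# Hypothetical helpers: the same counts info.sh derives, but reading from
# a file argument (default: /proc/cpuinfo).

# Number of distinct "physical id" values, i.e. populated sockets.
count_sockets() {
    grep '^physical id' "${1:-/proc/cpuinfo}" | sort -u | wc -l | tr -d ' '
}

# Number of "processor" entries, i.e. logical cores.
count_logical_cores() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Usage (not run here):
#   count_sockets                 # live system
#   count_sockets saved_cpuinfo   # saved copy from another machine
```

Dividing the logical core count by the socket count gives the logical cores per socket, which is the quantity the commented-out alternative in info.sh computes with `expr`.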
Set the optimal test parameters and the number of test iterations, and compile the test source code:
Run the run_stream.sh script, which searches N values from 10,000,000 to 2,000,000,000 for the optimal N under this system configuration:
#!/bin/bash
# test 200 values of N and pick the best result from the output
for i in $(seq 1 200)
do
    # N is i followed by seven zeros, i.e. i * 10000000
    echo -n "${i}0000000 "
    # recompile the STREAM source with the new array size
    icc -O3 -openmp -DSTREAM_ARRAY_SIZE=${i}0000000 -opt-prefetch-distance=64,8 -opt-streaming-cache-evict=0 -ffreestanding stream2.c -o stream2
    # ./stream2 | grep "^Tri"
    # bind to the physical CPU cores, run STREAM, and print the Triad line
    numactl --physcpubind=0-59 ./stream2 | grep "^Tri"
done
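The loop above prints one Triad line per candidate N, and the best N still has to be picked out of that output. A minimal sketch of that selection step follows. It assumes each output line has the form `<N> Triad: <best rate MB/s> <avg time> <min time> <max time>`, with N separated from the label by whitespace; the function name `best_n` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read lines of the form
#   "<N> Triad: <best rate MB/s> <avg time> <min time> <max time>"
# on stdin and print the N with the highest Triad rate, then that rate.
best_n() {
    awk '$2 == "Triad:" && ($3 + 0 > best) { best = $3 + 0; rate = $3; n = $1 }
         END { if (n != "") print n, rate }'
}

# Usage (not run here):
#   ./run_stream.sh | best_n
```

Keeping the rate as the original string avoids awk's default number formatting, which would otherwise round rates with more than six significant digits.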
Run the script so that the test loops several times, then collect the test data and keep the best result:
Run run_stream.max. Using the system information collected earlier, the script further tunes the services running on the system, repeats the test several times, and stores the best result in the stream_omp_v5.4_IC12.0.3.174_80M.csv file:
#!/bin/sh
# Set up env and run Stream
. ~/info.sh        # source the info.sh copied to the root directory to collect system information
mkdir -p archive   # folder in the run directory for saved test results
# Capture platform specifics
logfile=machine-config-`hostname`.log   # log file named after the host
date > $logfile                         # timestamp
uname -a >> $logfile                    # system information
cat /etc/redhat-release >> $logfile
cat /proc/cpuinfo | grep -i 'cpu mhz' | uniq >> $logfile   # CPU frequency
cat /proc/cpuinfo | grep -i cache | uniq >> $logfile       # CPU cache
if [ -f /proc/pal/cpu0/cache_info ]; then
    cat /proc/pal/cpu0/cache_info | grep -B1 Size >> $logfile
fi
cat /proc/meminfo | grep -i '^mem[ft]' >> $logfile   # total and free memory
# stream=stream_omp_11.1.073.160M   # name of the STREAM binary
stream=stream_omp_v5.4_IC12.0.3.174_80M
# _v5.4_IC11.1.073.160M
# LD_LIBRARY_PATH=/usr/local/lib    # add the STREAM library path to the environment
# LD_LIBRARY_PATH=/root/benchmark/Stream/stream_omp_v5.4_IC12.0.3.174
export LD_LIBRARY_PATH
export KMP_AFFINITY=scatter   # spread OpenMP threads across the cores
export OMP_NUM_THREADS=$NUMPROC
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled   # "never" disables transparent huge pages
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled            # show the resulting setting
resfile=$stream-`hostname`-allresult.csv   # CSV file that collects the results
if [ ! -f "$resfile" ]; then
    echo "TIME,CPU,FREQ,CACHE,MEM,TURBO,HT,NUMA,logical cores,# of Sockets,Copy,Scale,Add,Triad" >> $resfile
fi
ttime=`date +%y-%m-%d-%H:%M:%S`
runlog=$stream-${NUMSOCK}S${NUMPROC}T-`hostname`-$ttime.log   # log of this run
loop=10       # number of iterations per run (commented out in the source listing; the loop below needs it)
max_triad=0   # best Triad result so far, scaled by 10 to make it an integer
while (($loop > 0)); do
    # cpuset must hold the CPU list for NUMSOCK sockets; the block that builds it is commented out in info.sh
    OMP_NUM_THREADS=$NUMPROC numactl --physcpubind=${cpuset[NUMSOCK]} -l ./$stream | tee -a $runlog
    tail -n 7 $runlog > temp_result
    temp=`awk '/Triad:/{printf("%.0f", $2*10)}' temp_result`
    if [ $temp -gt $max_triad ]; then
        copy=`awk '/Copy:/{printf("%.1f", $2)}' temp_result`
        scale=`awk '/Scale:/{printf("%.1f", $2)}' temp_result`
        add=`awk '/Add:/{printf("%.1f", $2)}' temp_result`
        triad=`awk '/Triad:/{printf("%.1f", $2)}' temp_result`
        max_triad=$temp
    fi
    loop=`expr $loop - 1`
done
rm temp_result
echo $ttime,$MACHINECONF,$NUMPROC,$NUMSOCK-Socket,$copy,$scale,$add,$triad >> $resfile   # append the best result to the CSV file
mv *.log archive/
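The script keeps the best run by multiplying the Triad rate by 10 before comparing, because the shell's `[ -gt ]` test only handles integers. The sketch below isolates that idea: one function converts a decimal rate to integer tenths, and another scans STREAM-style output for the highest Triad rate. Both function names are hypothetical, and the input format (`Triad: <rate> ...`) is assumed from the awk patterns in the script.

```shell
#!/bin/bash
# Convert a decimal MB/s rate to an integer number of tenths, because the
# shell's [ -gt ] test only compares integers.
to_tenths() {
    awk -v r="$1" 'BEGIN { printf "%.0f\n", r * 10 }'
}

# Print the highest Triad rate among "Triad: <rate> ..." lines on stdin.
max_triad() {
    local best=0 best_rate="" label rate rest tenths
    while read -r label rate rest; do
        [ "$label" = "Triad:" ] || continue
        tenths=$(to_tenths "$rate")
        if [ "$tenths" -gt "$best" ]; then
            best=$tenths
            best_rate=$rate
        fi
    done
    echo "$best_rate"
}

# Usage (not run here):
#   max_triad < some_run.log
```

Scaling by 10 preserves one decimal place, which matches the `%.1f` precision the script writes to the CSV file.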
As the technical solution of this embodiment shows, the method uses scripts to automatically address the problem of many unrelated services occupying system resources and making the test data inaccurate. By analyzing the system, it also obtains better parameter settings for that system automatically and outputs the test results automatically, which makes the memory read/write bandwidth optimization test semi-automated.
The specific embodiment above is only one particular case of the invention. The scope of patent protection includes, but is not limited to, this embodiment; any appropriate change or substitution that is made by a person of ordinary skill in the art and is consistent with the claims of the invention shall fall within the scope of patent protection of the invention.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510457917.9A CN105117310A (en) | 2015-07-30 | 2015-07-30 | Linux system-based memory read-write bandwidth optimization test method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105117310A true CN105117310A (en) | 2015-12-02 |
Family
ID=54665308
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510457917.9A Pending CN105117310A (en) | 2015-07-30 | 2015-07-30 | Linux system-based memory read-write bandwidth optimization test method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105117310A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20151202 |