CN103218285B - Memory performance monitoring method and device based on CPU registers - Google Patents

Memory performance monitoring method and device based on CPU registers

Info

Publication number
CN103218285B
Authority
CN
China
Prior art keywords
read
grades
write
cpu
msr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310097941.7A
Other languages
Chinese (zh)
Other versions
CN103218285A (en)
Inventor
曹瑞
王雁鹏
王晓静
魏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310097941.7A
Publication of CN103218285A
Application granted
Publication of CN103218285B

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present invention proposes a memory performance monitoring method based on CPU registers, comprising the following steps: detecting the type of the CPU; obtaining, according to the type of the CPU, the number of L3 cache misses, the number of L3 cache write-backs and the cache line length of the CPU per unit time; and calculating the real-time memory read/write bandwidth of the CPU from the number of L3 cache misses, the number of L3 cache write-backs and the cache line length. The invention can monitor memory performance in real time, thereby providing effective guidance for memory selection, controlling cost and improving work efficiency. The invention also discloses a memory performance monitoring device based on CPU registers.

Description

Memory performance monitoring method and device based on CPU registers
Technical field
The present invention relates to the field of computer science and technology, and in particular to a memory performance monitoring method and device based on CPU registers.
Background
With the development of computer technology and the rise of the Internet, the volume of data handled by computers is growing explosively, so more powerful computing capability is needed as support. As a result, server products are becoming increasingly complex, involving multiple hardware components such as the CPU, memory, SSD (Solid State Disk), hard disk and network card. These components develop at different rates in performance and capacity, and an unsuitable combination seriously affects overall performance. Selecting a suitable hardware configuration for different types of software applications has therefore gradually become a challenge.
To select a suitable hardware configuration for different types of software applications, the memory requirements of the server must be evaluated. Besides the traditional capacity requirement, the evaluation also needs to monitor how the server memory is actually used.
The prior art generally offers two methods for monitoring memory usage. They are introduced below using the currently widely used x86 server as an example:
(1) Monitoring with commands provided by the operating system, such as top and free.
(2) Taking the nominal value of the memory directly, or calculating memory performance with the formula: memory performance = memory operating frequency × memory bus width / 8.
However, the existing schemes have the following shortcomings:
The first scheme can monitor in real time, but it only monitors the amount of memory in use and does not monitor memory performance in real time.
The second scheme yields only a theoretical value, not the result of real-time monitoring. It cannot measure memory performance under different applications or obtain the actual usage, so memory performance may go underused or be pushed to its limit without this being noticed.
Moreover, existing memory bandwidth analysis techniques rely only on the memory frequency and cannot obtain the read/write bandwidth of memory in real time under actual usage.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art.
To this end, one object of the invention is to propose a memory performance monitoring method based on CPU registers that monitors memory performance in real time on the basis of the CPU, thereby providing effective guidance for memory selection, reducing cost and improving work efficiency.
A second object of the invention is to propose a memory performance monitoring device based on CPU registers.
To achieve the above objects, an embodiment of the first aspect of the invention proposes a memory performance monitoring method based on CPU registers, comprising the following steps: detecting the type of the CPU; obtaining, according to the type of the CPU, the number of L3 cache misses, the number of L3 cache write-backs and the cache line length of the CPU per unit time; and calculating the real-time memory read/write bandwidth of the CPU from the number of L3 cache misses, the number of L3 cache write-backs and the cache line length.
The memory performance monitoring method based on CPU registers according to the embodiment of the invention detects the type of the CPU, obtains the number of L3 cache misses, the number of write-backs and the cache line length of the CPU, and calculates the real-time memory read/write bandwidth of the CPU. Embodiments of the invention are compatible with x86 CPUs and can monitor the actual usage of memory performance in real time, which enriches what CPU-based memory monitoring can cover. They give better guidance for memory selection and allow memory models to be customized for different applications, improving efficiency and saving cost.
In one embodiment of the invention, when the detected CPU type is WESTMERE_EP or NEHALEM_EP, the MSR_UNCORE_PerfCntrn registers are used as counters, where each MSR_UNCORE_PerfCntrn register corresponds to an MSR_UNCORE_PerfEvtSelx configuration register, and the MSR_UNCORE_PerfEvtSelx register comprises function mask bits and function select bits. The number of write operations and the number of read operations of the L3 cache are obtained respectively through the corresponding bits of the function select bits, and operations targeting DRAM are obtained through the corresponding bits of the function mask bits. Recording of the L3 cache read/write operation counts is controlled according to the L3 cache write and read operation counts and the operations targeting DRAM, and the corresponding read and write counts are obtained by reading the value in MSR_UNCORE_PerfCntrn.
In one embodiment of the invention, when the detected CPU type is NEHALEM_EX or NEHALEM_EX, the control and counting units MSR_M0_PMON_CTR0 and MSR_M1_PMON_CTR0 in the M-Box are used to count and control read operations, and the control and counting units MSR_B0_PMON_CTR1 and MSR_B1_PMON_CTR1 in the B-Box are used to count and control write operations. The number of write operations and the number of read operations of the L3 cache are obtained respectively through the corresponding bits of the function select bits, and operations targeting DRAM are obtained through the corresponding bits of the function mask bits. Recording of the L3 cache read/write operation counts is controlled according to the L3 cache write and read operation counts and the operations targeting DRAM, and the corresponding read and write counts are obtained by reading the values of the M-Box and the B-Box.
In one embodiment of the invention, when the detected CPU type is SandyBridge-EP, the mmap function is used to map the different channels of the CPU into memory according to their base addresses. For each mapped channel, the two corresponding address spaces PmonCntr_0 and PmonCntr_1 are used as read/write counters, where the function of the PmonCntr_0 address space is specified by the PmonCntrCfg_0 register and the function of the PmonCntr_1 address space is specified by the PmonCntrCfg_1 register, and PmonCntrCfg_x comprises function mask bits and function select bits. Read/write counts are recorded through the corresponding bits of the function select bits, and the number of read operations and the number of write operations of the L3 cache are obtained through the corresponding bits of the function mask bits. The number of L3 cache reads and writes is then obtained from the channels according to the read/write counts and the L3 cache read and write operation counts.
In one embodiment of the invention, calculating the real-time memory read/write bandwidth of the CPU from the number of L3 cache misses, the number of L3 cache write-backs and the cache line length comprises the following step: performing a multiplication on the number of L3 cache misses, the number of L3 cache write-backs and the cache line length to obtain the real-time memory read/write bandwidth of the CPU.
An embodiment of the second aspect of the invention proposes a memory performance monitoring device based on CPU registers, comprising a detection module, an acquisition module and a computing module. The detection module detects the type of the CPU; the acquisition module obtains, according to the type of the CPU, the number of L3 cache misses, the number of L3 cache write-backs and the cache line length of the CPU per unit time; and the computing module calculates the real-time memory read/write bandwidth of the CPU from the number of L3 cache misses, the number of L3 cache write-backs and the cache line length.
In the memory performance monitoring device based on CPU registers according to the embodiment of the invention, the detection module detects the type of the CPU, the acquisition module obtains the number of L3 cache misses, the number of write-backs and the cache line length of the CPU, and the computing module calculates the real-time memory read/write bandwidth of the CPU. Embodiments of the invention are compatible with x86 CPUs and can monitor the actual usage of memory performance in real time, which enriches what CPU-based memory monitoring can cover. They give better guidance for memory selection and allow memory models to be customized for different applications, improving efficiency and saving cost.
In one embodiment of the invention, when the detection module detects that the CPU type is WESTMERE_EP or NEHALEM_EP, the acquisition module uses the MSR_UNCORE_PerfCntrn registers as counters, where each MSR_UNCORE_PerfCntrn register corresponds to an MSR_UNCORE_PerfEvtSelx configuration register, and the MSR_UNCORE_PerfEvtSelx register comprises function mask bits and function select bits. The acquisition module obtains the number of write operations and the number of read operations of the L3 cache respectively through the corresponding bits of the function select bits, and obtains operations targeting DRAM through the corresponding bits of the function mask bits. The acquisition module controls recording of the L3 cache read/write operation counts according to the L3 cache write and read operation counts and the operations targeting DRAM, and obtains the corresponding read and write counts by reading the value in MSR_UNCORE_PerfCntrn.
In one embodiment of the invention, when the detection module detects that the CPU type is NEHALEM_EX or NEHALEM_EX, the acquisition module uses the control and counting units MSR_M0_PMON_CTR0 and MSR_M1_PMON_CTR0 in the M-Box to count and control read operations, and the control and counting units MSR_B0_PMON_CTR1 and MSR_B1_PMON_CTR1 in the B-Box to count and control write operations. The acquisition module obtains the number of write operations and the number of read operations of the L3 cache respectively through the corresponding bits of the function select bits, and obtains operations targeting DRAM through the corresponding bits of the function mask bits. The acquisition module controls recording of the L3 cache read/write operation counts according to the L3 cache write and read operation counts and the operations targeting DRAM, and obtains the corresponding read and write counts by reading the values of the M-Box and the B-Box.
In one embodiment of the invention, when the detection module detects that the CPU type is SandyBridge-EP, the acquisition module uses the mmap function to map the different channels of the CPU into memory according to their base addresses. For each mapped channel, the two corresponding address spaces PmonCntr_0 and PmonCntr_1 are used as read/write counters, where the function of the PmonCntr_0 address space is specified by the PmonCntrCfg_0 register and the function of the PmonCntr_1 address space is specified by the PmonCntrCfg_1 register, and PmonCntrCfg_x comprises function mask bits and function select bits. The acquisition module records read/write counts through the corresponding bits of the function select bits, and obtains the number of read operations and the number of write operations of the L3 cache through the corresponding bits of the function mask bits. The acquisition module then obtains the number of L3 cache reads and writes from the channels according to the read/write counts and the L3 cache read and write operation counts.
In one embodiment of the invention, the computing module performs a multiplication on the number of L3 cache misses, the number of L3 cache write-backs and the cache line length to obtain the real-time memory read/write bandwidth of the CPU.
Additional aspects and advantages of the invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of the embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a flow chart of the memory performance monitoring method based on CPU registers according to a first embodiment of the invention;
Fig. 2 is a schematic diagram of the memory architecture of a CPU according to an embodiment of the invention;
Fig. 3 is a flow chart of the memory performance monitoring method based on CPU registers according to a second embodiment of the invention; and
Fig. 4 is a structural diagram of the memory performance monitoring device based on CPU registers according to an embodiment of the invention.
Detailed description
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it.
The memory performance monitoring method based on CPU registers according to an embodiment of the invention is described below with reference to Fig. 1. It comprises the following steps:
Step S110: detect the type of the CPU.
Step S120: according to the type of the CPU, obtain the number of L3 cache misses, the number of L3 cache write-backs and the cache line length of the CPU per unit time.
Step S130: calculate the real-time memory read/write bandwidth of the CPU from the number of L3 cache misses, the number of L3 cache write-backs and the cache line length.
The memory performance monitoring method based on CPU registers according to the embodiment of the invention is compatible with products based on x86 CPUs. The method is described in detail below using its application on different types of x86 CPUs as an example. It is understood that the embodiments on x86 CPUs are for illustration only, and the memory performance monitoring method based on CPU registers according to the embodiment of the invention is not limited to them.
Fig. 2 shows the memory architecture of a CPU. CPU 100 comprises cores 110, a level-1 cache (L1 cache) 120, a level-2 cache (L2 cache) 130, a level-3 cache (L3 cache) 140 and a memory controller 150. There may be multiple cores, which read data from the cache. The cache is divided into the L1, L2 and L3 caches, and the CPU accesses them in the priority order L1, L2, L3. The cache is connected to the memory controller 150, which controls the data exchange between memory 200 and CPU 100. In this example DRAM (Dynamic Random Access Memory) is used as memory; DRAM is currently the common type of system memory. It is understood that the above architecture is for illustration only, and embodiments of the invention are not limited to it.
When memory and the CPU exchange data, the memory controller 150 looks up the L3 cache 140 according to the physical memory address. On a miss, the memory controller 150 writes the data at the corresponding memory address into the L3 cache 140 in units of the cache line length. When the L3 cache 140 needs to evict data, the data in the L3 cache 140 is likewise written back to memory in units of the cache line length.
In one embodiment of the invention, once the number of L3 cache misses, the number of L3 cache write-backs and the cache line length have been obtained, the real-time memory read/write bandwidth of the CPU is calculated by performing a multiplication on the number of L3 cache misses, the number of L3 cache write-backs and the cache line length.
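The multiplication described above can be illustrated with a minimal Python sketch. It rests on one reading of the text, not on a formula stated in the source: since each L3 miss fills one cache line from memory and each L3 write-back stores one cache line to memory, read traffic is taken as misses × cache line length and write traffic as write-backs × cache line length. The function name, the 64-byte default line length and the one-second interval are assumptions for illustration.

```python
def memory_bandwidth(l3_misses, l3_writebacks, line_bytes=64, interval_s=1.0):
    """Return (read, write, total) memory bandwidth in bytes per second.

    l3_misses and l3_writebacks are counts observed during interval_s.
    line_bytes=64 is an assumed cache line length, not a value from the source.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Each miss fetches one cache line from memory (read traffic).
    read_bw = l3_misses * line_bytes / interval_s
    # Each write-back stores one cache line to memory (write traffic).
    write_bw = l3_writebacks * line_bytes / interval_s
    return read_bw, write_bw, read_bw + write_bw


read_bw, write_bw, total_bw = memory_bandwidth(1_000_000, 250_000)
print(f"read {read_bw / 1e6:.1f} MB/s, write {write_bw / 1e6:.1f} MB/s, "
      f"total {total_bw / 1e6:.1f} MB/s")
```

With one million misses and 250,000 write-backs in one second, this sketch reports 64 MB/s of reads and 16 MB/s of writes.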
The PMC unit in an x86 CPU can count CPU internal events. However, because cache implementations differ greatly across architectures, the events equivalent to the number of L3 cache misses and the number of L3 cache write-backs also differ considerably, and multiple events must be analyzed in combination. Therefore, as shown in step S110, the type of the CPU must be detected first. The following describes three kinds of x86 CPUs: WESTMERE_EP or NEHALEM_EP, NEHALEM_EX or NEHALEM_EX, and SandyBridge-EP. It is understood that this is for illustration only, and embodiments of the invention are not limited to these three types.
The memory performance monitoring method based on CPU registers of the embodiment of the invention is described in detail below with reference to Fig. 3.
Step S310: determine the CPU model. If the CPU model is WESTMERE_EP or NEHALEM_EP, step S311 is performed; if the CPU model is NEHALEM_EX or NEHALEM_EX, step S312 is performed; if the CPU model is Jaketown, the corresponding step for that model is performed. Steps S314 and S315 are then performed in turn: step S314 performs the calculation on the register values that were read, and step S315 outputs the calculation result. The handling of the counter register values for the three types of CPU is described below.
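The model check in step S310 could be sketched as follows. The source does not say how the CPU type is detected; this Python sketch assumes a Linux host and classifies the CPU from the `cpu family` and `model` fields of `/proc/cpuinfo` text. The family-6 model numbers in the table are assumptions taken from commonly published Intel model lists, and `detect_cpu_type` is a hypothetical helper name.

```python
import re

# Assumed Intel family-6 model numbers (not given in the source).
MODEL_TO_TYPE = {
    0x1A: "NEHALEM_EP",
    0x2C: "WESTMERE_EP",
    0x2E: "NEHALEM_EX",
    0x2D: "SANDYBRIDGE_EP",  # also known as Jaketown
}


def detect_cpu_type(cpuinfo_text):
    """Map /proc/cpuinfo-style text to a CPU type name, or None if unknown."""
    family = re.search(r"^cpu family\s*:\s*(\d+)", cpuinfo_text, re.M)
    model = re.search(r"^model\s*:\s*(\d+)", cpuinfo_text, re.M)
    if not family or not model or int(family.group(1)) != 6:
        return None
    return MODEL_TO_TYPE.get(int(model.group(1)))


sample = (
    "processor\t: 0\n"
    "vendor_id\t: GenuineIntel\n"
    "cpu family\t: 6\n"
    "model\t\t: 45\n"
    "model name\t: Intel(R) Xeon(R) CPU\n"
)
print(detect_cpu_type(sample))
```

On a real system the text would come from reading `/proc/cpuinfo`; the returned name then selects which of the three register-reading branches to run.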
(1) When the detected CPU type is WESTMERE_EP or NEHALEM_EP, the following steps are performed:
Step S1211: use the MSR_UNCORE_PerfCntrn registers as counters.
Each MSR_UNCORE_PerfCntrn register corresponds to an MSR_UNCORE_PerfEvtSelx configuration register, and the MSR_UNCORE_PerfEvtSelx register comprises function mask bits and function select bits. Specifically, bits [15:8] of the MSR_UNCORE_PerfEvtSelx register are the function mask bits and bits [7:0] are the function select bits.
Step S1212: obtain the number of write operations and the number of read operations of the L3 cache respectively through the corresponding bits of the function select bits, and obtain operations targeting DRAM through the corresponding bits of the function mask bits.
Specifically, in the function select bits, 2FH is chosen to record the number of L3 cache write operations and 2CH is chosen to record the number of L3 cache read operations. In the function mask bits, 07H is chosen to indicate operations on DRAM.
Step S1213: control recording of the L3 cache read/write operation counts according to the L3 cache write and read operation counts and the operations targeting DRAM. By using the function select bits and the function mask bits together, the number of L3 cache read/write operations is recorded according to the L3 cache write and read operation counts and the operations targeting DRAM.
Step S1214: obtain the corresponding read and write counts by reading the value in MSR_UNCORE_PerfCntrn.
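The bit layout described in the steps above can be sketched in Python: the function select code occupies bits [7:0] and the function mask occupies bits [15:8] of the configuration value. The codes 2FH, 2CH and 07H come from the text; the helper names are hypothetical, and the sketch covers only these two fields. Any other control bits a real configuration register may need (for example an enable bit) and the MSR addresses themselves are outside what the source states and are not modeled here.

```python
EVENT_L3_WRITE = 0x2F  # function select code for L3 cache write operations
EVENT_L3_READ = 0x2C   # function select code for L3 cache read operations
UMASK_DRAM = 0x07      # function mask restricting the count to DRAM operations


def encode_event_select(event, umask):
    """Pack the select code into bits [7:0] and the mask into bits [15:8]."""
    if not 0 <= event <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event and umask must each fit in 8 bits")
    return (umask << 8) | event


def decode_event_select(value):
    """Return (event, umask) extracted from a configuration value."""
    return value & 0xFF, (value >> 8) & 0xFF


write_cfg = encode_event_select(EVENT_L3_WRITE, UMASK_DRAM)
read_cfg = encode_event_select(EVENT_L3_READ, UMASK_DRAM)
print(hex(write_cfg), hex(read_cfg))
```

Keeping the two fields separate until the final packing makes it easy to reuse the same helper for the other CPU types, whose configuration registers are described with the same [15:8] and [7:0] split.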
(2) When the detected CPU type is NEHALEM_EX or NEHALEM_EX: CPUs of the NEHALEM_EX or NEHALEM_EX class use Boxes, where a Box is a container holding control registers and counter registers of the same function, so the values of each relevant Box must be obtained.
Step S1221: use the control and counting units MSR_M0_PMON_CTR0 and MSR_M1_PMON_CTR0 in the M-Box to count and control read operations.
The M-Box is used for read operation counting and control. It contains two groups of control and counting units, MSR_M0_PMON_CTR0 and MSR_M1_PMON_CTR0. The read operation counting and control functions are defined in the two registers MSR_M0_PMON_EVNT_SEL0 and MSR_M1_PMON_EVNT_SEL0.
Step S1222: use the control and counting units MSR_B0_PMON_CTR1 and MSR_B1_PMON_CTR1 in the B-Box to count and control write operations.
The B-Box is used for write operation counting and control. It contains two groups of control and counting units, MSR_B0_PMON_CTR1 and MSR_B1_PMON_CTR1. The write operation counting and control functions are defined in the two registers MSR_B0_PMON_EVNT_SEL1 and MSR_B1_PMON_EVNT_SEL1.
Step S1223: obtain the number of write operations and the number of read operations of the L3 cache respectively through the corresponding bits of the function select bits, and obtain operations targeting DRAM through the corresponding bits of the function mask bits.
Specifically, in the function select bits, 0DH is chosen to record the number of L3 cache read operations and 18H is chosen to record the number of L3 cache write operations. In the function mask bits, 07H is chosen to indicate operations targeting DRAM.
Step S1224: control recording of the L3 cache read/write operation counts according to the L3 cache write and read operation counts and the operations targeting DRAM.
By using the function select code and the function mask together, the number of L3 cache read/write operations can be recorded according to the L3 cache write and read operation counts and the operations targeting DRAM.
Step S1225: obtain the corresponding read and write counts by reading the values of the M-Box and the B-Box.
(3) When the detected CPU type is SandyBridge-EP (that is, Jaketown), the number of L3 cache operations must be obtained from the channels of the memory system. Specifically, the following steps are performed:
Step S1231: use the mmap function to map the different channels of the CPU into memory according to their base addresses. For each mapped channel, the two corresponding address spaces PmonCntr_0 and PmonCntr_1 are used as read/write counters, where the function of the PmonCntr_0 address space is specified by the PmonCntrCfg_0 register and the function of the PmonCntr_1 address space is specified by the PmonCntrCfg_1 register, and PmonCntrCfg_x comprises function mask bits and function select bits.
Specifically, in PmonCntrCfg_x, bits [15:8] are the function mask bits and bits [7:0] are the function select bits.
Step S1232: record read/write counts through the corresponding bits of the function select bits, and obtain the number of read operations and the number of write operations of the L3 cache through the corresponding bits of the function mask bits.
Specifically, in the function select bits, 04H is chosen to record read/write counts. In the function mask bits, 03H is chosen to record the number of L3 cache read operations and 12H is chosen to record the number of L3 cache write operations.
Step S1233: obtain the number of L3 cache reads and writes from the channels according to the read/write counts and the L3 cache read and write operation counts.
By using the function select bits and the function mask bits together, the number of L3 cache reads and writes can be obtained from the different channels according to the read/write counts and the L3 cache read and write operation counts.
The memory performance monitoring method based on CPU registers according to the embodiment of the invention is a method for calculating memory bandwidth in real time, proposed after analyzing CPU architectures. The type of the CPU is detected first; depending on the detected type, CPU internal events are analyzed in combination to obtain the number of L3 cache misses, the number of write-backs and the cache line length of the CPU, from which the real-time memory read/write bandwidth of the CPU is calculated. The method can monitor the actual usage of memory performance in real time, which enriches what CPU-based memory monitoring can cover. It gives better guidance for memory selection and allows memory models to be customized for different applications, improving efficiency and saving cost.
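To illustrate how counts "per unit time" can turn into a real-time figure, the following Python sketch converts successive cumulative counter readings into per-interval bandwidth. It is a sketch under stated assumptions: the sample format (timestamp in seconds, cumulative miss count, cumulative write-back count), the function name and the 64-byte cache line length are all hypothetical, and counter overflow between samples is not handled.

```python
def bandwidth_from_samples(samples, line_bytes=64):
    """Convert cumulative counter samples into per-interval bandwidth.

    samples: list of (timestamp_s, cumulative_misses, cumulative_writebacks),
             ordered by time (assumed format).
    Returns a list of (read_bytes_per_s, write_bytes_per_s), one per interval.
    """
    results = []
    for (t0, m0, w0), (t1, m1, w1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Counter deltas over the interval, scaled by the cache line length.
        read_bw = (m1 - m0) * line_bytes / dt
        write_bw = (w1 - w0) * line_bytes / dt
        results.append((read_bw, write_bw))
    return results


samples = [(0.0, 0, 0), (1.0, 1000, 500), (3.0, 5000, 700)]
for read_bw, write_bw in bandwidth_from_samples(samples):
    print(f"read {read_bw:.0f} B/s, write {write_bw:.0f} B/s")
```

Working on deltas rather than raw values means the counters never need to be reset between samples; only the difference between two readings matters.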
Describe the internal memory performance supervising device 200 based on CPU register according to the embodiment of the present invention below with reference to Fig. 4, comprise detection module 410, acquisition module 420 and computing module 430.Wherein, detection module 410 is for detecting the type of CPU; Acquisition module 420 obtains three grades of cache miss number of times of the CPU in the unit interval, three grades of buffer memory write-back number of times and buffer storage length for the type according to CPU; Computing module 430 is for reading and writing width in real time according to the internal memory of three grades of cache miss number of times, three grades of buffer memory write-back number of times and buffer memory length computation CPU.
Can compatible x86CPU according to the internal memory performance supervising device based on CPU register of the embodiment of the present invention, with the be applied as example of this device on dissimilar x86CPU, this device is described in detail below.Be understandable that, for the embodiment on x86CPU only for example object, the internal memory performance supervising device based on CPU register according to the embodiment of the present invention is not limited thereto.
Figure 2 shows that the memory architecture of certain CPU, CPU100 comprises kernel 110, level cache 120, L2 cache 130, three grades of buffer memorys 140 and Memory Controller Hub 150.Wherein, kernel 110 can have multiple, from high-speed cache, read data, and high-speed cache is divided into level cache, L2 cache and three grades of buffer memorys, and CPU access order priority is level cache, L2 cache, three grades of buffer memorys.High-speed cache is connected with Memory Controller Hub 150, and Memory Controller Hub 150 controls the exchanges data between internal memory 200 and CPU100.Use DRAM (DynamicRandomAccessMemory, dynamic RAM) as internal memory in this example.DRAM is Installed System Memory type common at present.Be understandable that, only for illustrative purposes, the embodiment of the present invention is not limited thereto above-mentioned architecture.
When internal memory and CPU exchange data, Memory Controller Hub 150 is inquired about in three grades of buffer memorys 140 according to the physical address of internal memory, if not hit, appropriate address in internal memory is write three grades of buffer memorys 140 by Memory Controller Hub 150 in units of buffer storage length.When three grades of buffer memorys 140 need to swap out, the data in three grades of buffer memorys 140 write back internal memory equally in units of buffer storage length.
In one embodiment of the present invention, the computing module 430 performs a multiplication operation on the L3 cache miss count, the L3 cache write-back count and the cache line length to obtain the real-time memory read/write bandwidth of the CPU.
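The calculation can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: it assumes that each L3 cache miss moves one cache line from memory into the cache (read traffic) and each write-back moves one cache line from the cache to memory (write traffic), as in the data exchange described above. The function name, the 64-byte default line length and the sample counts are illustrative only.

```python
def memory_bandwidth(l3_misses, l3_writebacks, line_length=64, interval_s=1.0):
    """Return (read, write) memory traffic in bytes per second.

    Assumption: every L3 miss fetches one cache line from memory and
    every L3 write-back stores one cache line to memory.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    read_bw = l3_misses * line_length / interval_s
    write_bw = l3_writebacks * line_length / interval_s
    return read_bw, write_bw


# Illustrative counts for a one-second interval (not measured data).
read_bw, write_bw = memory_bandwidth(1_000_000, 250_000)
print(f"read {read_bw / 1e6:.1f} MB/s, write {write_bw / 1e6:.1f} MB/s")
```

Keeping the read and write figures separate is a design choice of this sketch; their sum gives the total traffic for the interval.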
The PMC unit in an x86 CPU can count events inside the CPU. However, because cache implementations differ greatly between architectures, the events that correspond to L3 cache misses and L3 cache write-backs also differ considerably, and several events have to be analyzed in combination. The detection module 410 therefore first detects the type of the CPU. Three kinds of x86 CPU are described below: WESTMERE_EP or NEHALEM_EP, WESTMERE_EX or NEHALEM_EX, and SandyBridge-EP. It should be understood that these are given for illustration only and that embodiments of the present invention are not limited to these three types.
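One possible way to perform the type detection on Linux is to classify the CPU by its family and model numbers as reported in /proc/cpuinfo. The sketch below is hypothetical: the text does not say how the type is detected, and the model numbers in the table are an assumption taken from public Intel model listings, not from this document.

```python
# Family 6 model numbers for the three CPU groups discussed in the text.
# These values are an assumption (public Intel model tables), not part
# of the original description.
MODEL_TO_TYPE = {
    0x1A: "NEHALEM_EP",
    0x2C: "WESTMERE_EP",
    0x2E: "NEHALEM_EX",
    0x2F: "WESTMERE_EX",
    0x2D: "SANDYBRIDGE_EP",
}


def detect_cpu_type(cpuinfo_text):
    """Classify the CPU from /proc/cpuinfo-style text; None if unknown."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # end of the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    try:
        family = int(fields["cpu family"])
        model = int(fields["model"])
    except (KeyError, ValueError):
        return None
    if family != 6:
        return None
    return MODEL_TO_TYPE.get(model)


# Illustrative input; on a real system the text would come from
# open("/proc/cpuinfo").read().
sample = "processor\t: 0\ncpu family\t: 6\nmodel\t\t: 45\nmodel name\t: example\n"
print(detect_cpu_type(sample))
```

The returned label can then be used to choose among the three counting procedures described below.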
(1) When the detection module 410 detects that the CPU type is WESTMERE_EP or NEHALEM_EP, the acquisition module 420 uses the MSR_UNCORE_PerfCntrn registers as counters. Each MSR_UNCORE_PerfCntrn register has a corresponding MSR_UNCORE_PerfEvtSelx configuration register, and the MSR_UNCORE_PerfEvtSelx register contains function mask bits and function select bits. Specifically, bits [15:8] of the MSR_UNCORE_PerfEvtSelx register are the function mask bits and bits [7:0] are the function select bits. The acquisition module 420 obtains the number of write operations and the number of read operations of the L3 cache by reading the corresponding function select bits, and obtains the operations on DRAM by reading the corresponding function mask bits. Specifically, in the function select bits, 2FH selects recording of the number of L3 cache write operations and 2CH selects recording of the number of L3 cache read operations. In the function mask bits, 07H selects operations performed on DRAM. According to the numbers of L3 cache write and read operations and the control of operations on DRAM, the acquisition module 420 records the read and write operation counts of the L3 cache, and obtains the corresponding read and write counts by reading the values in MSR_UNCORE_PerfCntrn.
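The bit layout described above can be illustrated with a short sketch that packs and unpacks the two fields. It covers only the function mask bits [15:8] and function select bits [7:0] named in the text; any other bits of the real register (for example enable bits) are outside this sketch, and the helper names are hypothetical.

```python
def event_select_value(function_select, function_mask):
    """Pack the function mask into bits [15:8] and the function select into bits [7:0]."""
    if not 0 <= function_select <= 0xFF or not 0 <= function_mask <= 0xFF:
        raise ValueError("both fields must fit in 8 bits")
    return (function_mask << 8) | function_select


def decode_event_select(value):
    """Return (function_select, function_mask) from a register value."""
    return value & 0xFF, (value >> 8) & 0xFF


L3_WRITE = 0x2F   # function select: L3 cache write operations (from the text)
L3_READ = 0x2C    # function select: L3 cache read operations (from the text)
DRAM_OPS = 0x07   # function mask: operations on DRAM (from the text)

write_cfg = event_select_value(L3_WRITE, DRAM_OPS)
read_cfg = event_select_value(L3_READ, DRAM_OPS)
print(hex(write_cfg), hex(read_cfg))
```

The same packing applies to the PmonCntrCfg_x registers described for SandyBridge-EP, which use the same [15:8] and [7:0] split.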
(2) When the detection module 410 detects that the CPU type is WESTMERE_EX or NEHALEM_EX: CPUs of the WESTMERE_EX or NEHALEM_EX class use Boxes. A Box is a container that holds control registers and count registers of the same function, so the values of each relevant Box have to be obtained.
The M-Box contains two groups of control and counting units. The acquisition module 420 uses the control and counting units MSR_M0_PMON_CTR0 and MSR_M1_PMON_CTR in the M-Box to count and control read operations, and uses the control and counting units MSR_B0_PMON_CTR1 and MSR_B1_PMON_CTR1 in the B-Box to count and control write operations.
The acquisition module 420 obtains the number of write operations and the number of read operations of the L3 cache by reading the corresponding function select bits, and obtains the operations on DRAM by reading the corresponding function mask bits. According to the numbers of L3 cache write and read operations and the control of operations on DRAM, the acquisition module 420 records the read and write operation counts of the L3 cache, and obtains the corresponding read and write counts by reading the values of the M-Box and the B-Box.
The B-Box likewise contains two groups of control and counting units, MSR_B0_PMON_CTR1 and MSR_B1_PMON_CTR1. The functions for counting and controlling write operations are defined in the two registers MSR_B0_PMON_EVNT_SEL1 and MSR_B1_PMON_EVNT_SEL1. In the function select bits, 0DH records the number of L3 cache read operations and 18H records the number of L3 cache write operations. In the function mask bits, 07H denotes operations on DRAM. By using the function select code together with the function mask, the acquisition module 420 can record the read and write operation counts of the L3 cache according to the numbers of L3 cache write and read operations and the control of operations on DRAM.
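Reading the Box counters can be sketched on Linux with the msr driver, which exposes each CPU's model-specific registers as /dev/cpu/<n>/msr. This is a minimal sketch under stated assumptions: the numeric register addresses are not given in the text and must be supplied by the caller, summing the two units of each Box is an assumption of this sketch, and the `device_path` parameter is a hypothetical hook so the code can be exercised against an ordinary file.

```python
import os
import struct


def read_msr(msr_address, cpu=0, device_path=None):
    """Read one 64-bit MSR through the Linux msr driver.

    An 8-byte read at file offset msr_address returns the register value.
    Root privileges and a loaded msr module are normally required.
    """
    path = device_path or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, msr_address)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from {path} at {msr_address:#x}")
    (value,) = struct.unpack("<Q", raw)
    return value


def read_box_counters(read_addresses, write_addresses, cpu=0, device_path=None):
    """Sum the M-Box (read) and B-Box (write) counters.

    The address lists come from the caller; this sketch does not assume
    the numeric addresses of the registers named in the text.
    """
    reads = sum(read_msr(a, cpu, device_path) for a in read_addresses)
    writes = sum(read_msr(a, cpu, device_path) for a in write_addresses)
    return reads, writes
```

A caller would pass the addresses of the two M-Box counters as `read_addresses` and those of the two B-Box counters as `write_addresses`, taken from the processor's uncore documentation.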
(3) When the detection module 410 detects that the CPU type is SandyBridge-EP, the L3 cache operation counts have to be obtained from the channels of the memory system. The acquisition module 420 uses the mmap function to map the different channels of the CPU into memory according to their base addresses. For each mapped channel, the corresponding PmonCntr_0 and PmonCntr_1 address spaces are used as read and write counters. The function of the PmonCntr_0 address space is specified by the PmonCntrCfg_0 register, and the function of the PmonCntr_1 address space is specified by the PmonCntrCfg_1 register. PmonCntrCfg_x contains function mask bits and function select bits; specifically, bits [15:8] of PmonCntrCfg_x are the function mask bits and bits [7:0] are the function select bits. The acquisition module 420 records the read and write counts by reading the corresponding function select bits, and obtains the number of read operations and the number of write operations of the L3 cache by reading the corresponding function mask bits. Specifically, in the function select bits, 04H selects recording of the read and write counts. In the function mask bits, 03H records the number of L3 cache read operations and 12H records the number of L3 cache write operations. By using the function select bits together with the function mask bits, the acquisition module 420 can obtain, from the different channels, the number of reads and writes of the L3 cache according to the read and write counts and the numbers of L3 cache read and write operations.
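The mapping step can be sketched with Python's `mmap` module. This is a hypothetical illustration, not the embodiment's code: the file that exposes the channel's address space, the base offset, the counter offsets and the assumption that each counter occupies a little-endian 64-bit slot are all supplied by the caller or assumed here, since the text does not give them.

```python
import mmap
import struct


def read_channel_counters(path, base_offset, cntr0_offset, cntr1_offset,
                          length=mmap.PAGESIZE):
    """Map one channel's register window and read its two counters.

    base_offset must be a multiple of mmap.ALLOCATIONGRANULARITY. The
    counter offsets are placeholders passed in by the caller; the real
    offsets of PmonCntr_0 and PmonCntr_1 are not given in the text.
    """
    with open(path, "rb") as f:
        window = mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=base_offset)
        try:
            # Assumption: each counter is stored as a little-endian 64-bit value.
            (cntr0,) = struct.unpack_from("<Q", window, cntr0_offset)
            (cntr1,) = struct.unpack_from("<Q", window, cntr1_offset)
        finally:
            window.close()
    return cntr0, cntr1
```

Calling the function once per channel and adding the results gives the read and write totals across all channels.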
Based on an analysis of CPU architectures, the memory performance monitoring device based on CPU registers according to the embodiment of the present invention provides a device that calculates memory bandwidth in real time. The detection module first detects the type of the CPU. According to the detected CPU type, the acquisition module combines the analysis of CPU internal events to obtain the L3 cache miss count, the L3 cache write-back count and the cache line length of the CPU. The computing module then calculates the real-time memory read/write bandwidth of the CPU. The device can monitor the actual memory performance in real time, which enriches CPU memory monitoring. It gives better guidance for memory selection, allows memory models to be tailored to different applications, improves efficiency and saves cost.
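The overall flow of detect, acquire and compute can be sketched as a sampling step that takes two raw counter readings one interval apart and converts the differences into bandwidth. This is a minimal sketch: `read_counters` is a hypothetical caller-supplied function for the detected CPU type, and the 48-bit counter width used for wrap-around handling is an assumption, not a value from the text.

```python
import time


def counter_delta(previous, current, width_bits=48):
    """Difference between two raw readings, allowing one wrap-around.

    width_bits=48 is an assumption; use the width documented for the
    counter actually being read.
    """
    return (current - previous) % (1 << width_bits)


def sample_bandwidth(read_counters, line_length=64, interval_s=1.0,
                     sleep=time.sleep):
    """Take two readings interval_s apart and return bytes per second.

    read_counters returns the raw (miss_count, writeback_count) pair.
    """
    miss0, wb0 = read_counters()
    sleep(interval_s)
    miss1, wb1 = read_counters()
    read_bw = counter_delta(miss0, miss1) * line_length / interval_s
    write_bw = counter_delta(wb0, wb1) * line_length / interval_s
    return read_bw, write_bw
```

Passing `sleep` as a parameter keeps the function easy to exercise without waiting; in normal use the default `time.sleep` applies.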
In this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the present invention is defined by the claims and their equivalents.

Claims (10)

1. A memory performance monitoring method based on CPU registers, characterized by comprising the following steps:
detecting the type of a CPU;
obtaining, according to the type of the CPU, the number of L3 cache misses and the number of L3 cache write-backs of the CPU per unit time, and the cache line length; and
calculating the real-time memory read/write bandwidth of the CPU from the L3 cache miss count, the L3 cache write-back count and the cache line length.
2. The method of claim 1, characterized in that, when the detected CPU type is WESTMERE_EP or NEHALEM_EP, the MSR_UNCORE_PerfCntrn registers are used as counters,
wherein each MSR_UNCORE_PerfCntrn register corresponds to an MSR_UNCORE_PerfEvtSelx register, and the MSR_UNCORE_PerfEvtSelx register comprises function mask bits and function select bits;
the number of write operations and the number of read operations of the L3 cache are obtained respectively by reading the corresponding function select bits, and the operations on DRAM are obtained by reading the corresponding function mask bits;
the read and write operation counts of the L3 cache are recorded according to the numbers of L3 cache write and read operations and the control of operations on DRAM; and
the corresponding read and write counts are obtained by reading the values in MSR_UNCORE_PerfCntrn.
3. The method of claim 1, characterized in that, when the detected CPU type is WESTMERE_EX or NEHALEM_EX,
the control and counting units MSR_M0_PMON_CTR0 and MSR_M1_PMON_CTR in the M-Box are used to count and control read operations;
the control and counting units MSR_B0_PMON_CTR1 and MSR_B1_PMON_CTR1 in the B-Box are used to count and control write operations;
the number of write operations and the number of read operations of the L3 cache are obtained respectively by reading the corresponding function select bits of the control and counting units MSR_M1_PMON_CTR and MSR_B1_PMON_CTR1, and the operations on DRAM are obtained by reading the corresponding function mask bits of the control and counting units MSR_M1_PMON_CTR and MSR_B1_PMON_CTR1;
the read and write operation counts of the L3 cache are recorded according to the numbers of L3 cache write and read operations and the control of operations on DRAM; and
the corresponding read and write counts are obtained by reading the values of the M-Box and the B-Box.
4. The method of claim 1, characterized in that, when the detected CPU type is SandyBridge-EP,
the mmap function is used to map the different channels of the CPU into memory according to their base addresses, and for each mapped channel the corresponding PmonCntr_0 and PmonCntr_1 address spaces are used as read and write counters, wherein the function of the PmonCntr_0 address space is specified by the PmonCntrCfg_0 register, the function of the PmonCntr_1 address space is specified by the PmonCntrCfg_1 register, and PmonCntrCfg_x comprises function mask bits and function select bits;
the read and write counts are recorded by reading the corresponding function select bits, and the number of read operations and the number of write operations of the L3 cache are obtained by reading the corresponding function mask bits; and
the number of reads and writes of the L3 cache is obtained from the channels according to the read and write counts and the numbers of L3 cache read and write operations.
5. The method of claim 1, characterized in that calculating the real-time memory read/write bandwidth of the CPU from the L3 cache miss count, the L3 cache write-back count and the cache line length comprises the following step:
performing a multiplication operation on the L3 cache miss count, the L3 cache write-back count and the cache line length to obtain the real-time memory read/write bandwidth of the CPU.
6. A memory performance monitoring device based on CPU registers, characterized by comprising:
a detection module for detecting the type of a CPU;
an acquisition module for obtaining, according to the type of the CPU, the number of L3 cache misses and the number of L3 cache write-backs of the CPU per unit time, and the cache line length; and
a computing module for calculating the real-time memory read/write bandwidth of the CPU from the L3 cache miss count, the L3 cache write-back count and the cache line length.
7. The device of claim 6, characterized in that, when the detection module detects that the CPU type is WESTMERE_EP or NEHALEM_EP, the acquisition module uses the MSR_UNCORE_PerfCntrn registers as counters, wherein each MSR_UNCORE_PerfCntrn register corresponds to an MSR_UNCORE_PerfEvtSelx register, and the MSR_UNCORE_PerfEvtSelx register comprises function mask bits and function select bits; the acquisition module obtains the number of write operations and the number of read operations of the L3 cache respectively by reading the corresponding function select bits, and obtains the operations on DRAM by reading the corresponding function mask bits; the acquisition module records the read and write operation counts of the L3 cache according to the numbers of L3 cache write and read operations and the control of operations on DRAM; and the acquisition module obtains the corresponding read and write counts by reading the values in MSR_UNCORE_PerfCntrn.
8. The device of claim 6, characterized in that, when the detection module detects that the CPU type is WESTMERE_EX or NEHALEM_EX, the acquisition module uses the control and counting units MSR_M0_PMON_CTR0 and MSR_M1_PMON_CTR in the M-Box to count and control read operations, and uses the control and counting units MSR_B0_PMON_CTR1 and MSR_B1_PMON_CTR1 in the B-Box to count and control write operations; the acquisition module obtains the number of write operations and the number of read operations of the L3 cache respectively by reading the corresponding function select bits of the control and counting units MSR_M1_PMON_CTR and MSR_B1_PMON_CTR1, and obtains the operations on DRAM by reading the corresponding function mask bits of the control and counting units MSR_M1_PMON_CTR and MSR_B1_PMON_CTR1; the acquisition module records the read and write operation counts of the L3 cache according to the numbers of L3 cache write and read operations and the control of operations on DRAM, and obtains the corresponding read and write counts by reading the values of the M-Box and the B-Box.
9. The device of claim 6, characterized in that, when the detection module detects that the CPU type is SandyBridge-EP, the acquisition module uses the mmap function to map the different channels of the CPU into memory according to their base addresses, and for each mapped channel uses the corresponding PmonCntr_0 and PmonCntr_1 address spaces as read and write counters, wherein the function of the PmonCntr_0 address space is specified by the PmonCntrCfg_0 register, the function of the PmonCntr_1 address space is specified by the PmonCntrCfg_1 register, and PmonCntrCfg_x comprises function mask bits and function select bits; the acquisition module records the read and write counts by reading the corresponding function select bits, and obtains the number of read operations and the number of write operations of the L3 cache by reading the corresponding function mask bits; and the acquisition module obtains, from the channels, the number of reads and writes of the L3 cache according to the read and write counts and the numbers of L3 cache read and write operations.
10. The device of claim 6, characterized in that the computing module performs a multiplication operation on the L3 cache miss count, the L3 cache write-back count and the cache line length to obtain the real-time memory read/write bandwidth of the CPU.
CN201310097941.7A 2013-03-25 2013-03-25 Based on internal memory performance method for supervising and the device of CPU register Active CN103218285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310097941.7A CN103218285B (en) 2013-03-25 2013-03-25 Based on internal memory performance method for supervising and the device of CPU register

Publications (2)

Publication Number Publication Date
CN103218285A CN103218285A (en) 2013-07-24
CN103218285B true CN103218285B (en) 2015-11-25

Family

ID=48816104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310097941.7A Active CN103218285B (en) 2013-03-25 2013-03-25 Based on internal memory performance method for supervising and the device of CPU register

Country Status (1)

Country Link
CN (1) CN103218285B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786668A (en) * 2016-04-01 2016-07-20 浪潮电子信息产业股份有限公司 Memory error detection method based on Redhat system
CN107562585A (en) * 2017-08-11 2018-01-09 郑州云海信息技术有限公司 A kind of method of automatic test memory performance
CN107911752A (en) * 2017-11-15 2018-04-13 晶晨半导体(上海)股份有限公司 A kind of bandwidth analysis method
CN111338884B (en) * 2018-12-19 2023-06-16 北京嘀嘀无限科技发展有限公司 Cache miss rate monitoring method and device, electronic equipment and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101681289A (en) * 2007-06-27 2010-03-24 国际商业机器公司 Processor performance monitoring
CN102368224A (en) * 2011-06-29 2012-03-07 奇智软件(北京)有限公司 Processing method and device for hardware detection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7500138B2 (en) * 2006-05-17 2009-03-03 International Business Machines Corporation Simplified event selection for a performance monitor unit

Also Published As

Publication number Publication date
CN103218285A (en) 2013-07-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant