WO2021238257A1 - SPD-based memory monitoring and service life prediction method and system - Google Patents

SPD-based memory monitoring and service life prediction method and system

Info

Publication number
WO2021238257A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
information
module
monitoring
life prediction
Prior art date
Application number
PCT/CN2021/073439
Other languages
English (en)
French (fr)
Inventor
张芳
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Priority to US17/928,118 (US11714557B2)
Publication of WO2021238257A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614 Improving the reliability of storage systems
    • G06F3/0616 Improving the reliability of storage systems in relation to life time, e.g. increasing Mean Time Between Failures [MTBF]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C16/00 Erasable programmable read-only memories
    • G11C16/02 Erasable programmable read-only memories electrically programmable
    • G11C16/06 Auxiliary circuits, e.g. for writing into memory
    • G11C16/34 Determination of programming status, e.g. threshold voltage, overprogramming or underprogramming, retention
    • G11C16/349 Arrangements for evaluating degradation, retention or wearout, e.g. by counting erase cycles
    • G11C16/3495 Circuits or methods to detect or delay wearout of nonvolatile EPROM or EEPROM memory devices, e.g. by counting numbers of erase or reprogram cycles, by using multiple memory areas serially or cyclically
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1051 Data output circuits, e.g. read-out amplifiers, data output buffers, data output registers, data output level conversion circuits
    • G11C7/1063 Control signal output circuits, e.g. status or busy flags, feedback command signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/22 Read-write [R-W] timing or clocking circuits; Read-write [R-W] control signal generators or management

Definitions

  • the present invention relates to the technical field of software development, and in particular to an SPD-based method and system for memory monitoring and life prediction.
  • server users place high demands on memory in terms of both capacity and efficiency. Once memory is damaged, the server may fail to boot or suffer other serious losses, so memory monitoring and life prediction have long been a key requirement.
  • the present invention proposes an SPD-based method and system for memory monitoring and life prediction, which can perform memory monitoring and life prediction on demand.
  • the present invention proposes an SPD-based memory monitoring and life prediction method and system.
  • the method includes the following steps:
  • the remaining life of each memory module is calculated by piecewise least-squares fitting.
  • the method also includes setting an execution time, so that monitoring and memory module life prediction run at a scheduled time;
  • the parameter information of each memory module in the server comprises: an erase/write speed v_i, a data access delay time t_i, a maximum operating frequency f_i, an average operating temperature te_i, and an average voltage vo_i.
  • rv_i is the proportion information of the erase/write speed
  • rt_i is the proportion information of the data access delay time
  • rf_i is the proportion information of the maximum operating frequency
  • rte_i is the proportion information of the average operating temperature
  • rvo_i is the proportion information of the average voltage
  • V_i is the erase/write speed in the configuration information of the memory module
  • T_i is the data access delay time in the configuration information of the memory module
  • F_i is the maximum operating frequency in the configuration information of the memory module
  • TE_i is the average operating temperature in the configuration information of the memory module
  • VO_i is the average operating voltage in the configuration information of the memory module.
  • the final memory state value S is calculated from the impact factors and the state information by a formula that is published as image PCTCN2021073439-appb-000002, in which:
  • ω_i is an impact factor
  • ε_i represents a random error with a mean of 0 and a variance of 0.1
  • c is a constant term.
  • the remaining life of each memory module is calculated from its used time and state information by piecewise least-squares fitting, as follows:
  • the present invention also proposes an SPD-based memory monitoring and life prediction system, which includes an acquisition and setting module, a reading and calculation module, a determination and calculation module, and a fitting calculation module;
  • the acquisition and setting module is used to obtain the parameter information of each memory module in the server and to set weights for the parameter information;
  • the reading and calculation module is used to read the configuration information of each memory module in the server, calculate the proportion information of each memory module's parameters from the configuration information and the parameter information, and calculate the state information of each memory module from the weights and the proportion information;
  • the determination and calculation module is used to determine the impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU, calculate the final memory state value from the impact factors and the state information, and grade the state value;
  • the fitting calculation module is used to calculate the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module.
  • the system also includes an execution module
  • the execution module is used to set an execution time so that monitoring and memory module life prediction run at a scheduled time, or to set a cron expression so that they run periodically.
  • the embodiment of the present invention proposes an SPD-based memory monitoring and life prediction method and system.
  • the method includes the following steps: obtaining the parameter information of each memory module in the server and setting weights for the parameter information; reading the configuration information of each memory module in the server and calculating the proportion information of each memory module's parameters from the configuration information and the parameter information; calculating the state information of each memory module from the weights and the proportion information; determining the impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU; calculating the final memory state value from the impact factors and the state information, and grading the state value; and calculating the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module.
  • the method also includes setting an execution time so that monitoring and memory module life prediction run at a scheduled time, and setting a cron expression so that they run periodically.
  • the present invention also proposes an SPD-based memory monitoring and life prediction system.
  • the SPD-based memory monitoring and life prediction method of the present invention can provide an overview of the overall health status of the server memory together with a remaining life prediction, and the parameters of each memory module can be viewed.
  • memory usage and health status can be monitored dynamically on demand, whether immediately, at a scheduled time, or periodically. Memory failures can be prevented in advance based on the health status or remaining life, and after a memory failure the generated logs can be used to check the problem and locate which memory module failed and why.
  • Figure 1 shows a flow chart of an SPD-based memory monitoring and life prediction method based on Embodiment 1 of the present invention
  • Figure 2 shows a schematic diagram of an SPD-based memory monitoring and life prediction system based on Embodiment 1 of the present invention.
  • the present invention provides an SPD-based method and system for memory monitoring and life prediction, in which SPD (Serial Presence Detect) is a set of configuration information about a memory module.
  • Fig. 1 shows a flow chart of an SPD-based memory monitoring and life prediction method based on Embodiment 1 of the present invention.
  • in step S101, the flow starts
  • in step S102, the parameter information of each memory module in the server is obtained and weights are set for the parameter information; the parameter information of each memory module in the server comprises: erase/write speed v_i, data access delay time t_i, maximum operating frequency f_i, average operating temperature te_i, and average voltage vo_i, where i is the index of the memory module. Based on industry experience, an empirical weight is set for each item of parameter information.
  • in step S103, the configuration information of each memory module is read from the memory SPD, where V_i is the erase/write speed in the configuration information of the memory module; T_i is the data access delay time in the configuration information; F_i is the maximum operating frequency in the configuration information; TE_i is the average operating temperature in the configuration information; VO_i is the average operating voltage in the configuration information.
  • the formula for calculating the proportion information of each memory module parameter based on the configuration information and parameter information is:
  • rv_i is the proportion information of the erase/write speed
  • rt_i is the proportion information of the data access delay time
  • rf_i is the proportion information of the maximum operating frequency
  • rte_i is the proportion information of the average operating temperature
  • rvo_i is the proportion information of the average voltage.
  • in step S104, the impact factors are determined from the number of CPUs in the server and the number and positions of the memory modules under each CPU. A server contains multiple CPUs, and each CPU can manage multiple memory modules, which are inserted in slots at different positions.
  • the impact factor ω_i of each memory module is determined from the total number of CPUs in the current server, the CPU that manages the memory module, and the slot position of the memory module.
  • in the formula for the memory state S (published as image PCTCN2021073439-appb-000004), ε_i represents a random error with a mean of 0 and a variance of 0.1; c is a constant term.
  • c is set to 0.05.
  • in step S105, the memory state S is graded according to a preset piecewise function, and the corresponding health status is displayed:
  • in step S106, the remaining life TL_i is calculated from the used time t_d of the memory module and its state s_i.
  • Different memory modules have different discrete data.
  • this fitting method yields the fitting function f(s_i, t_d) with the smallest sum of squared errors; letting s_i tend to 0 gives the predicted total life of the memory module, from which its remaining life TL_i is obtained.
  • in step S107, the user chooses whether to run monitoring and life prediction immediately; for a scheduled run, an execution time is entered, and for periodic runs, a cron expression is entered.
  • in step S108, the final memory monitoring information, the health status result, and the remaining life prediction TL_i are output.
  • in step S109, the result of each monitoring run is saved as a log, either locally or forwarded to a dedicated log server, which makes statistics and review easier.
  • in step S110, the entire flow ends.
  • the present invention also proposes an SPD-based memory monitoring and life prediction system, which includes an acquisition and setting module, a reading and calculation module, a determination and calculation module, and a fitting calculation module.
  • the acquisition and setting module is used to obtain the parameter information of each memory module in the server and to set weights for the parameter information.
  • the parameter information of each memory module in the server comprises: erase/write speed v_i, data access delay time t_i, maximum operating frequency f_i, average operating temperature te_i, and average voltage vo_i, where i is the index of the memory module. Based on industry experience, an empirical weight is set for each item of parameter information.
  • the reading and calculation module is used to read the configuration information of each memory module in the server, calculate the proportion information of each memory module's parameters from the configuration information and the parameter information, and calculate the state information of each memory module from the weights and the proportion information; where V_i is the erase/write speed in the configuration information of the memory module; T_i is the data access delay time in the configuration information; F_i is the maximum operating frequency in the configuration information; TE_i is the average operating temperature in the configuration information; VO_i is the average operating voltage in the configuration information.
  • the formula for calculating the proportion information of each memory module parameter based on the configuration information and parameter information is:
  • rv_i is the proportion information of the erase/write speed
  • rt_i is the proportion information of the data access delay time
  • rf_i is the proportion information of the maximum operating frequency
  • rte_i is the proportion information of the average operating temperature
  • rvo_i is the proportion information of the average voltage.
  • the determination and calculation module is used to determine the impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU, to calculate the final memory state value from the impact factors and the state information, and to grade the state value.
  • the impact factor ω_i of each memory module is determined from the total number of CPUs in the current server, the CPU that manages the memory module, and the slot position of the memory module.
  • in the formula for the memory state S (published as image PCTCN2021073439-appb-000007), ε_i represents a random error with a mean of 0 and a variance of 0.1; c is a constant term.
  • c is set to 0.05.
  • the memory state S is graded according to a preset piecewise function, and the corresponding health status is displayed:
  • the fitting calculation module is used to calculate the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module.
  • Different memory modules have different discrete data. After observing the distribution of the discrete data for each memory module, Hermite interpolation is first applied to obtain more sample data for subsequent use.
  • this fitting method yields the fitting function f(s_i, t_d) with the smallest sum of squared errors; letting s_i tend to 0 gives the predicted total life of the memory module, from which its remaining life TL_i is obtained.
  • the system also includes an execution module; the execution module is used to set an execution time so that monitoring and memory module life prediction run at a scheduled time, or to set a cron expression so that they run periodically.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

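An SPD-based memory monitoring and life prediction method and system. The method obtains the parameter information of each memory module in a server and sets weights for the parameter information; reads the configuration information of each memory module and calculates the proportion information of the memory module's parameters from the configuration information and the parameter information; calculates the state information of the memory module from the weights and the proportion information; determines impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU; calculates the final memory state value from the impact factors and the state information; and calculates the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module. A prediction system based on the method is also proposed. The SPD-based memory monitoring and life prediction method of the present invention can provide an overview of the overall health status of the server memory together with a remaining life prediction, and the parameters of each memory module can be viewed.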

Description

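SPD-based memory monitoring and service life prediction method and system
This application claims priority to Chinese patent application No. 202010463689.7, filed with the China Patent Office on May 27, 2020 and entitled "SPD-based memory monitoring and service life prediction method and system", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of software development, and in particular to an SPD-based memory monitoring and life prediction method and system.
Background
At present, server users place high demands on memory in terms of both capacity and efficiency. Once memory is damaged, the server may fail to boot or other serious losses may result, so memory monitoring and life prediction have long been a key requirement.
Currently, an overview of each memory module can be viewed through the out-of-band BMC interface, and memory information can also be obtained with certain commands built into Linux. However, these only give an overview of the server's memory information. They can neither summarize the memory health status nor predict memory life, and they cannot monitor memory information or predict life automatically. Many developers have devised memory monitoring strategies, but few have proposed a detailed monitoring method together with a memory life prediction method.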
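Summary of the Invention
The present invention proposes an SPD-based memory monitoring and life prediction method and system, which can perform memory monitoring and life prediction on demand.
To achieve the above objective, the present invention proposes an SPD-based memory monitoring and life prediction method and system. The method includes the following steps:
obtaining the parameter information of each memory module in the server, and setting weights for the parameter information;
reading the configuration information of each memory module in the server, and calculating the proportion information of each memory module's parameters from the configuration information and the parameter information; calculating the state information of each memory module from the weights and the proportion information;
determining the impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU; calculating the final memory state value from the impact factors and the state information, and grading the state value;
calculating the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module.
Further, the method also includes setting an execution time, so that monitoring and memory module life prediction run at a scheduled time;
and setting a cron expression, so that monitoring and memory module life prediction run periodically.
Further, the parameter information of each memory module in the server includes: erase/write speed v_i, data access delay time t_i, maximum operating frequency f_i, average operating temperature te_i, and average voltage vo_i.
Further, the formula for calculating the proportion information of each memory module's parameters from the configuration information and the parameter information is:
Figure PCTCN2021073439-appb-000001
where rv_i is the proportion information of the erase/write speed; rt_i is the proportion information of the data access delay time; rf_i is the proportion information of the maximum operating frequency; rte_i is the proportion information of the average operating temperature; rvo_i is the proportion information of the average voltage; V_i is the erase/write speed in the configuration information of the memory module; T_i is the data access delay time in the configuration information; F_i is the maximum operating frequency in the configuration information; TE_i is the average operating temperature in the configuration information; VO_i is the average operating voltage in the configuration information.
Further, the formula for calculating the state information of each memory module from the weights and the proportion information is: s_i = ω_v*rv_i + ω_t*rt_i + ω_f*rf_i + ω_te*rte_i + ω_vo*rvo_i, where ω_v is the weight of the erase/write speed; ω_t is the weight of the data access delay time; ω_f is the weight of the maximum operating frequency; ω_te is the weight of the average operating temperature; ω_vo is the weight of the average operating voltage.
Further, the formula for calculating the final memory state value from the impact factors and the state information is: the memory state
Figure PCTCN2021073439-appb-000002
where ω_i is an impact factor; ε_i represents a random error with a mean of 0 and a variance of 0.1; c is a constant term.
Further, the remaining life of each memory module is calculated from its used time and state information by piecewise least-squares fitting as follows:
for the discrete data of each memory module, data interpolation and piecewise least-squares fitting are performed to obtain, for each memory module, the fitting function f(s_i, t_d) with the smallest sum of squared errors;
letting s_i tend to 0 gives the predicted total life of the memory module, from which the remaining life TL_i of the memory module is obtained.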
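The present invention also proposes an SPD-based memory monitoring and life prediction system, including an acquisition and setting module, a reading and calculation module, a determination and calculation module, and a fitting calculation module;
the acquisition and setting module is used to obtain the parameter information of each memory module in the server and to set weights for the parameter information;
the reading and calculation module is used to read the configuration information of each memory module in the server, calculate the proportion information of each memory module's parameters from the configuration information and the parameter information, and calculate the state information of each memory module from the weights and the proportion information;
the determination and calculation module is used to determine the impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU, calculate the final memory state value from the impact factors and the state information, and grade the state value;
the fitting calculation module is used to calculate the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module.
Further, the system also includes an execution module;
the execution module is used to set an execution time so that monitoring and memory module life prediction run at a scheduled time, or to set a cron expression so that they run periodically.
The effects described in this summary are merely those of the embodiments, not all the effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
The embodiment of the present invention proposes an SPD-based memory monitoring and life prediction method and system. The method includes the following steps: obtaining the parameter information of each memory module in the server and setting weights for the parameter information; reading the configuration information of each memory module in the server and calculating the proportion information of each memory module's parameters from the configuration information and the parameter information; calculating the state information of each memory module from the weights and the proportion information; determining the impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU; calculating the final memory state value from the impact factors and the state information, and grading the state value; and calculating the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module. The method also includes setting an execution time so that monitoring and memory module life prediction run at a scheduled time, and setting a cron expression so that they run periodically. Based on the proposed method, the present invention also proposes an SPD-based memory monitoring and life prediction system. The SPD-based memory monitoring and life prediction method of the present invention can provide an overview of the overall health status of the server memory together with a remaining life prediction, and the parameters of each memory module can be viewed. Memory usage and health status can be monitored dynamically on demand, whether immediately, at a scheduled time, or periodically. Memory failures can be prevented in advance based on the health status or remaining life, and after a memory failure the generated logs can be used to check the problem and locate which memory module failed and why.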
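Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Figure 1 is a flow chart of the SPD-based memory monitoring and life prediction method proposed in Embodiment 1 of the present invention;
Figure 2 is a schematic diagram of the SPD-based memory monitoring and life prediction system proposed in Embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In the description of the present invention, it should be understood that orientation or position terms such as "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" are based on the orientations or positions shown in the drawings. They are used only to simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, so they should not be understood as limiting the present invention.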
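Embodiment 1
The present invention proposes an SPD-based memory monitoring and life prediction method and system, in which SPD (Serial Presence Detect) is a set of configuration information about a memory module. Figure 1 is a flow chart of the SPD-based memory monitoring and life prediction method proposed in Embodiment 1 of the present invention.
In step S101, the flow starts.
In step S102, the parameter information of each memory module in the server is obtained, and weights are set for the parameter information. The parameter information of each memory module in the server includes: erase/write speed v_i, data access delay time t_i, maximum operating frequency f_i, average operating temperature te_i, and average voltage vo_i, where i is the index of the memory module. Based on industry experience, an empirical weight is set for each item of parameter information. In Embodiment 1 of the present invention, ω_v = 0.3, ω_t = 0.3, ω_f = 0.2, ω_te = 0.1, and ω_vo = 0.1, where ω_v is the weight of the erase/write speed; ω_t is the weight of the data access delay time; ω_f is the weight of the maximum operating frequency; ω_te is the weight of the average operating temperature; ω_vo is the weight of the average operating voltage. The scope of protection of this technical solution is not limited to Embodiment 1.
In step S103, the configuration information of each memory module is read from the memory SPD, where V_i is the erase/write speed in the configuration information of the memory module; T_i is the data access delay time in the configuration information; F_i is the maximum operating frequency in the configuration information; TE_i is the average operating temperature in the configuration information; VO_i is the average operating voltage in the configuration information.
The formula for calculating the proportion information of each memory module's parameters from the configuration information and the parameter information is:
Figure PCTCN2021073439-appb-000003
where rv_i is the proportion information of the erase/write speed; rt_i is the proportion information of the data access delay time; rf_i is the proportion information of the maximum operating frequency; rte_i is the proportion information of the average operating temperature; rvo_i is the proportion information of the average voltage.
The state information of each memory module is calculated from the weights and the proportion information: s_i = ω_v*rv_i + ω_t*rt_i + ω_f*rf_i + ω_te*rte_i + ω_vo*rvo_i.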
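The weighted sum for s_i can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `module_state`, the dictionary keys, and the sample proportion values are hypothetical, and because the proportion formula is published only as an image, the proportions are taken here as precomputed inputs. The default weights are the Embodiment 1 values.

```python
# Embodiment 1 weights from the text; the key names are hypothetical.
EMBODIMENT_1_WEIGHTS = {
    "rv": 0.3,   # erase/write speed
    "rt": 0.3,   # data access delay time
    "rf": 0.2,   # maximum operating frequency
    "rte": 0.1,  # average operating temperature
    "rvo": 0.1,  # average operating voltage
}


def module_state(proportions, weights=EMBODIMENT_1_WEIGHTS):
    """Return s_i as the weighted sum of the five proportion values.

    `proportions` maps the same keys as `weights` to precomputed
    proportion values; how they are derived is not modeled here.
    """
    missing = set(weights) - set(proportions)
    if missing:
        raise KeyError(f"missing proportion values: {sorted(missing)}")
    return sum(weights[key] * proportions[key] for key in weights)


# Made-up proportions for one memory module, for illustration only.
sample = {"rv": 0.9, "rt": 0.8, "rf": 1.0, "rte": 0.7, "rvo": 1.0}
state = module_state(sample)
```

With these made-up inputs the state evaluates to approximately 0.88. Since the Embodiment 1 weights sum to 1, a module whose five proportions are all 1 has a state of 1.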
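In step S104, the impact factors are determined from the number of CPUs in the server and the number and positions of the memory modules under each CPU. A server contains multiple CPUs, and each CPU can manage multiple memory modules inserted in slots at different positions. The impact factor ω_i of each memory module is determined from the total number of CPUs in the current server, the CPU that manages the memory module, and the slot position of the memory module.
The final memory state value is calculated from the impact factors and the state information; the memory state
Figure PCTCN2021073439-appb-000004
ε_i represents a random error with a mean of 0 and a variance of 0.1; c is a constant term. In Embodiment 1 of the present invention, c is set to 0.05.
In step S105, the memory state S is graded according to a preset piecewise function, and the corresponding health status is displayed:
Figure PCTCN2021073439-appb-000005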
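Grading a state value with a piecewise function can be sketched as a threshold lookup. The thresholds and labels of the patent's own piecewise function are published only as an image and are not reproduced here, so in this sketch they are parameters supplied by the caller; the function name `grade_state` and the example values are hypothetical. The sketch assumes that a larger S means a healthier memory, which matches the text's use of a state tending to 0 as the end of life.

```python
from bisect import bisect_right


def grade_state(state, thresholds, labels):
    """Map a state value to a label using ascending thresholds.

    labels[k] applies when thresholds[k-1] <= state < thresholds[k],
    so there must be exactly one more label than thresholds.
    """
    thresholds = list(thresholds)
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    if thresholds != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    return labels[bisect_right(thresholds, state)]


# Hypothetical thresholds and labels, chosen only to show the call.
EXAMPLE_THRESHOLDS = [0.3, 0.6, 0.8]
EXAMPLE_LABELS = ["critical", "warning", "fair", "healthy"]
grade = grade_state(0.88, EXAMPLE_THRESHOLDS, EXAMPLE_LABELS)
```

Keeping the thresholds outside the function lets the grading bands be tuned without touching the lookup logic.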
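In step S106, the remaining life TL_i is calculated from the used time t_d of the memory module and its state s_i. Different memory modules have different discrete data. After observing the distribution of the discrete data for each memory module, Hermite interpolation is first applied to obtain more sample data for subsequent use. Taking the decay rate of memory in use into account, piecewise least-squares fitting is then performed. This fitting method yields the fitting function f(s_i, t_d) with the smallest sum of squared errors; letting s_i tend to 0 gives the predicted total life of the memory module, from which the remaining life TL_i of the memory module is obtained.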
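The fitting step can be sketched as follows, under simplifying assumptions that go beyond the text: each segment is fitted with a straight line, the segment boundaries are supplied by the caller, the Hermite interpolation step is omitted, and the total life is taken as the time at which the last segment's line reaches a state of 0. The names `fit_line` and `remaining_life`, the sample data, and the time unit are hypothetical.

```python
def fit_line(ts, ss):
    """Ordinary least-squares line s = slope * t + intercept."""
    n = len(ts)
    if n < 2:
        raise ValueError("need at least two samples per segment")
    mean_t = sum(ts) / n
    mean_s = sum(ss) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        raise ValueError("all samples share the same time value")
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(ts, ss))
    slope = cov / var_t
    return slope, mean_s - slope * mean_t


def remaining_life(ts, ss, breakpoints):
    """Fit one line per time segment and extrapolate the last one to s = 0.

    ts: used-time samples; ss: state samples; breakpoints: times that
    split the samples into segments. Returns None when the last
    segment shows no decay, so no finite end of life can be predicted.
    """
    edges = [float("-inf"), *sorted(breakpoints), float("inf")]
    fits = []
    for lo, hi in zip(edges, edges[1:]):
        pts = [(t, s) for t, s in zip(ts, ss) if lo <= t < hi]
        if len(pts) >= 2:  # skip segments too sparse to fit
            seg_t, seg_s = zip(*pts)
            fits.append(fit_line(seg_t, seg_s))
    if not fits:
        raise ValueError("no segment has enough samples")
    slope, intercept = fits[-1]
    if slope >= 0:
        return None
    total_life = -intercept / slope  # time at which the fitted state hits 0
    return max(0.0, total_life - max(ts))


# Made-up history: slow decay up to t = 3, faster decay from t = 4.
times = [0, 1, 2, 3, 4, 5, 6]
states = [1.0, 0.99, 0.98, 0.97, 0.9, 0.8, 0.7]
left = remaining_life(times, states, breakpoints=[4])
```

For this made-up history the last segment decays by 0.1 per time unit from 0.7 at t = 6, so the sketch predicts about 7 more time units. Using only the most recent segment for extrapolation reflects the idea that the current decay rate matters most, but that choice is an assumption of this sketch.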
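In step S107, the user chooses whether to run monitoring and life prediction immediately. For a scheduled run, an execution time is entered; for periodic runs, a cron expression is entered.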
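The three execution modes in this step can be sketched as a small dispatcher. The function name `choose_mode`, the ISO 8601 time format, and the check for five whitespace-separated cron fields are assumptions of this sketch; the text does not say which cron dialect is used, and some dialects use six or seven fields.

```python
from datetime import datetime


def choose_mode(run_at=None, cron=None):
    """Pick the execution mode from the user's input.

    run_at: ISO 8601 timestamp string for a one-off scheduled run.
    cron:   cron expression for periodic runs (five fields assumed:
            minute, hour, day of month, month, day of week).
    With neither argument, the run is immediate.
    """
    if run_at is not None and cron is not None:
        raise ValueError("give either run_at or cron, not both")
    if cron is not None:
        fields = cron.split()
        if len(fields) != 5:
            raise ValueError("expected five cron fields")
        return ("periodic", fields)
    if run_at is not None:
        return ("timed", datetime.fromisoformat(run_at))
    return ("immediate", None)


mode, detail = choose_mode(cron="0 2 * * *")  # every day at 02:00
```

The sketch only classifies and validates the request; actually waiting for the scheduled time or evaluating the cron fields would be handled by whatever scheduler the deployment uses.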
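In step S108, the final memory monitoring information, the health status result, and the remaining life prediction TL_i are output.
In step S109, the result of each monitoring run is saved as a log, either locally or forwarded to a dedicated log server, which makes statistics and review easier.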
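One way to store each monitoring result is as a JSON line, which is easy to append locally or forward to a log server. The record layout, field names, and the function name `write_log_record` below are hypothetical; the text only states that each result is saved as a log.

```python
import io
import json


def write_log_record(stream, module_id, state, health, remaining_life, timestamp):
    """Append one monitoring result to `stream` as a single JSON line."""
    record = {
        "timestamp": timestamp,            # when the monitoring run happened
        "module_id": module_id,            # index i of the memory module
        "state": state,                    # state value of the module
        "health": health,                  # graded health status label
        "remaining_life": remaining_life,  # predicted TL_i, unit left to the caller
    }
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# An in-memory stream stands in for a local log file in this example.
log = io.StringIO()
write_log_record(log, module_id=3, state=0.88, health="healthy",
                 remaining_life=7.0, timestamp="2021-01-23T02:00:00")
```

Because every line is a complete JSON object, later statistics can be computed by reading the file line by line, and a failed module can be traced by filtering on `module_id`.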
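In step S110, the entire flow ends.
The present invention also proposes an SPD-based memory monitoring and life prediction system, which includes an acquisition and setting module, a reading and calculation module, a determination and calculation module, and a fitting calculation module.
The acquisition and setting module is used to obtain the parameter information of each memory module in the server and to set weights for the parameter information. The parameter information of each memory module in the server includes: erase/write speed v_i, data access delay time t_i, maximum operating frequency f_i, average operating temperature te_i, and average voltage vo_i, where i is the index of the memory module. Based on industry experience, an empirical weight is set for each item of parameter information.
The reading and calculation module is used to read the configuration information of each memory module in the server, calculate the proportion information of each memory module's parameters from the configuration information and the parameter information, and calculate the state information of each memory module from the weights and the proportion information; where V_i is the erase/write speed in the configuration information of the memory module; T_i is the data access delay time in the configuration information; F_i is the maximum operating frequency in the configuration information; TE_i is the average operating temperature in the configuration information; VO_i is the average operating voltage in the configuration information.
The formula for calculating the proportion information of each memory module's parameters from the configuration information and the parameter information is:
Figure PCTCN2021073439-appb-000006
where rv_i is the proportion information of the erase/write speed; rt_i is the proportion information of the data access delay time; rf_i is the proportion information of the maximum operating frequency; rte_i is the proportion information of the average operating temperature; rvo_i is the proportion information of the average voltage.
The state information of each memory module is calculated from the weights and the proportion information: s_i = ω_v*rv_i + ω_t*rt_i + ω_f*rf_i + ω_te*rte_i + ω_vo*rvo_i.
The determination and calculation module is used to determine the impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU, to calculate the final memory state value from the impact factors and the state information, and to grade the state value. A server contains multiple CPUs, and each CPU can manage multiple memory modules inserted in slots at different positions. The impact factor ω_i of each memory module is determined from the total number of CPUs in the current server, the CPU that manages the memory module, and the slot position of the memory module.
The final memory state value is calculated from the impact factors and the state information; the memory state
Figure PCTCN2021073439-appb-000007
ε_i represents a random error with a mean of 0 and a variance of 0.1; c is a constant term. In Embodiment 1 of the present invention, c is set to 0.05.
The memory state S is graded according to a preset piecewise function, and the corresponding health status is displayed:
Figure PCTCN2021073439-appb-000008
The fitting calculation module is used to calculate the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module. The remaining life TL_i is calculated from the used time t_d of the memory module and its state s_i. Different memory modules have different discrete data. After observing the distribution of the discrete data for each memory module, Hermite interpolation is first applied to obtain more sample data for subsequent use. Taking the decay rate of memory in use into account, piecewise least-squares fitting is then performed. This fitting method yields the fitting function f(s_i, t_d) with the smallest sum of squared errors; letting s_i tend to 0 gives the predicted total life of the memory module, from which the remaining life TL_i of the memory module is obtained.
The system also includes an execution module. The execution module is used to set an execution time so that monitoring and memory module life prediction run at a scheduled time, or to set a cron expression so that they run periodically.
The above content merely exemplifies and illustrates the structure of the present invention. A person skilled in the art may make various modifications or additions to the specific embodiments described, or replace them in a similar manner. Provided that they do not depart from the structure of the invention or go beyond the scope defined by the claims, all such changes fall within the scope of protection of the present invention.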

Claims (9)

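  1. An SPD-based memory monitoring and life prediction method, characterized by comprising:
    obtaining the parameter information of each memory module in a server, and setting weights for the parameter information;
    reading the configuration information of each memory module in the server, and calculating the proportion information of each memory module's parameters from the configuration information and the parameter information; calculating the state information of each memory module from the weights and the proportion information;
    determining impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU; calculating the final memory state value from the impact factors and the state information, and grading the state value;
    calculating the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module.
  2. The SPD-based memory monitoring and life prediction method according to claim 1, characterized in that the method further comprises:
    setting an execution time, so that monitoring and memory module life prediction run at a scheduled time;
    setting a cron expression, so that monitoring and memory module life prediction run periodically.
  3. The SPD-based memory monitoring and life prediction method according to claim 1, characterized in that the parameter information of each memory module in the server comprises: erase/write speed v_i, data access delay time t_i, maximum operating frequency f_i, average operating temperature te_i, and average voltage vo_i, where i is the index of the memory module.
  4. The SPD-based memory monitoring and life prediction method according to claim 3, characterized in that the formula for calculating the proportion information of each memory module's parameters from the configuration information and the parameter information is:
    Figure PCTCN2021073439-appb-100001
    where rv_i is the proportion information of the erase/write speed; rt_i is the proportion information of the data access delay time; rf_i is the proportion information of the maximum operating frequency; rte_i is the proportion information of the average operating temperature; rvo_i is the proportion information of the average voltage; V_i is the erase/write speed in the configuration information of the memory module; T_i is the data access delay time in the configuration information; F_i is the maximum operating frequency in the configuration information; TE_i is the average operating temperature in the configuration information; VO_i is the average operating voltage in the configuration information.
  5. The SPD-based memory monitoring and life prediction method according to claim 4, characterized in that the formula for calculating the state information of each memory module from the weights and the proportion information is: s_i = ω_v*rv_i + ω_t*rt_i + ω_f*rf_i + ω_te*rte_i + ω_vo*rvo_i, where ω_v is the weight of the erase/write speed; ω_t is the weight of the data access delay time; ω_f is the weight of the maximum operating frequency; ω_te is the weight of the average operating temperature; ω_vo is the weight of the average operating voltage.
  6. The SPD-based memory monitoring and life prediction method according to claim 5, characterized in that the formula for calculating the final memory state value from the impact factors and the state information is: the memory state
    Figure PCTCN2021073439-appb-100002
    where ω_i is an impact factor; ε_i represents a random error with a mean of 0 and a variance of 0.1; c is a constant term.
  7. The SPD-based memory monitoring and life prediction method according to claim 6, characterized in that the remaining life of each memory module is calculated from its used time and state information by piecewise least-squares fitting as follows:
    for the discrete data of each memory module, data interpolation and piecewise least-squares fitting are performed to obtain, for each memory module, the fitting function f(s_i, t_d) with the smallest sum of squared errors;
    letting s_i tend to 0 gives the predicted total life of the memory module, from which the remaining life TL_i of the memory module is obtained.
  8. An SPD-based memory monitoring and life prediction system, characterized by comprising an acquisition and setting module, a reading and calculation module, a determination and calculation module, and a fitting calculation module;
    the acquisition and setting module is used to obtain the parameter information of each memory module in a server and to set weights for the parameter information;
    the reading and calculation module is used to read the configuration information of each memory module in the server, calculate the proportion information of each memory module's parameters from the configuration information and the parameter information, and calculate the state information of each memory module from the weights and the proportion information;
    the determination and calculation module is used to determine impact factors from the number of CPUs in the server and the number and positions of the memory modules under each CPU, calculate the final memory state value from the impact factors and the state information, and grade the state value;
    the fitting calculation module is used to calculate the remaining life of each memory module by piecewise least-squares fitting, based on the used time and state information of each memory module.
  9. The SPD-based memory monitoring and life prediction system according to claim 8, characterized in that the system further comprises an execution module;
    the execution module is used to set an execution time so that monitoring and memory module life prediction run at a scheduled time, or to set a cron expression so that they run periodically.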
PCT/CN2021/073439 2020-05-27 2021-01-23 SPD-based memory monitoring and service life prediction method and system WO2021238257A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/928,118 US11714557B2 (en) 2020-05-27 2021-01-23 SPD-based memory monitoring and service life prediction method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010463689.7A CN111752481B (zh) 2020-05-27 2020-05-27 SPD-based memory monitoring and service life prediction method and system
CN202010463689.7 2020-05-27

Publications (1)

Publication Number Publication Date
WO2021238257A1 (zh) 2021-12-02

Family

ID=72674028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073439 WO2021238257A1 (zh) 2020-05-27 2021-01-23 SPD-based memory monitoring and service life prediction method and system

Country Status (3)

Country Link
US (1) US11714557B2 (zh)
CN (1) CN111752481B (zh)
WO (1) WO2021238257A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
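CN111752481B (zh) 2020-05-27 2022-08-02 苏州浪潮智能科技有限公司 SPD-based memory monitoring and service life prediction method and system
CN112463565A (zh) * 2020-11-30 2021-03-09 苏州浪潮智能科技有限公司 Server life prediction method and related device
CN117407264B (zh) * 2023-12-13 2024-02-23 苏州元脑智能科技有限公司 Method and apparatus for predicting the remaining time of memory aging, computer device, and medium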

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070263444A1 (en) * 2006-05-15 2007-11-15 Gorobets Sergey A Non-Volatile Memory System with End of Life Calculation
US20170131947A1 (en) * 2015-11-06 2017-05-11 Pho Hoang Data and collection methods to analyze life acceleration of SSD with real usages
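CN109032807A (zh) * 2018-08-08 2018-12-18 郑州云海信息技术有限公司 Method and system for batch monitoring of memory status and limiting memory power consumption
CN111752481A (zh) * 2020-05-27 2020-10-09 苏州浪潮智能科技有限公司 SPD-based memory monitoring and service life prediction method and system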

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
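CN110727556A (zh) * 2019-09-21 2020-01-24 苏州浪潮智能科技有限公司 BMC health status monitoring method, system, terminal, and storage medium
CN110781027B (zh) * 2019-10-29 2023-01-10 苏州浪潮智能科技有限公司 Method, apparatus, and device for determining a memory ECC error reporting threshold
CN111198764B (zh) * 2019-12-31 2024-04-26 江苏省未来网络创新研究院 SDN-based load balancing implementation system and method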

Also Published As

Publication number Publication date
CN111752481B (zh) 2022-08-02
US20230195322A1 (en) 2023-06-22
CN111752481A (zh) 2020-10-09
US11714557B2 (en) 2023-08-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21811970

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21811970

Country of ref document: EP

Kind code of ref document: A1