CN115168173A - Fault prediction model training method, equipment fault determination method, device and equipment - Google Patents


Info

Publication number
CN115168173A
Authority
CN
China
Prior art keywords
data
pieces
sample data
downtime
memory
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210880637.9A
Other languages
Chinese (zh)
Inventor
彭寒秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Alibaba China Co Ltd
Priority to CN202210880637.9A
Publication of CN115168173A
Pending legal status (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G06F11/3447 Performance evaluation by modeling
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging

Abstract

An embodiment of the present application provides a fault prediction model training method, a device fault determination method, an apparatus, and a device. The training method includes: acquiring a plurality of pieces of initial data from a plurality of electronic devices; performing feature statistics on the initial data of each electronic device according to the device information and the data sampling time to obtain a plurality of pieces of statistical data; determining labeling results of the statistical data according to downtime information of the electronic devices, where each labeling result indicates the time-to-downtime, i.e., the duration between the collection time of the statistical data and the downtime moment of the electronic device; and performing model training on the statistical data and their labeling results to obtain a target model, which is used to determine whether an electronic device will go down due to a memory failure within a future period. This improves the accuracy of downtime prediction for electronic devices.

Description

Fault prediction model training method, equipment fault determination method, device and equipment
Technical Field
The present application relates to the field of computer technology, and in particular to a fault prediction model training method, a device fault determination method, an apparatus, and a device.
Background
Electronic devices (e.g., servers) can provide a variety of services, such as data computation, data storage, and running programs. To prevent a memory fault from bringing an electronic device down and disrupting its services, downtime prediction can be performed on the device.
In the related art, log information and device information of an electronic device are collected and used to predict whether the device will go down. The device information may include the manufacturer, location, and the like of the electronic device. However, the log information and the device information may not change significantly before a downtime occurs, so downtime prediction based on them has poor accuracy.
Disclosure of Invention
Various aspects of the present application provide a fault prediction model training method, a device fault determination method, an apparatus, and a device, so as to improve the accuracy of downtime prediction for electronic devices.
In a first aspect, an embodiment of the present application provides a method for training a fault prediction model, including:
obtaining a plurality of pieces of initial data from a plurality of electronic devices, the initial data including device information, memory information of a memory in the electronic device, and a data sampling time, wherein the memory information includes memory fault information and memory performance information;
performing feature statistics on the initial data corresponding to each electronic device according to the device information and the data sampling time, to obtain a plurality of pieces of statistical data;
determining labeling results of the plurality of pieces of statistical data according to downtime information of the plurality of electronic devices, wherein a labeling result indicates the time-to-downtime, i.e., the duration between the collection time of the statistical data and the downtime moment of the electronic device;
and performing model training according to the plurality of pieces of statistical data and the labeling result corresponding to each piece of statistical data to obtain a target model, wherein the target model is used to determine whether the electronic device will go down due to a memory failure within a future period.
In a possible implementation, performing feature statistics on the initial data corresponding to each electronic device according to the device information and the data sampling time to obtain a plurality of pieces of statistical data includes:
determining the initial data corresponding to each electronic device according to the device information;
for each electronic device, dividing the initial data corresponding to the electronic device into a plurality of data groups according to the data sampling time and a preset time window, wherein the data sampling times of the initial data in one data group fall within the corresponding time window;
and performing feature statistics on the initial data in each data group to obtain the statistical data corresponding to the electronic device, wherein each data group corresponds to one piece of statistical data.
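The per-device windowing step can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the record layout (`InitialRecord`, whose `device_id` stands in for the device information and whose `sample_time` is assumed to be seconds since the epoch) and the use of fixed, non-overlapping (tumbling) windows are assumptions. The text only requires that every record in a data group has a sampling time inside that group's window.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InitialRecord:
    device_id: str       # hypothetical stand-in for the "device information"
    sample_time: float   # data sampling time, assumed to be seconds since the epoch
    memory_info: dict = field(default_factory=dict)  # memory fault + performance info


def group_by_device_and_window(
    records: List[InitialRecord], window_seconds: float
) -> Dict[str, Dict[int, List[InitialRecord]]]:
    """Split each device's records into tumbling time windows.

    A record lands in window k when
    k * window_seconds <= sample_time < (k + 1) * window_seconds.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for rec in records:
        window_index = int(rec.sample_time // window_seconds)
        groups[rec.device_id][window_index].append(rec)
    # Convert nested defaultdicts to plain dicts so missing keys raise KeyError.
    return {dev: dict(windows) for dev, windows in groups.items()}
```

With one-hour windows, records sampled at 10 s and 3599 s fall into window 0 of their device, while a record sampled at 3600 s starts window 1; each resulting list is one data group.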
In a possible implementation, the memory fault information includes the number of error reports for each fault type and the fault locations;
for any one of the plurality of data groups, performing feature statistics on the initial data in the data group to obtain the statistical data corresponding to the data group includes:
performing feature statistics on the number of error reports for each fault type in the data group to obtain an error-report statistic for each fault type;
performing feature statistics on the memory performance information in the data group to obtain a memory performance statistic;
counting the fault locations in the data group to obtain a fault count for each block of the memory within the time window corresponding to the data group;
wherein the statistical data corresponding to the data group includes the error-report statistic for each fault type, the memory performance statistic, and the fault counts.
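A sketch of turning one data group into one piece of statistical data follows. The record schema (an `errors` map from fault type to error-report count, a scalar `perf` reading, and a `fault_blocks` list of block identifiers) is hypothetical, and so is the choice of statistics: the text does not name them, so this sketch sums error reports per fault type, averages the performance readings, and counts fault locations per block.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def window_statistics(group: List[dict]) -> Dict[str, object]:
    """Compute one piece of statistical data from one data group (one time window).

    Each record is assumed (hypothetically) to look like:
      {"errors": {"CE": 3, "UCE": 0}, "perf": 0.72, "fault_blocks": ["bank0", "bank0"]}
    """
    error_totals: Counter = Counter()
    block_counts: Counter = Counter()
    perf_values: List[float] = []
    for rec in group:
        # Sum the error-report counts per fault type across the window.
        error_totals.update(rec.get("errors", {}))
        # Each reported fault location adds one to that block's fault count.
        block_counts.update(rec.get("fault_blocks", []))
        if "perf" in rec:
            perf_values.append(rec["perf"])
    return {
        "error_stats": dict(error_totals),
        "perf_stat": mean(perf_values) if perf_values else 0.0,  # assumed statistic
        "block_fault_counts": dict(block_counts),
    }
```

Other statistics (maximum, variance, rate of change) would slot into the same structure; the sum/mean/count choice here is only for illustration.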
In a possible implementation, the downtime information includes a downtime moment; for any piece of statistical data corresponding to any electronic device, determining the labeling result of the statistical data according to the downtime information of the electronic device includes:
determining the collection time of the statistical data according to the data sampling times in the initial data corresponding to the statistical data;
obtaining an initial duration between the collection time and the downtime moment;
rounding the initial duration according to a preset time unit to obtain the time-to-downtime, wherein the time-to-downtime is an integral multiple of the preset time unit;
and determining that the labeling result of the statistical data includes the time-to-downtime.
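The labeling rule for a device that did go down might look like the sketch below. Two details are assumptions, because the text leaves them open: the collection time is taken as the latest sampling time in the data group, and "rounding" is implemented as rounding down to a whole number of preset time units.

```python
import math
from typing import List


def collection_time(sample_times: List[float]) -> float:
    # Assumption: a data group's collection time is its latest sampling time.
    return max(sample_times)


def time_to_downtime_label(
    sample_times: List[float], downtime_moment: float, unit: float
) -> float:
    """Time-to-downtime for one piece of statistical data, as a multiple of `unit`."""
    initial_duration = downtime_moment - collection_time(sample_times)
    if initial_duration < 0:
        raise ValueError("statistical data was collected after the downtime moment")
    # Assumption: "rounding" means rounding down to a whole number of time units.
    return math.floor(initial_duration / unit) * unit
```

For data sampled up to 3599 s and a crash at 10899 s, the initial duration is 7300 s, which rounds down to 7200 s (two one-hour units).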
In a possible implementation, the downtime information includes a downtime identifier and a downtime moment, or the downtime information includes a non-downtime identifier;
for any piece of statistical data corresponding to any electronic device, determining the labeling result of the statistical data according to the downtime information of the electronic device includes:
if the downtime information includes the downtime identifier and the downtime moment, determining the collection time according to the data sampling times in the initial data corresponding to the statistical data; obtaining an initial duration between the collection time and the downtime moment; rounding the initial duration according to a preset time unit to obtain the time-to-downtime, wherein the time-to-downtime is an integral multiple of the preset time unit; and determining that the labeling result of the statistical data includes the time-to-downtime;
and if the downtime information includes the non-downtime identifier, determining that the labeling result of the statistical data is non-downtime, or that the time-to-downtime is greater than or equal to a preset duration.
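Devices that never went down need a label too. The sketch below covers both options in the text: an explicit non-downtime tag, or a label of `math.inf`, which satisfies "time-to-downtime >= preset duration" for any finite preset duration. Representing the non-downtime identifier as `None`, the `NON_DOWNTIME` string, the `use_infinite_duration` switch, and rounding down are all illustrative assumptions.

```python
import math
from typing import Optional, Union

NON_DOWNTIME = "non-downtime"  # hypothetical tag for devices that never went down


def label_statistical_data(
    collected_at: float,
    downtime_moment: Optional[float],  # None plays the role of the non-downtime identifier
    unit: float,
    use_infinite_duration: bool = False,
) -> Union[float, str]:
    """Label one piece of statistical data with its time-to-downtime."""
    if downtime_moment is None:
        # Option 1: an explicit "non-downtime" tag.
        # Option 2: an infinite time-to-downtime, which is >= any preset duration.
        return math.inf if use_infinite_duration else NON_DOWNTIME
    # The device did go down: round the gap down to whole time units (assumption).
    return math.floor((downtime_moment - collected_at) / unit) * unit
```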
In a possible implementation, performing model training according to the plurality of pieces of statistical data and the labeling result corresponding to each piece of statistical data to obtain a target model includes:
determining M pieces of positive sample data and N pieces of negative sample data among the plurality of pieces of statistical data according to the statistical data and their labeling results, wherein the time-to-downtime indicated by the labeling result of a piece of positive sample data is less than or equal to a preset duration, the time-to-downtime indicated by the labeling result of a piece of negative sample data is greater than the preset duration, and M and N are both positive integers;
according to M and N, determining a plurality of pieces of first positive sample data among the M pieces of positive sample data and a plurality of pieces of first negative sample data among the N pieces of negative sample data, wherein the difference between the number of pieces of first positive sample data and the number of pieces of first negative sample data is within a preset range;
and performing model training according to the plurality of pieces of first positive sample data, the plurality of pieces of first negative sample data, and their corresponding labeling results to obtain the target model.
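Splitting labeled statistical data into positive and negative samples is a direct threshold comparison. In this sketch the list-of-dicts sample format is an assumption, and a "non-downtime" label is treated as negative (also an assumption; the text describes such data as having a time-to-downtime greater than or equal to the preset duration).

```python
import math
from typing import List, Tuple, Union

Label = Union[float, str]  # time-to-downtime in seconds, or the "non-downtime" tag


def split_positive_negative(
    samples: List[dict], labels: List[Label], preset_duration: float
) -> Tuple[List[Tuple[dict, Label]], List[Tuple[dict, Label]]]:
    """Positive: time-to-downtime <= preset_duration. Negative: everything else."""
    positives: List[Tuple[dict, Label]] = []
    negatives: List[Tuple[dict, Label]] = []
    for sample, label in zip(samples, labels):
        # A "non-downtime" tag behaves like an infinite time-to-downtime.
        duration = math.inf if label == "non-downtime" else float(label)
        if duration <= preset_duration:
            positives.append((sample, label))
        else:
            negatives.append((sample, label))
    return positives, negatives
```

With a one-day preset duration, data labeled 1 h or exactly 24 h before downtime become positives, while data labeled 48 h or "non-downtime" become negatives.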
In a possible implementation, performing model training according to the plurality of pieces of first positive sample data, the plurality of pieces of first negative sample data, and their corresponding labeling results to obtain the target model includes:
performing a first round of model training according to the first positive sample data, the first negative sample data, and their corresponding labeling results to obtain an intermediate model, and determining the importance of each data feature in the statistical data;
ranking the data features in the statistical data in descending order of importance, retaining the feature values of the top K data features in the first positive sample data to obtain a plurality of pieces of second positive sample data, and retaining the feature values of the top K data features in the first negative sample data to obtain a plurality of pieces of second negative sample data;
and performing a second round of training on the intermediate model according to the plurality of pieces of second positive sample data, the plurality of pieces of second negative sample data, and their corresponding labeling results to obtain the target model.
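The feature-pruning step between the two training rounds reduces to ranking features by importance and projecting every sample onto the top K. The importance scores would come from the intermediate model (for example, a tree ensemble's feature importances); here they are passed in as a plain dictionary, the feature names are hypothetical, and the training rounds themselves are left out.

```python
from typing import Dict, List


def top_k_features(importance: Dict[str, float], k: int) -> List[str]:
    """Feature names sorted by importance, highest first; ties broken by name."""
    ranked = sorted(importance, key=lambda name: (-importance[name], name))
    return ranked[:k]


def keep_features(
    samples: List[Dict[str, float]], features: List[str]
) -> List[Dict[str, float]]:
    """Drop every feature value except the retained ones (second sample data)."""
    return [{f: s[f] for f in features} for s in samples]
```

The same retained feature list must be applied to both the first positive and the first negative sample data, and later to the statistical data seen at inference time.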
In a possible implementation, the first positive sample data and the first negative sample data each include a plurality of data features;
performing the first round of model training according to the plurality of pieces of first positive sample data, the plurality of pieces of first negative sample data, and their corresponding labeling results to obtain an intermediate model includes:
determining first data features among the plurality of data features according to the feature value of each data feature in the first positive sample data and the feature value of each data feature in the first negative sample data, wherein the degree of difference between the feature value of a first data feature in the first positive sample data and its feature value in the first negative sample data is greater than or equal to a second threshold;
updating the plurality of pieces of first positive sample data and the plurality of pieces of first negative sample data according to the first data features, wherein the updated first positive sample data and the updated first negative sample data include the feature values of the first data features;
and performing model training according to the updated first positive sample data, the updated first negative sample data, and their corresponding labeling results to obtain the intermediate model.
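The first-round feature filter keeps only features whose values differ enough between positive and negative samples. The text does not define the "degree of difference"; this sketch assumes it is the absolute gap between the feature's mean over positive samples and its mean over negative samples, and it assumes every sample carries the same feature names.

```python
from statistics import mean
from typing import Dict, List


def select_discriminative_features(
    positives: List[Dict[str, float]],
    negatives: List[Dict[str, float]],
    second_threshold: float,
) -> List[str]:
    """Return the first data features: those whose class-mean gap >= second_threshold.

    Assumption: degree of difference = |mean over positives - mean over negatives|.
    """
    selected: List[str] = []
    for f in positives[0]:
        gap = abs(mean(s[f] for s in positives) - mean(s[f] for s in negatives))
        if gap >= second_threshold:
            selected.append(f)
    return selected
```

In practice features on different scales would need normalizing before a single threshold is meaningful; that step is omitted here.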
In a possible implementation, determining, according to M and N, a plurality of pieces of first positive sample data among the M pieces of positive sample data and a plurality of pieces of first negative sample data among the N pieces of negative sample data includes:
if M is greater than N and the difference between M and N is greater than or equal to a first threshold, downsampling the M pieces of positive sample data, determining the downsampled positive sample data as the plurality of pieces of first positive sample data, and determining the N pieces of negative sample data as the plurality of pieces of first negative sample data; or,
if N is greater than M and the difference between N and M is greater than or equal to the first threshold, downsampling the N pieces of negative sample data, determining the downsampled negative sample data as the plurality of pieces of first negative sample data, and determining the M pieces of positive sample data as the plurality of pieces of first positive sample data.
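Class balancing by downsampling can be sketched with the standard library's `random.Random.sample`. Downsampling the majority class to exactly the minority size is an assumption (the text only requires the remaining difference to fall within a preset range), as is the fixed seed used for reproducibility.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def balance_by_downsampling(
    positives: List[T], negatives: List[T], first_threshold: int, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly downsample the majority class when class sizes differ by >= first_threshold."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    m, n = len(positives), len(negatives)
    if m > n and m - n >= first_threshold:
        return rng.sample(positives, n), list(negatives)
    if n > m and n - m >= first_threshold:
        return list(positives), rng.sample(negatives, m)
    # Sizes are already close enough: keep both classes unchanged.
    return list(positives), list(negatives)
```

Because downtime is rare, the negative class is usually the majority, so the second branch is the one most often taken.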
In a second aspect, an embodiment of the present application provides a device fault determination method, including:
obtaining a plurality of pieces of initial data from an electronic device, the initial data including device information, memory information of a memory in the electronic device, and a data sampling time, wherein the memory information includes memory fault information and memory performance information;
performing feature statistics on the plurality of pieces of initial data according to the data sampling time to obtain a plurality of pieces of statistical data;
processing the plurality of pieces of statistical data with the target model to determine whether the electronic device will go down due to a memory failure within a future period, wherein the target model is trained according to the method of any implementation of the first aspect.
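At inference time the target model is applied to the statistical data of a single device. In the sketch below the model is any callable that returns a downtime probability for one piece of statistical data; that interface, the 0.5 decision threshold, and flagging the device when any window crosses the threshold are illustrative assumptions rather than details from the text.

```python
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


def predict_memory_downtime(
    statistical_data: Sequence[Features],
    target_model: Callable[[Features], float],
    decision_threshold: float = 0.5,
) -> bool:
    """True if the device is predicted to go down from a memory fault in the future period.

    Assumptions: `target_model` returns a downtime probability for one piece of
    statistical data, and the device is flagged if any window crosses the threshold.
    """
    return any(target_model(x) >= decision_threshold for x in statistical_data)
```

A toy model such as `lambda x: min(1.0, x["ce_count"] / 100)` is enough to exercise the function; a real deployment would load the trained target model instead.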
In a possible implementation, the memory fault information includes the number of error reports for each fault type and the fault locations;
performing feature statistics on the plurality of pieces of initial data according to the data sampling time to obtain a plurality of pieces of statistical data includes:
dividing the plurality of pieces of initial data into a plurality of data groups according to a preset time window, wherein the data sampling times of the initial data in one data group fall within the corresponding time window;
for any one of the plurality of data groups, performing feature statistics on the number of error reports for each fault type in the data group to obtain an error-report statistic for each fault type;
performing feature statistics on the memory performance information in the data group to obtain a memory performance statistic;
counting the fault locations in the data group to obtain a fault count for each block of the memory within the time window corresponding to the data group;
wherein the statistical data corresponding to the data group includes the error-report statistic for each fault type, the memory performance statistic, and the fault counts.
In a third aspect, an embodiment of the present application provides a model training apparatus, including an acquisition module, a statistical module, a determination module, and a training module, wherein:
the acquisition module is configured to obtain a plurality of pieces of initial data from a plurality of electronic devices, the initial data including device information, memory information of a memory in the electronic device, and a data sampling time, wherein the memory information includes memory fault information and memory performance information;
the statistical module is configured to perform feature statistics on the initial data corresponding to each electronic device according to the device information and the data sampling time, to obtain a plurality of pieces of statistical data;
the determination module is configured to determine labeling results of the plurality of pieces of statistical data according to downtime information of the plurality of electronic devices, wherein a labeling result indicates the time-to-downtime between the collection time of the statistical data and the downtime moment of the electronic device;
the training module is configured to perform model training according to the plurality of pieces of statistical data and the labeling result corresponding to each piece of statistical data to obtain a target model, wherein the target model is used to determine whether the electronic device will go down due to a memory failure within a future period.
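The four modules of the apparatus map naturally onto four pluggable callables. The class below is a hypothetical wiring sketch, not the applicant's design: each module is injected as a function, and `run` chains acquisition, statistics, labeling, and training in the order described above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ModelTrainingApparatus:
    """Hypothetical wiring of the four modules; each module is a plain callable."""

    acquire: Callable[[], List[Any]]                       # acquisition module
    compute_statistics: Callable[[List[Any]], List[Any]]   # statistical module
    label: Callable[[List[Any]], List[Any]]                # determination module
    train: Callable[[List[Any], List[Any]], Any]           # training module

    def run(self) -> Any:
        initial_data = self.acquire()
        statistical_data = self.compute_statistics(initial_data)
        labels = self.label(statistical_data)
        # The training module returns the target model.
        return self.train(statistical_data, labels)
```

Injecting the modules keeps each step independently testable and lets the same statistical module be reused by the fault determination apparatus.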
In a possible implementation, the statistical module is specifically configured to:
determine the initial data corresponding to each electronic device according to the device information;
for each electronic device, divide the initial data corresponding to the electronic device into a plurality of data groups according to the data sampling time and a preset time window, wherein the data sampling times of the initial data in one data group fall within the corresponding time window;
and perform feature statistics on the initial data in each data group to obtain the statistical data corresponding to the electronic device, wherein each data group corresponds to one piece of statistical data.
In a possible implementation, the memory fault information includes the number of error reports for each fault type and the fault locations; the statistical module is specifically configured to:
perform feature statistics on the number of error reports for each fault type in the data group to obtain an error-report statistic for each fault type;
perform feature statistics on the memory performance information in the data group to obtain a memory performance statistic;
count the fault locations in the data group to obtain a fault count for each block of the memory within the time window corresponding to the data group;
wherein the statistical data corresponding to the data group includes the error-report statistic for each fault type, the memory performance statistic, and the fault counts.
In a possible implementation, the downtime information includes a downtime moment; for any piece of statistical data corresponding to any electronic device, the determination module is specifically configured to:
determine the collection time of the statistical data according to the data sampling times in the initial data corresponding to the statistical data;
obtain an initial duration between the collection time and the downtime moment;
round the initial duration according to a preset time unit to obtain the time-to-downtime, wherein the time-to-downtime is an integral multiple of the preset time unit;
and determine that the labeling result of the statistical data includes the time-to-downtime.
In a possible implementation, the downtime information includes a downtime identifier and a downtime moment, or the downtime information includes a non-downtime identifier;
for any piece of statistical data corresponding to any electronic device, the determination module is specifically configured to:
if the downtime information includes the downtime identifier and the downtime moment, determine the collection time according to the data sampling times in the initial data corresponding to the statistical data; obtain an initial duration between the collection time and the downtime moment; round the initial duration according to a preset time unit to obtain the time-to-downtime, wherein the time-to-downtime is an integral multiple of the preset time unit; and determine that the labeling result of the statistical data includes the time-to-downtime;
and if the downtime information includes the non-downtime identifier, determine that the labeling result of the statistical data is non-downtime, or that the time-to-downtime is greater than or equal to a preset duration.
In a possible implementation, the training module is specifically configured to:
determine M pieces of positive sample data and N pieces of negative sample data among the plurality of pieces of statistical data according to the statistical data and their labeling results, wherein the time-to-downtime indicated by the labeling result of a piece of positive sample data is less than or equal to a preset duration, the time-to-downtime indicated by the labeling result of a piece of negative sample data is greater than the preset duration, and M and N are both positive integers;
according to M and N, determine a plurality of pieces of first positive sample data among the M pieces of positive sample data and a plurality of pieces of first negative sample data among the N pieces of negative sample data, wherein the difference between the number of pieces of first positive sample data and the number of pieces of first negative sample data is within a preset range;
and perform model training according to the plurality of pieces of first positive sample data, the plurality of pieces of first negative sample data, and their corresponding labeling results to obtain the target model.
In a possible implementation, the training module is specifically configured to:
perform a first round of model training according to the first positive sample data, the first negative sample data, and their corresponding labeling results to obtain an intermediate model, and determine the importance of each data feature in the statistical data;
rank the data features in the statistical data in descending order of importance, retain the feature values of the top K data features in the first positive sample data to obtain a plurality of pieces of second positive sample data, and retain the feature values of the top K data features in the first negative sample data to obtain a plurality of pieces of second negative sample data;
and perform a second round of training on the intermediate model according to the plurality of pieces of second positive sample data, the plurality of pieces of second negative sample data, and their corresponding labeling results to obtain the target model.
In a possible implementation, the first positive sample data and the first negative sample data each include a plurality of data features; the training module is specifically configured to:
determine first data features among the plurality of data features according to the feature value of each data feature in the first positive sample data and the feature value of each data feature in the first negative sample data, wherein the degree of difference between the feature value of a first data feature in the first positive sample data and its feature value in the first negative sample data is greater than or equal to a second threshold;
update the plurality of pieces of first positive sample data and the plurality of pieces of first negative sample data according to the first data features, wherein the updated first positive sample data and the updated first negative sample data include the feature values of the first data features;
and perform model training according to the updated first positive sample data, the updated first negative sample data, and their corresponding labeling results to obtain the intermediate model.
In a possible implementation, the training module is specifically configured to:
if M is greater than N and the difference between M and N is greater than or equal to a first threshold, downsample the M pieces of positive sample data, determine the downsampled positive sample data as the plurality of pieces of first positive sample data, and determine the N pieces of negative sample data as the plurality of pieces of first negative sample data; or,
if N is greater than M and the difference between N and M is greater than or equal to the first threshold, downsample the N pieces of negative sample data, determine the downsampled negative sample data as the plurality of pieces of first negative sample data, and determine the M pieces of positive sample data as the plurality of pieces of first positive sample data.
In a fourth aspect, an embodiment of the present application provides a device fault determination apparatus, including an acquisition module, a statistical module, and a processing module, wherein:
the acquisition module is configured to acquire a plurality of pieces of initial data from an electronic device, the initial data including device information, memory information of a memory in the electronic device, and a data sampling time, wherein the memory information includes memory fault information and memory performance information;
the statistical module is configured to perform feature statistics on the plurality of pieces of initial data according to the data sampling time to obtain a plurality of pieces of statistical data;
the processing module is configured to process the plurality of pieces of statistical data with the target model to determine whether the electronic device will go down due to a memory failure within a future period, wherein the target model is trained according to the method of any implementation of the first aspect.
In a possible implementation, the memory fault information includes the number of error reports for each fault type and the fault locations; the statistical module is specifically configured to:
divide the plurality of pieces of initial data into a plurality of data groups according to a preset time window, wherein the data sampling times of the initial data in one data group fall within the corresponding time window;
for any one of the plurality of data groups, perform feature statistics on the number of error reports for each fault type in the data group to obtain an error-report statistic for each fault type;
perform feature statistics on the memory performance information in the data group to obtain a memory performance statistic;
count the fault locations in the data group to obtain a fault count for each block of the memory within the time window corresponding to the data group;
wherein the statistical data corresponding to the data group includes the error-report statistic for each fault type, the memory performance statistic, and the fault counts.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to cause the processor to perform the fault prediction model training method of any one of the first aspects.
In a sixth aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to cause the processor to perform the device failure determination method of any of the second aspects.
In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the fault prediction model training method according to any one of the first aspects.
In an eighth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the device fault determination method according to the second aspect.
In a ninth aspect, the present application provides a computer program product, which includes a computer program that, when executed by a processor, implements the fault prediction model training method shown in any one of the first aspects.
In a tenth aspect, an embodiment of the present application provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the device failure determination method shown in any one of the second aspects.
The embodiments of the present application provide a fault prediction model training method, a device fault determination method, an apparatus, and a device. A plurality of pieces of initial data can be acquired from a plurality of electronic devices, where the initial data may include device information, memory information of a memory in the electronic device, and a data sampling time, and the memory information includes memory fault information and memory performance information. Feature statistics are performed on the initial data of each electronic device according to the device information and the data sampling time to obtain a plurality of pieces of statistical data, and the labeling results of the statistical data are determined according to the downtime information of the electronic devices. The model training device can determine positive sample data and negative sample data among the statistical data, select a plurality of pieces of first positive sample data and a plurality of pieces of first negative sample data according to the numbers of positive and negative samples, and perform model training on the first positive sample data, the first negative sample data, and their labeling results to obtain the target model.
The sample data used for model training include memory performance information and memory fault information, which usually change noticeably before the electronic device goes down due to a memory fault. The sample data are statistics over a period of time, which clearly reflect such changes. The labeling result includes the time remaining until downtime, so the trained target model can accurately predict downtime that may occur in the coming days. These factors improve the prediction accuracy of the target model, so the target model can accurately predict whether the electronic device will go down due to a memory fault in a future period.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a schematic diagram of an application scenario provided in an exemplary embodiment of the present application;
Fig. 2 is a schematic flowchart of a fault prediction model training method provided in an exemplary embodiment of the present application;
Fig. 3A is a first schematic diagram of a time window provided in an exemplary embodiment of the present application;
Fig. 3B is a second schematic diagram of a time window provided in an exemplary embodiment of the present application;
Fig. 4 is a schematic flowchart of another fault prediction model training method provided in an exemplary embodiment of the present application;
Fig. 5 is a schematic structural diagram of a dual in-line memory module provided in an exemplary embodiment of the present application;
Fig. 6 is a process diagram of a fault prediction model training method provided in an exemplary embodiment of the present application;
Fig. 7 is a process diagram of a device fault determination method provided in an exemplary embodiment of the present application;
Fig. 8 is a schematic structural diagram of a model training apparatus provided in an exemplary embodiment of the present application;
Fig. 9 is a schematic structural diagram of a device fault determination apparatus provided in an exemplary embodiment of the present application;
Fig. 10 is a schematic structural diagram of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic diagram of an application scenario provided in an exemplary embodiment of the present application. As shown in Fig. 1, the scenario includes a model training device and a plurality of electronic devices, for example electronic device-1, electronic device-2, ..., electronic device-n. The model training device can communicate with any of the electronic devices. It may acquire initial data from the electronic devices and perform model training on that data to obtain the target model.
After the target model is obtained, it may be deployed in the electronic device or in a fault prediction device, which can then use the target model to perform fault prediction for the electronic device, that is, to determine whether the electronic device will go down due to a memory fault in a future period.
In the related art, log information and device information of an electronic device may be collected, and the downtime prediction of the electronic device may be performed through the log information and the device information. However, the log information and the device information may not change significantly before the downtime, which results in poor accuracy of the downtime prediction for the electronic device.
In the embodiments of the present application, downtime of the electronic device can be predicted by the trained target model. The sample data used for model training include memory performance information and memory fault information, which usually change noticeably before the electronic device goes down due to a memory fault. Moreover, the sample data are statistics over a period of time (such as the mean, maximum, and variance), which clearly reflect such changes.
The technical means shown in the present application will be described in detail below with reference to specific examples. It should be noted that the following embodiments may exist alone or in combination with each other, and description of the same or similar contents is not repeated in different embodiments.
The technical solution of the present application may include two processes: training the fault prediction model and determining device faults. The process of training the fault prediction model is described first with reference to Fig. 2.
Fig. 2 is a schematic flowchart of a fault prediction model training method according to an exemplary embodiment of the present disclosure. Referring to fig. 2, the method may include:
S201, acquire a plurality of pieces of initial data from a plurality of electronic devices.
The execution subject of the embodiment of the application may be model training equipment, and may also be a model training device provided in the model training equipment. The model training device may be implemented by software, or may be implemented by a combination of software and hardware. For example, the model training device may be a computer, a server, or the like.
The initial data are related information acquired from the electronic device, and may include device information, memory information of the memory in the electronic device, and the data sampling time.
The device information may include the manufacturer, device number, city, and time of going online of the electronic device, the manufacturer of the memory in the electronic device, and the like.
The memory information may include memory failure information and memory performance information.
Memory faults can be recorded by deploying standard fault-logging tools on the electronic device, such as mcelog or the kernel log.
The memory faults may include correctable errors (corrected memory error), memory scrubbing errors (memory scrubbing error), page offline faults (bad page offset), memory read errors, uncorrectable memory errors (uncorrectable memory error), memory write errors, and the like.
The memory performance information may include: recently accessed active memory (mem_active), temporary storage for raw disk blocks (mem_buffers), page cache for files read from disk (mem_cached), memory waiting to be written back to disk (mem_dirty), recently unaccessed inactive memory (mem_inactive), total usable memory (mem_memtotal), locked memory (mem_mlocked), currently unused swap space (mem_swapfree), unused memory (mem_unused), memory occupancy, memory read speed, and the like.
The data sampling time refers to the time when the electronic equipment collects initial data. For example, the data sampling instant may be 2022/06/09.
The model training device can be connected with the plurality of electronic devices through a wireless network or a wired network, and initial data can be acquired from the plurality of electronic devices.
S202, perform feature statistics on the initial data corresponding to each electronic device according to the device information and the data sampling time, to obtain a plurality of pieces of statistical data.
In an optional embodiment, the statistical data may be obtained as follows: determine the initial data corresponding to each electronic device according to the device information; for each electronic device, divide its initial data into a plurality of data groups according to the data sampling time and a preset time window, where the data sampling times of the initial data in one data group fall within the corresponding time window; and perform feature statistics on the initial data in each data group to obtain the statistical data of the electronic device, with one data group yielding one piece of statistical data.
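A minimal sketch of this grouping step, assuming each piece of initial data has been reduced to a `(device_id, sample_time, payload)` tuple. The function name `split_into_groups` and the boundary rule are assumptions: here a data group opens at its first sample and includes every later sample up to and including `window` after that start, which reproduces the grouping of Table 1 (initial data 1 to 4, sampled from 10:00 to 11:00, form one group).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable


def split_into_groups(records: Iterable[tuple[str, datetime, object]],
                      window: timedelta = timedelta(hours=1)) -> dict:
    """Split each device's initial data into data groups by sampling time.

    Assumption: a group opens at its first sample and keeps every sample
    taken no later than `window` after that start (closed interval).
    """
    # Determine the initial data corresponding to each electronic device.
    by_device = defaultdict(list)
    for device_id, sample_time, payload in records:
        by_device[device_id].append((sample_time, payload))

    groups = defaultdict(list)  # device_id -> list of data groups
    for device_id, items in by_device.items():
        items.sort(key=lambda item: item[0])  # order by data sampling time
        start = None
        for sample_time, payload in items:
            if start is None or sample_time - start > window:
                start = sample_time          # open a new data group
                groups[device_id].append([])
            groups[device_id][-1].append(payload)
    return dict(groups)
```

Applied to the eight samples of Table 1, this yields two data groups for electronic device-1: initial data 1 to 4 and initial data 5 to 8.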
The time window may be a sliding time window, which may be one continuous time window or may comprise a plurality of small time windows.
The time window will be described with reference to fig. 3A to 3B.
Fig. 3A is a first schematic diagram of a time window provided in an exemplary embodiment of the present application. Referring to Fig. 3A, suppose the time window is a sliding time window with a length of 1 continuous hour and a sliding step of 20 min. Time window 1 may be the continuous hour from 10:00 to 11:00; sliding by 20 min gives time window 2, from 10:20 to 11:20; sliding by another 20 min gives time window 3, from 10:40 to 11:40.
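The sliding windows of Fig. 3A can be enumerated with a short helper. This is an illustrative sketch: the function names `sliding_windows` and `samples_in`, the closed-interval membership rule, and the example date 2022/05/09 are assumptions, not details given in the application.

```python
from datetime import datetime, timedelta


def sliding_windows(first_start: datetime, last_start: datetime,
                    length: timedelta = timedelta(hours=1),
                    step: timedelta = timedelta(minutes=20)) -> list:
    """List (start, end) windows of `length`, advancing by `step`."""
    windows = []
    start = first_start
    while start <= last_start:
        windows.append((start, start + length))
        start += step
    return windows


def samples_in(window: tuple, sample_times: list) -> list:
    """Sample times that fall inside a window (closed interval, an assumption)."""
    start, end = window
    return [t for t in sample_times if start <= t <= end]
```

Because consecutive windows overlap by 40 minutes, a sample taken at 10:30 falls into both time window 1 (10:00 to 11:00) and time window 2 (10:20 to 11:20), but not into time window 3.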
Fig. 3B is a second schematic diagram of a time window provided in an exemplary embodiment of the present application. Referring to Fig. 3B, suppose the time window is a sliding time window that includes two small time windows (each 10 min long) and the sliding step is 5 min. Time window 1 may then include time window 1-1 and time window 1-2, where time window 1-1 is the 10 min on 2022/05/09 between 45, and time window 1-2 is on 2022/05/10; sliding by 5 min gives time window 2, which may include time window 2-1 and time window 2-2, where time window 2-1 is the 10 min between 2022/05/09 10.
The feature statistics may include summation, difference, variance, and other statistical methods. When the feature statistics are summation, variance, and the like, the time window may be as shown in Fig. 3A; when the feature statistics are difference statistics, the time window may be as shown in Fig. 3B.
Since the device information may include the number of the electronic device, the model training device may determine, according to the device information, initial data corresponding to each electronic device among the plurality of pieces of initial data.
For example, suppose that, according to the device information, the model training device determines 150 pieces of initial data corresponding to electronic device-1 among 1000 pieces of initial data. If the preset time window is 1 hour, then, since each piece of initial data has a data sampling time, the 150 pieces of initial data of electronic device-1 can be divided into a plurality of data groups according to the data sampling time and the preset time window. For example, if time window-1 is the 1 hour starting at 2022/05/09 10, and the data sampling times of initial data-1, initial data-2, and initial data-3 are all at 2022/05/09 10, within that window, then initial data-1, initial data-2, and initial data-3 can be placed in one data group.
Assuming that 50 data groups are obtained, each containing 3 pieces of initial data, then for any data group, statistics such as the sum, mean, difference, and variance can be computed over its 3 pieces of initial data to obtain the statistical data of that data group.
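Once a data group is formed, each metric in it can be reduced to summary statistics. The sketch below is hypothetical in its choice of statistics and key names; it uses the population variance (`statistics.pvariance`) and reports the largest first-order difference between consecutive samples as one possible difference feature.

```python
from statistics import mean, pvariance


def window_features(values: list) -> dict:
    """Summary statistics for one metric within one data group (illustrative set)."""
    # First-order differences between consecutive samples, in time order.
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    return {
        "sum": sum(values),
        "mean": mean(values),
        "max": max(values),
        "var": pvariance(values),            # population variance
        "max_diff": max(diffs) if diffs else 0,
    }
```

For the values 2, 4, 6, 8 this gives a sum of 20, a mean of 5, a maximum of 8, a variance of 5, and a largest first-order difference of 2.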
S203, determine the labeling results of the plurality of pieces of statistical data according to the downtime information of the plurality of electronic devices.
The downtime information is information related to a crash (downtime) of the electronic device; for example, it may include the downtime moment.
The labeling result may indicate the time remaining until downtime, that is, the duration between the collection time of the statistical data and the downtime moment of the electronic device, and may be represented numerically. For example, if the statistical data were collected on 2022/05/09 and the electronic device went down on 2022/05/12, the labeling result may be -3, indicating that the statistical data were collected 3 days before the downtime.
Optionally, for any piece of statistical data corresponding to any electronic device, its labeling result may be determined in either of the following two modes:
Mode 1: label each piece of statistical data with the time remaining until downtime, that is, the duration between its collection time and the downtime moment of the electronic device.
In an optional embodiment, the labeling result of the statistical data may be determined as follows: determine the collection time according to the data sampling time in the initial data corresponding to the statistical data; obtain the initial duration between the collection time and the downtime moment; round the initial duration according to a preset time unit to obtain the time until downtime, which is an integer multiple of the preset time unit; and determine that the labeling result of the statistical data includes the time until downtime.
The rounding may be rounding up or rounding down. For example, with a preset time unit of 1 day, an initial duration of 1 day and 15 hours between the collection time and the downtime moment may be rounded up to 2 days, and an initial duration of 1 day and 2 hours may be rounded down to 1 day.
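The labeling rule of Mode 1 can be sketched as below. The text allows rounding up or down without stating when each applies; its examples (1 day 15 hours to 2 days, 1 day 2 hours to 1 day, 3 days 3 hours 27 minutes to 3 days) are all consistent with rounding to the nearest unit, so the sketch assumes that rule, with an exact half rounding up. The function name is hypothetical, and the downtime moment is assumed to be later than the collection time.

```python
from datetime import datetime, timedelta


def label_until_downtime(collect_time: datetime, downtime: datetime,
                         unit: timedelta = timedelta(days=1)) -> int:
    """Return -k, where k is the time until downtime in whole `unit`s."""
    initial = downtime - collect_time              # initial duration
    whole, remainder = divmod(initial, unit)       # (int, timedelta)
    if remainder * 2 >= unit:                      # assumed rule: round half up
        whole += 1
    return -whole
```

For a collection time of 2022/05/09 10:00 and a downtime moment of 2022/05/12 13:27, this returns -3.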
For example, suppose data group 1 includes initial data 1, initial data 2, initial data 3, and initial data 4 in Table 1, and the statistical data of data group 1 are as shown in Table 3. The data sampling time of initial data 1, 2022/05/09 10:00, may be determined as the collection time. If the downtime moment of electronic device-1 is 2022/05/12 13:27 and the preset time unit is 1 day, the initial duration between the collection time 2022/05/09 10:00 and the downtime moment 2022/05/12 13:27 is 3 days, 3 hours, and 27 minutes; rounding it down gives a time until downtime of 3 days, so the labeling result of the statistical data of data group 1 is -3. Here "3" is the time until downtime, and the minus sign indicates that the statistical data were collected before the downtime moment.
Mode 2: label according to the downtime identifier and the downtime moment, or according to the non-downtime identifier.
The downtime information may include a downtime identifier and a downtime moment, or the downtime information may include a non-downtime identifier.
Optionally, determining the labeling result of the statistical data in this mode may involve the following two cases:
Case 1: the downtime information includes a downtime identifier and a downtime moment.
In this case, the collection time may be determined from the data sampling time in the initial data corresponding to the statistical data; the initial duration between the collection time and the downtime moment is obtained; the initial duration is rounded according to a preset time unit to obtain the time until downtime, which is an integer multiple of the preset time unit; and the labeling result of the statistical data is determined to include the time until downtime. Here, the time until downtime is less than the preset duration.
For example, let the preset duration be 15 days, let data group 1 include initial data 1, initial data 2, initial data 3, and initial data 4 in Table 1, and let the statistical data of data group 1 be as shown in Table 3; the data sampling time of initial data 1, 2022/05/09 10:00, may be determined as the collection time. If the downtime moment of electronic device-1 is 2022/05/15 23:17 and the preset time unit is 1 day, the initial duration between the collection time 2022/05/09 10:00 and the downtime moment 2022/05/15 23:17 is 6 days, 13 hours, and 17 minutes; rounding it up gives a time until downtime of 7 days, so the labeling result of the statistical data of data group 1 is -7.
Case 2: the downtime information comprises a non-downtime identifier.
In this case, the labeling result of the statistical data may be determined to be "not down". The non-downtime identifier indicates that the electronic device did not crash within a preset duration after the initial data were sampled on it. The preset duration may be, for example, 10 days or 15 days.
In an optional embodiment, a flag bit may be set in each electronic device to record whether the device goes down within the preset duration, and the non-downtime identifier and the downtime identifier are determined from this flag bit. If the flag bit is 0, the non-downtime identifier 0 is determined, indicating that the electronic device did not crash within the preset duration; if the flag bit is 1, the downtime identifier 1 is determined, indicating that the electronic device crashed within the preset duration. When the next preset duration starts, the flag bit may be reset from 1 to 0 to record whether downtime occurs in the next preset duration.
For example, if the downtime information of electronic device-1 includes the non-downtime identifier 0, the labeling results of all statistical data corresponding to electronic device-1 may be determined to be "not down".
Optionally, the labeling result of the statistical data may instead be determined as the time until downtime, where that time is greater than or equal to the preset duration.
For example, let the preset duration be 15 days, let data group 1 include initial data 1, initial data 2, initial data 3, and initial data 4 in Table 1, and let the statistical data of data group 1 be as shown in Table 3; the data sampling time of initial data 1, 2022/05/09 10:00, may be determined as the collection time. If the downtime moment of electronic device-1 is 2022/05/26 15:34 and the preset time unit is 1 day, the initial duration between the collection time 2022/05/09 10:00 and the downtime moment 2022/05/26 15:34 is 17 days, 5 hours, and 34 minutes; rounding it down gives a time until downtime of 17 days, so the labeling result of the statistical data of data group 1 is -17. Because the absolute value 17 exceeds the preset duration of 15 days, this labeling result, like any labeling result whose absolute value exceeds 15, indicates that no downtime occurs within 15 days.
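Mode 2 can be sketched in the same way. The sketch is hypothetical: `label_mode2`, the `NOT_DOWN` string, and encoding the identifier as the 0/1 flag bit described above are assumptions, and it reuses the nearest-unit rounding assumption, which matches the examples given (6 days 13 hours 17 minutes to 7 days, 17 days 5 hours 34 minutes to 17 days).

```python
from datetime import datetime, timedelta
from typing import Optional, Union

NOT_DOWN = "not down"  # hypothetical label for the non-downtime case


def label_mode2(collect_time: datetime, downtime_flag: int,
                downtime: Optional[datetime] = None,
                unit: timedelta = timedelta(days=1)) -> Union[str, int]:
    """Mode 2 labeling; downtime_flag 0 = non-downtime identifier, 1 = downtime identifier."""
    if downtime_flag == 0:
        # Case 2: the device did not crash within the preset duration.
        return NOT_DOWN
    if downtime is None:
        raise ValueError("a downtime moment is required when downtime_flag == 1")
    # Case 1: round the initial duration to the nearest unit (assumed, half up).
    whole, remainder = divmod(downtime - collect_time, unit)
    if remainder * 2 >= unit:
        whole += 1
    return -whole
```

Whether a result such as -17 is later treated as "no downtime within 15 days" is decided when positive and negative samples are separated, not here.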
S204, perform model training according to the plurality of pieces of statistical data and the labeling result of each piece to obtain a target model.
The target model may be used to determine whether the electronic device will go down due to a memory fault within a future period.
In an optional embodiment, the model training device may determine positive sample data and negative sample data among the statistical data, select first positive sample data from the positive sample data and first negative sample data from the negative sample data, and perform model training on the first positive sample data, the first negative sample data, and their labeling results to obtain the target model.
Optionally, according to the multiple pieces of statistical data and the labeling result corresponding to each piece of statistical data, M pieces of positive sample data and N pieces of negative sample data are determined in the multiple pieces of statistical data, where M and N are positive integers, respectively.
Positive sample data are sample data from an electronic device that goes down within the future period; the time until downtime indicated by their labeling result is less than or equal to the preset duration. For example, if the preset duration is 15 days and the labeling result of statistical data 1 is -3, the time until downtime is 3 days, which is less than the preset 15 days, so statistical data 1 may be determined as positive sample data.
Negative sample data are sample data from an electronic device that does not go down within the future period; the time until downtime indicated by their labeling result is longer than the preset duration. For example, if the preset duration is 15 days and the labeling result of statistical data 2 is -17, the time until downtime is 17 days, which is greater than the preset 15 days, so statistical data 2 may be determined as negative sample data.
If the labeling result of a piece of statistical data is any value from -1 to -14, the statistical data may be determined as positive sample data. If the labeling result is "not down" or any value whose absolute value is greater than or equal to 15, the statistical data may be determined as negative sample data. Assuming that 100 pieces of positive sample data and 500 pieces of negative sample data are determined among the statistical data according to their labeling results, M is 100 and N is 500.
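The positive/negative split can be expressed as a simple filter over labeled statistics. This sketch follows the concrete rule in the preceding paragraph (an absolute label below 15 is positive; "not down" or an absolute label of 15 or more is negative); the function name, the `(features, label)` pair layout, and treating a label of 0 (downtime within half a day) as positive are assumptions.

```python
NOT_DOWN = "not down"  # hypothetical label for the non-downtime case


def split_samples(samples: list, preset_days: int = 15) -> tuple:
    """Split (features, label) pairs into positive and negative sample data.

    Labels are either NOT_DOWN or -k, where k is the time until downtime in days.
    """
    positives, negatives = [], []
    for features, label in samples:
        # Assumption: a label of 0 (downtime within half a day) counts as positive.
        if label != NOT_DOWN and abs(label) < preset_days:
            positives.append((features, label))
        else:
            negatives.append((features, label))
    return positives, negatives
```

With labels -3, -17, "not down", -14, and -15, only the samples labeled -3 and -14 become positive sample data.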
Because a large difference between the numbers of positive and negative samples hinders training of the target model, a plurality of pieces of first positive sample data may be selected from the M pieces of positive sample data, and a plurality of pieces of first negative sample data from the N pieces of negative sample data, according to M and N, such that the difference between the number of first positive sample data and the number of first negative sample data is within a preset range.
For example, if M = 100, N = 500, and the preset range is 50, then since M is less than N, all 100 pieces of positive sample data may be taken as first positive sample data, and 150 pieces of first negative sample data may be selected from the 500 pieces of negative sample data. The numbers of first positive and first negative sample data are then 100 and 150, and their difference of 50 is within the preset range.
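The application does not say how the 150 negative samples are chosen. One common option, assumed here, is random undersampling of the larger class: keep every sample of the smaller class and draw at most that many plus the preset range from the larger class. The function name, the `max_gap` parameter, and the fixed seed are illustrative.

```python
import random


def balance(positives: list, negatives: list, max_gap: int = 50, seed: int = 0) -> tuple:
    """Randomly undersample the larger class so the count gap is at most max_gap."""
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    if len(positives) <= len(negatives):
        keep = min(len(negatives), len(positives) + max_gap)
        return list(positives), rng.sample(negatives, keep)
    keep = min(len(positives), len(negatives) + max_gap)
    return rng.sample(positives, keep), list(negatives)
```

With M = 100 and N = 500, this keeps all 100 positive samples and 150 randomly drawn negative samples, matching the counts in the example.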
After the first positive sample data and the first negative sample data are determined, model training may be performed on them and their corresponding labeling results to obtain the target model, which may also be referred to as a fault prediction model. The target model can be used for fault prediction of the electronic device, that is, to predict whether the electronic device will go down due to a memory fault.
During model training, an ensemble learning algorithm may be used to obtain the target model, for example Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), or random forest.
In this embodiment of the present application, the model training device may acquire a plurality of pieces of initial data from a plurality of electronic devices, where the initial data may include device information, memory information of a memory in the electronic device, and a data sampling time, and the memory information includes memory fault information and memory performance information. The model training device may perform feature statistics on the initial data of each electronic device according to the device information and the data sampling time to obtain a plurality of pieces of statistical data, and determine the labeling results of the statistical data according to the downtime information of the electronic devices. The model training device can determine positive sample data and negative sample data among the statistical data, select a plurality of pieces of first positive sample data and a plurality of pieces of first negative sample data according to the numbers of positive and negative samples, and perform model training on the first positive sample data, the first negative sample data, and their labeling results to obtain the target model.
The sample data used for model training include memory performance information and memory fault information, which usually change noticeably before the electronic device goes down due to a memory fault. The sample data are statistics over a period of time, which clearly reflect such changes. The labeling result includes the time remaining until downtime, so the trained target model can accurately predict downtime that may occur in the coming days. Together, these three points improve the prediction accuracy of the target model.
The method for training the fault prediction model will be described in further detail below with reference to fig. 4. Fig. 4 is a schematic flowchart of another fault prediction model training method according to an exemplary embodiment of the present disclosure. Referring to fig. 4, the method may include:
S401, acquire a plurality of pieces of initial data from a plurality of electronic devices.
The model training device can connect to the plurality of electronic devices through a wireless or wired network and acquire initial data from them. The initial data may include device information, memory information of the memory in the electronic device, and the data sampling time.
S402, determine the initial data corresponding to each electronic device according to the device information.
For any electronic device, since the device information may include the number of the electronic device, the model training device may determine, according to the device information, initial data corresponding to each electronic device in the plurality of pieces of initial data.
After the initial data corresponding to an electronic device has been determined, feature statistics are performed on it. The process is the same for every electronic device, so the following description takes one electronic device as an example.
S403, dividing the initial data corresponding to the electronic device into multiple data groups according to the data sampling time and a preset time window.
Assume that 8 pieces of initial data corresponding to electronic device-1 are identified among the initial data according to the device information, with the data sampling times shown in Table 1:
TABLE 1
Initial data      Data sampling time
Initial data 1    2022/05/09 10:00
Initial data 2    2022/05/09 10:20
Initial data 3    2022/05/09 10:40
Initial data 4    2022/05/09 11:00
Initial data 5    2022/05/10 10:00
Initial data 6    2022/05/10 10:20
Initial data 7    2022/05/10 10:40
Initial data 8    2022/05/10 11:00
For statistics such as sums and variances, if preset time window 1 is set to one continuous hour, the initial data corresponding to electronic device-1 can be divided into 2 data groups according to the data sampling time and preset time window 1, denoted data group 1 and data group 2. Data group 1 may include initial data 1 to 4, whose data sampling times all fall within the 1 h window from 2022/05/09 10:00 to 11:00; data group 2 may include initial data 5 to 8, whose data sampling times all fall within the 1 h window from 2022/05/10 10:00 to 11:00.
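For difference statistics (e.g., first-order or second-order differences), if preset time window 2 consists of two 10-min windows, one on 2022/05/09 and one on 2022/05/10, each containing the 10:40 sampling time, then initial data 3 and initial data 7 fall into these windows and form a third data group, data group 3.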
Once the 3 data groups are determined, feature statistics can be performed on the initial data in each of them to obtain 3 pieces of statistical data for electronic device-1, one per data group.
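The window grouping of electronic device-1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes windows are closed intervals (so the 11:00 samples fall into the 10:00 to 11:00 window, as in the example), and the 10-min bounds of preset time window 2 are hypothetical values chosen only so that they contain the 10:40 samples.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"

# Sampling times of initial data 1-8 of electronic device-1 (Table 1).
SAMPLES = {
    1: "2022/05/09 10:00", 2: "2022/05/09 10:20",
    3: "2022/05/09 10:40", 4: "2022/05/09 11:00",
    5: "2022/05/10 10:00", 6: "2022/05/10 10:20",
    7: "2022/05/10 10:40", 8: "2022/05/10 11:00",
}


def group_by_windows(samples, windows):
    """Return one sorted list of sample ids per (start, end) window, using start <= t <= end.

    Windows may overlap, so one sample can belong to several data groups.
    """
    parsed = {i: datetime.strptime(ts, FMT) for i, ts in samples.items()}
    groups = []
    for start, end in windows:
        lo, hi = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
        groups.append(sorted(i for i, t in parsed.items() if lo <= t <= hi))
    return groups


# Preset time window 1: one hour on each day (sum/variance statistics).
HOURLY = [("2022/05/09 10:00", "2022/05/09 11:00"),
          ("2022/05/10 10:00", "2022/05/10 11:00")]
# Preset time window 2: hypothetical 10-min bounds chosen to contain the 10:40 samples.
TEN_MIN = [("2022/05/09 10:35", "2022/05/09 10:45"),
           ("2022/05/10 10:35", "2022/05/10 10:45")]

group1, group2 = group_by_windows(SAMPLES, HOURLY)
# Data group 3 pools the samples of both 10-min windows (difference statistics).
group3 = sorted(i for g in group_by_windows(SAMPLES, TEN_MIN) for i in g)
print(group1, group2, group3)  # [1, 2, 3, 4] [5, 6, 7, 8] [3, 7]
```

Passing explicit (start, end) pairs keeps the same function usable for both preset windows and allows overlapping windows, which the example needs because data group 3 reuses samples from data groups 1 and 2.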
S404, performing feature statistics on the initial data in each data group to obtain the statistical data corresponding to the electronic device.
Optionally, the initial data may be preprocessed before feature statistics are performed. Preprocessing may include removing data items that are not collected on most electronic devices and replacing occasionally missing values with 0.
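A minimal preprocessing sketch, assuming each piece of initial data is a dict mapping field names to values, that "not collected on most devices" means present in at most half of the records (the `min_coverage` cut-off is a hypothetical choice), and that missing values appear as absent keys or `None`. The field names are illustrative only.

```python
def preprocess(records, min_coverage=0.5):
    """Drop fields collected in at most `min_coverage` of the records, then fill
    remaining missing values (absent keys or None) with 0."""
    n = len(records)
    present = {}
    for rec in records:
        for key, value in rec.items():
            if value is not None:
                present[key] = present.get(key, 0) + 1
    kept = sorted(k for k, c in present.items() if c / n > min_coverage)

    def fill(rec, key):
        value = rec.get(key)
        return 0 if value is None else value

    return [{k: fill(rec, k) for k in kept} for rec in records]


# Hypothetical records: "vendor_counter" is collected on only one of three devices.
raw = [
    {"read_errors": 2, "mem_occupancy": 0.61, "vendor_counter": 7},
    {"read_errors": None, "mem_occupancy": 0.58},
    {"read_errors": 1, "mem_occupancy": 0.66},
]
clean = preprocess(raw)
# [{'mem_occupancy': 0.61, 'read_errors': 2},
#  {'mem_occupancy': 0.58, 'read_errors': 0},
#  {'mem_occupancy': 0.66, 'read_errors': 1}]
```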
In an optional embodiment, feature statistics may be performed on the initial data as follows: perform statistics on the number of error reports for each fault type in the data group to obtain an error-report statistic for each fault type; perform statistics on the memory performance information in the data group to obtain memory performance statistics; and count the fault locations in the data group to obtain, for each bank in the memory, a fault-count statistic within the time window corresponding to the data group. The statistical data corresponding to the data group then includes the error-report statistic for each fault type, the memory performance statistics, and the fault-count statistics.
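The first two statistics above can be sketched as follows. The record schema (`errors`, `occupancy`), the use of sums for error counts, and mean plus population variance for memory occupancy are assumptions for illustration; the patent does not fix which statistics are computed. Per-bank fault counting is not covered in this sketch.

```python
from collections import Counter
from statistics import mean, pvariance


def group_statistics(group):
    """Turn one data group into one piece of statistical data.

    Hypothetical record schema: {"errors": {fault_type: count}, "occupancy": float}.
    """
    error_totals = Counter()
    for rec in group:
        error_totals.update(rec["errors"])  # adds the error-report counts per fault type
    occupancy = [rec["occupancy"] for rec in group]
    stats = {f"{fault}_errors_sum": total for fault, total in error_totals.items()}
    stats["occupancy_mean"] = mean(occupancy)
    stats["occupancy_var"] = pvariance(occupancy)  # population variance
    return stats


group = [
    {"errors": {"read": 2, "scrub": 1}, "occupancy": 0.50},
    {"errors": {"read": 1}, "occupancy": 0.70},
]
stats = group_statistics(group)
```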
The fault location may be the detailed physical location of a DRAM fault, parsed from a dynamic random-access memory (DRAM) fault log.
The fault location is described in detail below with reference to Fig. 5.
Fig. 5 is a schematic structural diagram of a dual in-line memory module according to an exemplary embodiment of the present application. The memory of an electronic device may include 24 dual in-line memory modules (DIMMs). Referring to Fig. 5, each DIMM has 2 memory ranks (Rank), Rank-1 and Rank-2. Each Rank includes 16 banks (Bank); for example, Rank-2 may include Bank2-1, Bank2-2, Bank2-3, ..., Bank2-16. Rank-1 and Rank-2 contain 32 banks in total, and only one of the 32 banks is accessed on each memory access.
Typically, each Bank has 2^17 rows (Row) and 2^10 columns (Column, Col). The tuple <DIMM, Rank, Bank, Row, Col> can indicate the location in the memory of an electronic device at which a DRAM fault occurred; the coarser tuple <DIMM, Rank, Bank> can also be used for this purpose. As shown in FIG. 3, if 5 DRAM faults occur at Dimm1, Rank1, Bank2-1, row 1, column 3, the fault location can be recorded as <Dimm1, Rank1, Bank2-1, 1, 3> with a fault count of 5. For any Bank, the number of Bank faults equals the sum of the DRAM faults at all locations within that Bank. For example, if 5 DRAM faults occur at <Dimm1, Rank1, Bank2-1, 1, 3> and 3 DRAM faults occur at <Dimm1, Rank1, Bank2-1, 1, 2>, then a total of 8 DRAM faults have occurred in <Dimm1, Rank1, Bank2-1>.
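The bank-level aggregation in the example above amounts to summing fault counts over the row and column fields of the location tuple. A minimal sketch, assuming the parsed fault log is available as a hypothetical mapping from <DIMM, Rank, Bank, Row, Col> tuples to occurrence counts:

```python
from collections import Counter

# Hypothetical parsed fault log: (DIMM, Rank, Bank, Row, Col) -> number of DRAM faults.
faults = {
    ("Dimm1", "Rank1", "Bank2-1", 1, 3): 5,
    ("Dimm1", "Rank1", "Bank2-1", 1, 2): 3,
}


def bank_fault_counts(fault_counts):
    """Collapse <DIMM, Rank, Bank, Row, Col> counts into <DIMM, Rank, Bank> counts."""
    per_bank = Counter()
    for (dimm, rank, bank, _row, _col), n in fault_counts.items():
        per_bank[(dimm, rank, bank)] += n
    return per_bank


print(bank_fault_counts(faults))  # Counter({('Dimm1', 'Rank1', 'Bank2-1'): 8})
```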
For example, assume that each piece of initial data in Table 1 contains the data shown in Table 2:
TABLE 2
Suppose data group 1 includes initial data 1 to 4, data group 2 includes initial data 5 to 8, and data group 3 includes initial data 3 and initial data 7. Performing statistics on the initial data in these 3 data groups yields the statistical data shown in Table 3. All Banks in Table 3 are located in DIMM1 and Rank1, so DIMM1 and Rank1 are omitted for brevity:
TABLE 3
In practice, the statistical data may have different attributes, orders of magnitude, and units, which makes it difficult to use them together for training. To eliminate these differences and facilitate subsequent model training, the statistical data can be normalized, that is, scaled so that all values lie in the same interval. For example, each type of statistical data may be normalized to a value between 0 and 1.
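Min-max scaling is one common way to perform this normalization; the patent does not name a specific method, so the following sketch is an assumption. It maps each feature's minimum to 0 and its maximum to 1 (endpoints included), and the sample values are hypothetical.

```python
def min_max_normalize(rows, features):
    """Return copies of `rows` with each listed feature scaled into [0, 1]."""
    out = [dict(row) for row in rows]
    for feature in features:
        values = [row[feature] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in out:
            # A constant feature carries no information; map it to 0.0 instead of dividing by zero.
            row[feature] = (row[feature] - lo) / span if span else 0.0
    return out


# Hypothetical statistical data with very different magnitudes.
rows = [
    {"bank_faults": 8, "occupancy": 0.2},
    {"bank_faults": 200, "occupancy": 0.6},
    {"bank_faults": 104, "occupancy": 1.0},
]
scaled = min_max_normalize(rows, ["bank_faults", "occupancy"])
```

In a real pipeline the minimum and maximum would normally be computed on the training data only and then reused for prediction-time data.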
S405, determining the labeling results of the multiple pieces of statistical data according to the downtime information of the multiple electronic devices.
For the specific execution of step S405, refer to step S203; details are not repeated here.
S406, determining M pieces of positive sample data and N pieces of negative sample data among the multiple pieces of statistical data according to the statistical data and the labeling result corresponding to each piece.
The labeling result can be represented by a number whose absolute value indicates the time remaining until downtime. For positive sample data, the time until downtime indicated by the labeling result is less than or equal to a preset duration; for negative sample data, it is greater than the preset duration. For example, if the labeling result is -3, "3" indicates that downtime occurs 3 days later, and "-" indicates that the statistical data was collected before the downtime.
For example, if the preset duration is 15 days, statistical data whose labeling results lie between -1 and -14 can be determined as positive sample data, and statistical data whose labeling result is "no downtime" or has an absolute value greater than or equal to 15 can be determined as negative sample data.
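The labeling convention and the 15-day split can be sketched as follows. The `labeling_result` helper is a hypothetical reconstruction (step S203 is not reproduced in this section) that counts whole days from data collection to downtime and uses `None` for "no downtime"; `sample_class` follows the example above literally and leaves labels the example does not cover unclassified.

```python
from datetime import date

PRESET_DAYS = 15  # preset duration from the example above


def labeling_result(sample_day, downtime_day):
    """Hypothetical labeling: -d if collected d days before downtime, None if no downtime."""
    if downtime_day is None:
        return None
    return -(downtime_day - sample_day).days


def sample_class(label, preset_days=PRESET_DAYS):
    """Classify a labeling result as in the 15-day example."""
    if label is None or abs(label) >= preset_days:
        return "negative"
    if label < 0:  # -1 ... -(preset_days - 1)
        return "positive"
    return None  # labels 0 ... preset_days - 1 are not covered by the example


print(labeling_result(date(2022, 5, 9), date(2022, 5, 12)))  # -3
print(sample_class(-3), sample_class(-21), sample_class(None))  # positive negative negative
```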
S407, according to M and N, determining multiple pieces of first positive sample data in the M pieces of positive sample data, and determining multiple pieces of first negative sample data in the N pieces of negative sample data.
Optionally, with M pieces of positive sample data and N pieces of negative sample data, multiple pieces of first positive sample data can be selected from the M positive samples and multiple pieces of first negative sample data from the N negative samples according to M and N, so that the difference between the numbers of first positive and first negative sample data is within a preset range.
Determining the plurality of pieces of first positive sample data and the plurality of pieces of first negative sample data may include the following 2 cases:
Case 1: M is greater than N, and the difference between M and N is greater than or equal to a first threshold.
In this case, the M pieces of positive sample data may be downsampled and the result used as the first positive sample data, while all N pieces of negative sample data are used as the first negative sample data.
The first threshold may be set according to the requirements for training the target model; for example, it may be 50.
Downsampling is generally suitable when the numbers of positive and negative samples differ greatly and the minority class has too few samples. It selects a subset of the majority class so that the numbers of positive and negative samples become comparable.
For example, let M = 1000, N = 600, the first threshold be 50, and the preset range be 50. Because M > N and M - N = 400, which exceeds the first threshold of 50, the 1000 pieces of positive sample data can be downsampled to 640 pieces of first positive sample data, and all 600 pieces of negative sample data are used as first negative sample data. The difference between 640 and 600 is 40, which is within the preset range of 50.
Case 2: N is greater than M, and the difference between N and M is greater than or equal to the first threshold.
In this case, the N pieces of negative sample data may be downsampled and the result used as the first negative sample data, while all M pieces of positive sample data are used as the first positive sample data.
For example, let M = 700, N = 900, the first threshold be 50, and the preset range be 50. Because N > M and N - M = 200, which exceeds the first threshold of 50, the 900 pieces of negative sample data can be downsampled to 742 pieces of first negative sample data, and all 700 pieces of positive sample data are used as first positive sample data. The difference between 700 and 742 is 42, which is within the preset range of 50.
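The two cases can be sketched with random downsampling from Python's `random` module. Random selection, the fixed seed, and the `gap` parameter (how many more samples the kept majority class has than the minority class, chosen below the preset range) are assumptions; the patent only requires the final counts to differ by less than the preset range.

```python
import random


def balance(positives, negatives, first_threshold=50, gap=40, seed=0):
    """Randomly downsample the majority class when |M - N| >= first_threshold.

    The majority class is cut to (minority size + gap); choose gap below the
    preset range so the resulting sizes differ by less than that range.
    """
    m, n = len(positives), len(negatives)
    if abs(m - n) < first_threshold:
        return list(positives), list(negatives)  # already balanced enough
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    if m > n:  # Case 1: downsample the positive sample data
        return rng.sample(positives, min(m, n + gap)), list(negatives)
    # Case 2: downsample the negative sample data
    return list(positives), rng.sample(negatives, min(n, m + gap))


pos1, neg1 = balance(list(range(1000)), list(range(600)))
print(len(pos1), len(neg1))  # 640 600
pos2, neg2 = balance(list(range(700)), list(range(900)), gap=42)
print(len(pos2), len(neg2))  # 700 742
```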
S408, determining first data features among the data features according to the feature value of each data feature in the first positive sample data and in the first negative sample data.
A first data feature is a data feature that differs significantly between the first positive sample data and the first negative sample data: its feature values in the first positive samples and in the first negative samples differ by at least a second threshold.
The second threshold may be set according to the feature values of the data feature concerned. For example, if the data feature is memory occupancy, the second threshold may be set to 20%; if the data feature is the total number of Bank faults, it may be set to 100.
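The threshold criterion can be sketched as below, assuming features are compared through their mean values over the first positive and first negative samples (the patent does not say how multiple samples are aggregated). The second thresholds and all values except the memory scrub error counts of the Table 4 example (259 and 236) are hypothetical, and hypothesis tests such as the chi-square test or F-test are not shown.

```python
from statistics import mean


def select_first_features(pos_rows, neg_rows, second_thresholds):
    """Return the features whose mean value differs between positive and
    negative samples by at least that feature's second threshold."""
    selected = []
    for feature, threshold in second_thresholds.items():
        gap = abs(mean(r[feature] for r in pos_rows) - mean(r[feature] for r in neg_rows))
        if gap >= threshold:
            selected.append(feature)
    return selected


# Hypothetical sample values; only the scrub-error counts (259 vs 236) come from the text.
pos = [{"mem_occupancy": 0.85, "bank_fault_total": 320, "scrub_errors": 259}]
neg = [{"mem_occupancy": 0.40, "bank_fault_total": 12, "scrub_errors": 236}]
thresholds = {"mem_occupancy": 0.20, "bank_fault_total": 100, "scrub_errors": 50}
print(select_first_features(pos, neg, thresholds))  # ['mem_occupancy', 'bank_fault_total']
```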
Optionally, hypothesis testing may be used to select the first data features. Hypothesis testing is a statistical inference method that can determine whether a difference between the first positive sample data and the first negative sample data is due to sampling error or to an intrinsic difference. For example, a chi-square test or an F-test may be used.
Assume that one piece of first positive sample data and one piece of first negative sample data contain the data shown in Table 4:
TABLE 4
If hypothesis testing is used to select features from the first positive and first negative sample data, the selected first data features may include: the number of memory read errors, the number of page offline errors, the average memory occupancy, the memory read speed, and the total number of Bank faults. The number of memory scrub errors is 259 in the first positive sample data and 236 in the first negative sample data; this difference is not significant, so the number of memory scrub errors is not used as a first data feature.
S409, updating the multiple pieces of first positive sample data and first negative sample data according to the first data features.
In an optional embodiment, updating the first positive and first negative sample data according to the first data features means retaining the first data features in them and removing all other features, so that the updated first positive and first negative sample data contain only the feature values of the first data features.
For example, if the first positive and first negative sample data are as shown in Table 4 and the selected first data features are the number of memory read errors, the number of page offline errors, the average memory occupancy, the memory read speed, and the total number of Bank faults, these features are retained in both the first positive and first negative sample data, and the number of memory scrub errors is removed. The updated first positive and first negative sample data are shown in Table 5:
TABLE 5
S410, performing first model training according to the multiple pieces of first positive sample data, the multiple pieces of first negative sample data, the labeling results corresponding to the multiple pieces of first positive sample data and the labeling results corresponding to the multiple pieces of first negative sample data to obtain an intermediate model, and determining the importance degree of each data feature in the statistical data.
For example, if the updated first positive and first negative sample data are as shown in Table 5, the labeling result of each piece of first positive sample data is -3, and that of each piece of first negative sample data is -21, model training can be performed on the data in Table 5 and the corresponding labeling results to obtain the intermediate model.
The intermediate model can determine the importance of each data feature. Optionally, importance may be expressed as a number between 1 and 10; the larger the value, the more important the data feature.
For example, the importance of each data feature determined by the intermediate model may be as shown in table 6:
TABLE 6
Data feature                         Degree of importance
Number of memory read errors         3
Number of page offline errors        1
Average value of memory occupancy    6
Memory read speed                    2
Total number of failures in Bank     8
Note that in practice the numbers of first positive and first negative sample data may still differ considerably. During model training, the focal loss function may be used instead of the cross-entropy loss function to improve the training effect. Focal loss is a loss function designed for cases where the ratio of positive to negative samples is severely imbalanced.
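A sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), next to cross-entropy for comparison. The defaults alpha = 0.25 and gamma = 2 are the values commonly used in the original focal loss paper (Lin et al.); the patent does not specify them, so treat them as assumptions.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p = P(positive) and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    return -math.log(p if y == 1 else 1 - p)


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: -alpha_t * (1 - p_t) ** gamma * log(p_t)."""
    p = min(max(p, eps), 1 - eps)          # clip so log() stays finite
    p_t = p if y == 1 else 1 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


# An easy example (p_t = 0.9) is down-weighted far more than a hard one (p_t = 0.1).
easy = focal_loss(0.9, 1) / cross_entropy(0.9, 1)   # about 0.25 * 0.1**2 = 0.0025
hard = focal_loss(0.1, 1) / cross_entropy(0.1, 1)   # about 0.25 * 0.9**2 = 0.2025
```

With gamma = 0 and alpha_t = 1 the focal loss reduces to plain cross-entropy; increasing gamma shifts the training signal toward hard, misclassified samples, which is why it helps with a dominant, mostly easy class.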
S411, ranking the data features by importance from high to low, and determining multiple pieces of second positive sample data and multiple pieces of second negative sample data.
In an optional embodiment, the data features may be sorted by importance in descending order; the feature values of the top K data features are retained in the first positive sample data to obtain second positive sample data, and likewise in the first negative sample data to obtain second negative sample data.
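The ranking and retention step can be sketched as follows, using the importance values of Table 6. The helper names are hypothetical; ties in importance (none occur here) would keep their original order because Python's sort is stable.

```python
# Importance degrees output by the intermediate model (Table 6).
IMPORTANCE = {
    "Number of memory read errors": 3,
    "Number of page offline errors": 1,
    "Average value of memory occupancy": 6,
    "Memory read speed": 2,
    "Total number of failures in Bank": 8,
}


def top_k_features(importance, k):
    """Sort features by importance from high to low and return the first k names."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def keep_features(samples, features):
    """Retain only the feature values of `features` in every sample."""
    return [{f: sample[f] for f in features} for sample in samples]


top3 = top_k_features(IMPORTANCE, 3)
print(top3)
# ['Total number of failures in Bank', 'Average value of memory occupancy',
#  'Number of memory read errors']
```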
For example, if the importance of each data feature is as shown in Table 6, sorting the data features by importance in descending order gives Table 7:
TABLE 7
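Data feature                         Degree of importance
Total number of failures in Bank     8
Average value of memory occupancy    6
Number of memory read errors         3
Memory read speed                    2
Number of page offline errors        1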
If K is set to 3, the top 3 data features are the total number of failures in Bank, the average value of memory occupancy, and the number of memory read errors. Assuming the first positive and first negative sample data are as shown in Table 5, retaining the feature values of these 3 data features in the first positive sample data yields multiple pieces of second positive sample data, and retaining them in the first negative sample data yields multiple pieces of second negative sample data. The second positive and negative sample data may be as shown in Table 8:
TABLE 8
S412, performing second model training on the intermediate model according to the multiple pieces of second positive sample data, the multiple pieces of second negative sample data, and their corresponding labeling results to obtain the target model.
The second positive and second negative sample data are selected from the first positive and first negative sample data, so their labeling results are the same as those of the corresponding first positive and first negative sample data.
For example, if the labeling result of the second positive sample data is -3 and that of the second negative sample data is -17, as shown in Table 8, the second positive and second negative sample data and their labeling results can be used to train the intermediate model a second time to obtain the target model.
In practice, the target model may contain hundreds of parameters. Some can be optimized through model training, but others cannot; these are called hyperparameters. Optionally, such hyperparameters can be tuned with other algorithms, such as grid search, random search, or Bayesian optimization.
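Grid search, the simplest of the tuning methods listed, can be sketched with the standard library alone. The hyperparameter names, their candidate values, and `toy_score` (a stand-in for training the model and measuring validation performance) are all hypothetical.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best (params, score).

    score_fn(params) should train and validate a model and return a score where
    higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical hyperparameters and a toy score standing in for validation accuracy.
GRID = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [3, 5, 7]}


def toy_score(params):
    return -abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 5)


best, score = grid_search(GRID, toy_score)
print(best)  # {'learning_rate': 0.1, 'max_depth': 5}
```

Grid search costs one training run per combination, so random search or Bayesian optimization is usually preferred once the grid grows beyond a few dimensions.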
In this embodiment of the application, the model training device may acquire multiple pieces of initial data from multiple electronic devices, determine the initial data corresponding to each electronic device according to the device information, and divide each device's initial data into multiple data groups according to the data sampling time and a preset time window. The model training device can perform feature statistics on the initial data in each data group to obtain the statistical data corresponding to the electronic devices, and determine the labeling results of the statistical data according to the downtime information of the electronic devices. According to the labeling results, it may determine positive and negative sample data among the statistical data, and then select multiple pieces of first positive sample data from the positive sample data and multiple pieces of first negative sample data from the negative sample data. 
The model training device then determines first data features from the first positive and first negative sample data, updates those sample data according to the first data features, and trains on the updated first positive and first negative sample data and their labeling results to obtain an intermediate model and the importance of each data feature. It ranks the data features by importance from high to low, determines the top K data features, selects second positive sample data from the first positive sample data and second negative sample data from the first negative sample data according to those K features, and performs a second round of training on the intermediate model with the second positive and second negative sample data and their labeling results to obtain the target model. 
Three factors together make the trained target model accurate and improve the accuracy of downtime prediction for electronic devices: the sample data includes memory performance information and memory fault information, which usually change noticeably before a device goes down due to a memory fault; the sample data consists of statistics over a period of time, which clearly reflect such changes; and the second round of training on the top K data features output by the intermediate model improves the training effect.
The fault prediction model training method is described in detail below through a specific example with reference to Fig. 6.
Fig. 6 is a process schematic diagram of a fault prediction model training method according to an exemplary embodiment of the present application. Referring to Fig. 6, the method includes process 1, process 2, and process 3.
Referring to process 1, the model training device may obtain multiple pieces of initial data from multiple electronic devices, for example initial data-1, initial data-2, initial data-3, ..., initial data-p.
Each piece of initial data may include device information, memory information of the memory in the electronic device, and the data sampling time, where the memory information includes memory fault information and memory performance information. The model training device can determine the initial data corresponding to each electronic device according to the device information and, for each electronic device, divide its initial data into multiple data groups according to the data sampling time and a preset time window. As shown in Fig. 6, if the p pieces of initial data come from W electronic devices, the initial data of the W devices can be divided according to the data sampling time and the preset time window; suppose q data groups are obtained in total, covering all W electronic devices.
Each data group may include multiple pieces of initial data; for example, data group-1 may include initial data-1, initial data-2, and initial data-3. Feature statistics such as sums, variances, and differences can be computed on the initial data in each data group to obtain the corresponding statistical data. Statistical data and data groups correspond one to one, so the statistical data covers all W electronic devices. The labeling result of each piece of statistical data is determined according to the downtime information of the W electronic devices. For example, a labeling result of -3 for statistical data-1 indicates that the data was collected 3 days before the electronic device went down.
Referring to process 2, M pieces of positive sample data and N pieces of negative sample data may be determined among the statistical data according to their labeling results. If the two numbers differ too much, the larger class can be downsampled: multiple pieces of first positive sample data are determined from the M positive samples and multiple pieces of first negative sample data from the N negative samples, so that the two counts are balanced.
Optionally, hypothesis testing can be used to identify, among the data features, the first data features whose feature values differ significantly between the first positive and first negative sample data. The first positive and first negative sample data are then updated by retaining the first data features and removing all other features, giving the updated first positive and first negative sample data.
Model training is performed on the updated first positive and first negative sample data and their labeling results to obtain an intermediate model, which can determine the importance of each data feature. The data features can then be ranked by importance from high to low to determine the top K data features.
Referring to process 3, according to the first K data features, the feature values of the first K data features may be retained in the updated plurality of pieces of first positive sample data to obtain a plurality of pieces of second positive sample data, and the feature values of the first K data features may be retained in the updated plurality of pieces of first negative sample data to obtain a plurality of pieces of second negative sample data. For example, if the updated plurality of pieces of first positive sample data include feature values of 10 data features, if K is set to 3, and the first 3 determined data features are data feature-1, data feature-2, and data feature-3, respectively, the feature values corresponding to the data feature-1, the data feature-2, and the data feature-3 may be retained in the 10 data features, and the feature values corresponding to the other 7 data features may be removed, so as to obtain second positive sample data. The second positive sample data includes characteristic values corresponding to the data characteristic-1, the data characteristic-2 and the data characteristic-3 respectively.
And performing second-time model training on the intermediate model according to the plurality of pieces of second positive sample data, the plurality of pieces of second negative sample data, the labeling results corresponding to the plurality of pieces of second positive sample data and the labeling results corresponding to the plurality of pieces of second negative sample data to obtain the target model.
In this embodiment of the application, the model training device may acquire multiple pieces of initial data from multiple electronic devices, determine the initial data corresponding to each electronic device according to the device information, and divide each device's initial data into multiple data groups according to the data sampling time and a preset time window. It can perform feature statistics on the initial data in each data group to obtain multiple pieces of statistical data, determine their labeling results according to the downtime information of the electronic devices, determine positive and negative sample data among the statistical data, and then determine multiple pieces of first positive and first negative sample data according to the numbers of positive and negative sample data. The model training device determines the first data features through hypothesis testing, updates the first positive and first negative sample data accordingly, and trains on the updated data and their labeling results to obtain an intermediate model and the importance of each data feature. 
It then determines the top K data features, selects second positive sample data from the first positive sample data and second negative sample data from the first negative sample data according to those features, and performs a second round of training on the intermediate model with the second positive and second negative sample data and their labeling results to obtain the target model. Three factors make the resulting target model accurate and improve the accuracy of downtime prediction for electronic devices: the sample data includes memory performance information and memory fault information, which usually change noticeably before a device goes down due to a memory fault; the sample data consists of statistics over a period of time, which clearly reflect such changes; and the second round of training on the top K data features output by the intermediate model improves the training effect.
After the target model is obtained through training, it can be used to make predictions for the electronic devices, that is, to determine whether each electronic device will go down due to a memory fault within a future period.
Optionally, when the target model is used to make predictions for a plurality of electronic devices, the target model may be deployed in a failure prediction device, which acquires the initial data of the electronic devices and makes a prediction for each of them through the target model; alternatively, the target model may be deployed in each electronic device, so that each device makes its own prediction through the target model.
Next, a device failure determination method will be described with reference to fig. 7.
Fig. 7 is a process diagram of a device failure determination method according to an exemplary embodiment of the present application. Referring to Fig. 7, the method may include:
S701: acquire a plurality of pieces of initial data from the electronic device.
The initial data may include device information, memory information of the memory in the electronic device, and the data sampling time. The memory information may include memory failure information and memory performance information, and the memory failure information may include the number of error reports and the fault locations corresponding to each fault type.
Optionally, the initial data may be acquired from the electronic device in either of the following two ways:
Mode 1: the initial data is acquired in real time.
For example, if the data sampling frequency is set to once every 15 minutes, each electronic device may sample once every 15 minutes to obtain initial data and transmit each sample immediately to the model training device, so that the model training device obtains the initial data of the plurality of electronic devices.
Mode 2: the initial data is acquired periodically.
For example, if the data sampling frequency is set to once every 15 minutes and the transmission cycle is set to 1 h, each electronic device may sample once every 15 minutes and transmit the accumulated initial data to the model training device once every hour, so that the model training device obtains the initial data of the plurality of electronic devices.
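The two acquisition modes differ only in when sampled data is sent. The following is a minimal Python sketch, assuming a hypothetical device-side agent (`DeviceSampler`) and the example values above (sampling every 15 minutes, transmitting every hour in Mode 2); the record layout is an illustrative assumption, not the format used in the application.

```python
from dataclasses import dataclass, field
from typing import List

SAMPLE_INTERVAL_MIN = 15  # example sampling frequency: once every 15 minutes
TRANSMIT_PERIOD_MIN = 60  # example Mode 2 transmission cycle: 1 h


@dataclass
class DeviceSampler:
    """Hypothetical device-side agent that samples memory data and uploads it."""
    device_id: str
    periodic: bool  # False -> Mode 1 (real time), True -> Mode 2 (periodic)
    buffer: List[dict] = field(default_factory=list)
    uploads: List[List[dict]] = field(default_factory=list)

    def on_sample(self, minute: int, memory_info: dict) -> None:
        record = {"device": self.device_id, "sample_time": minute, "memory": memory_info}
        if not self.periodic:
            self.uploads.append([record])  # Mode 1: send each sample immediately
            return
        self.buffer.append(record)
        # Mode 2: send the buffered samples when the current transmission cycle ends.
        if (minute + SAMPLE_INTERVAL_MIN) % TRANSMIT_PERIOD_MIN == 0:
            self.uploads.append(self.buffer)
            self.buffer = []


def upload_sizes(periodic: bool, hours: int) -> List[int]:
    """Simulate `hours` of sampling and return the number of records per upload."""
    sampler = DeviceSampler("device-0", periodic)
    for minute in range(0, hours * 60, SAMPLE_INTERVAL_MIN):
        sampler.on_sample(minute, {"error_counts": {}})
    return [len(batch) for batch in sampler.uploads]


print(upload_sizes(periodic=True, hours=2))   # [4, 4]
print(upload_sizes(periodic=False, hours=2))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Under these assumptions, two hours of sampling produce two uploads of four records each in Mode 2, versus eight single-record uploads in Mode 1.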
S702: perform feature statistics on the plurality of pieces of initial data according to the data sampling time to obtain a plurality of pieces of statistical data.
In an alternative embodiment, the plurality of pieces of statistical data may be obtained as follows: divide the initial data into a plurality of data groups according to a preset time window, where the data sampling time of each piece of initial data in a data group falls within the time window corresponding to that group; then, for each data group, perform feature statistics on the number of error reports for each fault type to obtain an error statistic for each fault type, perform feature statistics on the memory performance information to obtain a memory performance statistic, and count the fault locations to obtain the number of failures of each block in the memory within the time window corresponding to the data group.
The statistical data corresponding to the data group includes the error statistic for each fault type, the memory performance statistic, and the failure count statistic.
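The grouping and statistics described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: sampling times are expressed in hours, a single memory-occupancy value stands in for the memory performance information, fault locations are reduced to block identifiers, the fault type names ("CE", "UCE") are hypothetical, and the statistics chosen (sums, mean, maximum) are illustrative, since the application does not fix which statistics are computed.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class InitialRecord:
    sample_time: float            # data sampling time, in hours (assumed unit)
    error_counts: Dict[str, int]  # fault type -> number of error reports
    mem_usage: float              # stand-in memory performance metric (occupancy, 0..1)
    fault_blocks: List[str]       # fault locations, reduced to memory block identifiers


def window_statistics(records: List[InitialRecord], window_h: float) -> List[dict]:
    """Split records into fixed time windows and compute one statistics row per window."""
    groups: Dict[int, List[InitialRecord]] = defaultdict(list)
    for record in records:
        # A record belongs to the window that contains its sampling time.
        groups[int(record.sample_time // window_h)].append(record)

    rows = []
    for index in sorted(groups):
        group = groups[index]
        error_stats: Counter = Counter()
        block_failures: Counter = Counter()
        for record in group:
            error_stats.update(record.error_counts)     # summed error reports per fault type
            block_failures.update(record.fault_blocks)  # failure count per block
        usages = [record.mem_usage for record in group]
        rows.append({
            "window_start": index * window_h,
            "error_stats": dict(error_stats),
            "mem_usage_mean": mean(usages),
            "mem_usage_max": max(usages),
            "block_failures": dict(block_failures),
        })
    return rows


records = [
    InitialRecord(0.0, {"CE": 2}, 0.5, ["block0"]),
    InitialRecord(0.25, {"CE": 1, "UCE": 1}, 0.7, ["block0", "block1"]),
    InitialRecord(1.5, {"CE": 4}, 0.6, []),
]
for row in window_statistics(records, window_h=1.0):
    print(row)
```

With a one-hour window, the first two records fall into the first data group and the third into the second, giving one piece of statistical data per group.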
It should be noted that for the specific execution of step S702, reference may be made to step S202 or steps S402 to S404; details are not repeated here.
S703: process the plurality of pieces of statistical data through the target model to determine whether the electronic device will go down due to a memory fault within a future period.
The target model is trained by the fault prediction model training method of the embodiments shown in Fig. 2 and Fig. 4.
After the plurality of pieces of statistical data corresponding to an electronic device are obtained, they can be processed by the target model: the top K data features of the statistical data are determined, and whether the electronic device will go down due to a memory fault within the future period is predicted from those K data features and their feature values.
For example, the future period may be set to 15 days, in which case the target model predicts whether an electronic device may go down due to a memory fault within the next 15 days.
For example, suppose K is set to 3, the future period is 15 days, and after the statistical data are processed by the target model, the top 3 data features are determined to be: the total number of failures occurring in the bank, the average memory occupancy, and the number of memory read errors. If their feature values are comparable to the corresponding feature values in a piece of first positive sample data whose labeling result is 8, the output prediction result may be determined to be 8, indicating that the electronic device may experience a downtime event caused by a memory fault on the 8th day in the future.
If the feature values are instead comparable to the corresponding feature values in a piece of first negative sample data whose labeling result is 17, the prediction result may be determined to be 17, indicating that the electronic device may experience a downtime event caused by a memory fault on the 17th day in the future.
Optionally, because the future period covered by the target model is set to 15 days, a prediction result greater than 15 days may instead be output as "no downtime" or "very low probability of downtime", indicating that no downtime event caused by a memory fault is expected within the next 15 days.
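To make the example concrete, the sketch below uses a nearest-neighbour lookup over the top K feature values as a stand-in for the target model, whose type the application does not specify, and then applies the 15-day cut-off described above. The feature vectors and labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple, Union

HORIZON_DAYS = 15  # future period from the example above


def predict_days(query: Sequence[float],
                 labeled: List[Tuple[Sequence[float], int]]) -> int:
    """Stand-in model: return the label of the labeled sample closest to `query`."""
    _, label = min(labeled, key=lambda item: math.dist(query, item[0]))
    return label


def interpret(days: int, horizon: int = HORIZON_DAYS) -> Union[int, str]:
    """Report the predicted day, or "no downtime" when it lies beyond the horizon."""
    return "no downtime" if days > horizon else days


# Hypothetical top-3 feature vectors:
# (failures in the bank, mean memory occupancy, memory read errors) -> labeled days
labeled_samples = [
    ((12.0, 0.85, 30.0), 8),   # first positive sample data, labeled 8
    ((1.0, 0.40, 2.0), 17),    # first negative sample data, labeled 17
]
print(interpret(predict_days((11.0, 0.80, 28.0), labeled_samples)))  # 8
print(interpret(predict_days((2.0, 0.45, 3.0), labeled_samples)))    # no downtime
```

Because the features are on very different scales, a practical version would likely normalize them before computing distances; this sketch omits that step for brevity.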
In the embodiment of the application, for any electronic device, a plurality of pieces of initial data can be acquired from the device, feature statistics can be performed on them according to the data sampling time to obtain a plurality of pieces of statistical data, and the statistical data can be processed through the target model to determine whether the device will go down due to a memory fault within a future period. If the data features of the statistical data and their feature values are comparable to those in first positive sample data, the prediction result, determined from the labeling result of that first positive sample data, is that a downtime event caused by a memory fault may occur on a specific day within the future period; if they are comparable to those in first negative sample data, the prediction result, determined from the labeling result of that first negative sample data, is that no downtime event caused by a memory fault will occur within the future period. Because the target model can predict the specific day on which a memory-fault-induced downtime event may occur, the accuracy of downtime prediction for electronic devices is improved.
Fig. 8 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment of the present application. Referring to Fig. 8, the model training apparatus includes an acquisition module 11, a statistics module 12, a determination module 13, and a training module 14, wherein:
the acquisition module 11 is configured to acquire a plurality of pieces of initial data from a plurality of electronic devices, where the initial data includes device information, memory information of a memory in the electronic device, and a data sampling time, and the memory information includes memory fault information and memory performance information;
the statistical module 12 is configured to perform feature statistics on initial data corresponding to each electronic device respectively according to the device information and the data sampling time to obtain a plurality of pieces of statistical data;
the determining module 13 is configured to determine labeling results of the plurality of pieces of statistical data according to the downtime information of the plurality of electronic devices, where each labeling result indicates the time to downtime, that is, the duration between the acquisition time of the statistical data and the downtime moment of the electronic device;
the training module 14 is configured to perform model training according to the plurality of pieces of statistical data and the labeling result corresponding to each piece of statistical data to obtain a target model, where the target model is used to determine whether the electronic device is down due to a memory fault in a future period.
The model training device provided in the embodiment of the present application can execute the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar and will not be described herein again.
In a possible implementation, the statistical module 12 is specifically configured to:
determining initial data corresponding to each electronic device according to the electronic device information;
for each electronic device, dividing initial data corresponding to the electronic device into a plurality of data groups according to the data sampling time and a preset time window, wherein the data sampling time of the initial data in one data group is located in the corresponding time window;
and respectively carrying out characteristic statistics on the initial data in each data group to obtain statistical data corresponding to the electronic equipment, wherein one data group corresponds to one piece of statistical data.
In one possible implementation, the memory failure information includes the number of error reports and the fault locations corresponding to each fault type; the statistical module 12 is specifically configured to:
performing feature statistics on the number of error reports for each fault type in the data group to obtain an error statistic for each fault type;
performing feature statistics on the memory performance information in the data group to obtain a memory performance statistic;
counting the fault locations in the data group to obtain the failure count of each block in the memory within the time window corresponding to the data group;
wherein the statistical data corresponding to the data group includes the error statistic for each fault type, the memory performance statistic, and the failure count statistic.
In one possible implementation, the downtime information includes a downtime moment; for any piece of statistical data corresponding to any electronic device, the determining module 13 is specifically configured to:
determining the acquisition time of the statistical data according to the data sampling time in the initial data corresponding to the statistical data;
obtaining the initial duration between the acquisition time and the downtime moment;
rounding the initial duration according to a preset time unit to obtain the time to downtime, which is an integer multiple of the preset time unit;
determining that the labeling result of the statistical data includes the time to downtime.
In a possible implementation manner, the downtime information includes a downtime identifier and a downtime moment, or the downtime information includes a non-downtime identifier;
for any piece of statistical data corresponding to any electronic device, the determining module 13 is specifically configured to:
if the downtime information includes the downtime identifier and the downtime moment, determining the acquisition time according to the data sampling time in the initial data corresponding to the statistical data; obtaining the initial duration between the acquisition time and the downtime moment; rounding the initial duration according to a preset time unit to obtain the time to downtime, which is an integer multiple of the preset time unit; and determining that the labeling result of the statistical data includes the time to downtime;
if the downtime information includes the non-downtime identifier, determining that the labeling result of the statistical data is non-downtime, or that the time to downtime is greater than or equal to a preset duration.
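The labeling rule can be sketched as follows, assuming a preset time unit of one day, the latest sampling time in a data group as the acquisition time, rounding up (so that 7.25 days becomes day 8), and a hypothetical sentinel value for non-downtime labels. The application states only that the initial duration is rounded to an integer multiple of the time unit, so the rounding direction is an assumption.

```python
import math
from datetime import datetime, timedelta
from typing import Optional, Sequence

TIME_UNIT = timedelta(days=1)  # preset time unit (assumed: one day)
NON_DOWNTIME_LABEL = 9999      # hypothetical label meaning "no downtime observed"


def label_statistic(sample_times: Sequence[datetime],
                    downtime_moment: Optional[datetime]) -> int:
    """Return the time to downtime, in whole time units, for one piece of statistical data."""
    if downtime_moment is None:
        # Non-downtime identifier: use a value at least as large as any preset duration.
        return NON_DOWNTIME_LABEL
    acquisition_time = max(sample_times)  # assumed: latest sampling time in the data group
    initial_duration = downtime_moment - acquisition_time
    if initial_duration < timedelta(0):
        raise ValueError("statistical data collected after the downtime moment")
    # timedelta / timedelta gives a float; round up to an integer number of units.
    return math.ceil(initial_duration / TIME_UNIT)


times = [datetime(2022, 7, 1, 0, 0), datetime(2022, 7, 1, 6, 0)]
print(label_statistic(times, datetime(2022, 7, 8, 12, 0)))  # 8 (7.25 days, rounded up)
print(label_statistic(times, None))                         # 9999
```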
In a possible implementation, the training module 14 is specifically configured to:
determining M pieces of positive sample data and N pieces of negative sample data among the plurality of pieces of statistical data according to the statistical data and the labeling result corresponding to each piece, wherein the time to downtime indicated by the labeling result of positive sample data is less than or equal to a preset duration, the time to downtime indicated by the labeling result of negative sample data is greater than the preset duration, and M and N are both positive integers;
according to the M and the N, determining a plurality of pieces of first positive sample data in the M pieces of positive sample data, and determining a plurality of pieces of first negative sample data in the N pieces of negative sample data, wherein the difference value between the number of the first positive sample data and the number of the first negative sample data is within a preset range;
and performing model training according to the multiple pieces of first positive sample data, the multiple pieces of first negative sample data, the labeling results corresponding to the multiple pieces of first positive sample data and the labeling results corresponding to the multiple pieces of first negative sample data to obtain the target model.
In a possible implementation, the training module 14 is specifically configured to:
performing first model training according to the first positive sample data, the first negative sample data, the labeling results corresponding to the first positive sample data and the labeling results corresponding to the first negative sample data to obtain an intermediate model, and determining the importance degree of each data feature in the statistical data;
sorting the data features in the statistical data in descending order of importance, retaining the feature values of the top K data features in the first positive sample data to obtain the second positive sample data, and retaining the feature values of the top K data features in the first negative sample data to obtain the second negative sample data;
and performing second model training on the intermediate model according to the plurality of pieces of second positive sample data, the plurality of pieces of second negative sample data, the labeling results corresponding to the plurality of pieces of second positive sample data and the labeling results corresponding to the plurality of pieces of second negative sample data to obtain the target model.
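A minimal sketch of the top-K selection, assuming the intermediate model exposes a per-feature importance score (for example, tree ensembles in scikit-learn provide a `feature_importances_` attribute). The feature names and scores are hypothetical.

```python
from typing import Dict, List


def top_k_features(importance: Dict[str, float], k: int) -> List[str]:
    """Names of the k most important features, in descending order of importance."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def keep_features(samples: List[Dict[str, float]], names: List[str]) -> List[Dict[str, float]]:
    """Retain only the named feature values in each sample (first -> second sample data)."""
    return [{name: sample[name] for name in names} for sample in samples]


importance = {  # hypothetical importance scores from the intermediate model
    "block_failures": 0.42,
    "mem_usage_mean": 0.31,
    "read_errors": 0.15,
    "write_errors": 0.07,
    "mem_usage_max": 0.05,
}
top3 = top_k_features(importance, k=3)
first_positive = [{"block_failures": 12, "mem_usage_mean": 0.85, "read_errors": 30,
                   "write_errors": 1, "mem_usage_max": 0.9}]
print(top3)  # ['block_failures', 'mem_usage_mean', 'read_errors']
print(keep_features(first_positive, top3))
```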
In one possible embodiment, the first positive sample data and the first negative sample data respectively include a plurality of data features therein; the training module 14 is specifically configured to:
determining a first data feature in the plurality of data features according to the feature value of each data feature in the plurality of pieces of first positive sample data and the feature value of each data feature in the plurality of pieces of first negative sample data; wherein a degree of difference between a feature value corresponding to the first data feature in the first positive sample and a feature value corresponding to the first data feature in the first negative sample is greater than or equal to a second threshold;
updating the plurality of pieces of first positive sample data and the plurality of pieces of first negative sample data according to the first data characteristics, wherein the updated plurality of pieces of first positive sample data and the updated plurality of pieces of first negative sample data comprise characteristic values of the first data characteristics;
and performing model training according to the updated plurality of pieces of first positive sample data, the updated plurality of pieces of first negative sample data, the labeling results corresponding to the updated plurality of pieces of first positive sample data and the labeling results corresponding to the updated plurality of pieces of first negative sample data to obtain the intermediate model.
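The application names hypothesis testing as the way to find the first data features but does not specify a test. The sketch below assumes Welch's t statistic as the degree of difference between positive and negative samples and a hypothetical second threshold of 2.0; a p-value-based variant could use `scipy.stats.ttest_ind(..., equal_var=False)` instead.

```python
import math
from statistics import mean, variance
from typing import Dict, List


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if standard_error == 0:
        return 0.0 if mean(a) == mean(b) else math.inf
    return (mean(a) - mean(b)) / standard_error


def select_first_features(positive: List[Dict[str, float]],
                          negative: List[Dict[str, float]],
                          second_threshold: float) -> List[str]:
    """Keep features whose positive/negative difference |t| reaches the threshold."""
    selected = []
    for name in positive[0]:
        t = welch_t([s[name] for s in positive], [s[name] for s in negative])
        if abs(t) >= second_threshold:
            selected.append(name)
    return selected


positive = [{"block_failures": v, "mem_usage_mean": u}
            for v, u in [(10, 0.50), (12, 0.60), (11, 0.55), (13, 0.50)]]
negative = [{"block_failures": v, "mem_usage_mean": u}
            for v, u in [(1, 0.52), (0, 0.58), (2, 0.50), (1, 0.56)]]
features = select_first_features(positive, negative, second_threshold=2.0)
print(features)  # ['block_failures']
# Updated first sample data: keep only the selected features.
positive = [{f: s[f] for f in features} for s in positive]
negative = [{f: s[f] for f in features} for s in negative]
```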
In a possible implementation, the training module 14 is specifically configured to:
if M is greater than N and the difference between M and N is greater than or equal to a first threshold, performing downsampling on the M pieces of positive sample data, determining the downsampled positive sample data as the plurality of pieces of first positive sample data, and determining the N pieces of negative sample data as the plurality of pieces of first negative sample data; or,
if the N is larger than the M and the difference value between the N and the M is larger than or equal to the first threshold, performing downsampling on the N pieces of negative sample data, determining the downsampled negative sample data as the plurality of pieces of first negative sample data, and determining the M pieces of positive sample data as the plurality of pieces of first positive sample data.
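The split and balancing steps can be sketched as follows, assuming the labeling result is a time to downtime in days, a preset duration of 15 days, and downsampling of the majority class to exactly the size of the minority class by uniform random sampling; the application only requires the difference between the two counts to fall within a preset range.

```python
import random
from typing import List, Tuple

Sample = Tuple[dict, int]  # (statistical data, labeled time to downtime in days)


def build_first_samples(samples: List[Sample], preset_days: int, first_threshold: int,
                        seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Split into positive/negative samples and downsample the larger class if needed."""
    positive = [s for s in samples if s[1] <= preset_days]  # M pieces
    negative = [s for s in samples if s[1] > preset_days]   # N pieces
    m, n = len(positive), len(negative)
    rng = random.Random(seed)
    if m > n and m - n >= first_threshold:
        positive = rng.sample(positive, n)  # downsample positives to N
    elif n > m and n - m >= first_threshold:
        negative = rng.sample(negative, m)  # downsample negatives to M
    return positive, negative


data = [({"window": i}, days) for i, days in enumerate([1, 2, 3, 4, 5, 6, 7, 8, 20, 9999])]
first_pos, first_neg = build_first_samples(data, preset_days=15, first_threshold=3)
print(len(first_pos), len(first_neg))  # 2 2
```

With a first threshold of 10, the same data (M = 8, N = 2) would be left as is, since the difference of 6 does not reach the threshold.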
The model training apparatus shown in the embodiment of Fig. 8 may also be referred to as a failure prediction model training apparatus, and may be an apparatus in a server or in a terminal device (e.g., a computer).
Fig. 9 is a schematic structural diagram of a device fault determination apparatus according to an exemplary embodiment of the present application. Referring to Fig. 9, the device fault determination apparatus includes an acquisition module 21, a statistics module 22, and a processing module 23, wherein:
the acquisition module 21 is configured to acquire a plurality of pieces of initial data from an electronic device, where the initial data includes device information, memory information of a memory in the electronic device, and a data sampling time, and the memory information includes memory fault information and memory performance information;
the statistical module 22 is configured to perform feature statistics on the multiple pieces of initial data according to the data sampling time to obtain multiple pieces of statistical data;
the processing module 23 is configured to process the plurality of pieces of statistical data through the target model to determine whether the electronic device will go down due to a memory fault within a future period, wherein the target model is trained according to the method of any one of claims 1 to 9.
The device fault determination apparatus provided in the embodiment of the present application may implement the technical solutions shown in the above method embodiments, and the implementation principles and beneficial effects thereof are similar, and are not described herein again.
In one possible implementation, the memory failure information includes the number of error reports and the fault locations corresponding to each fault type; the statistical module 22 is specifically configured to:
dividing the initial data into a plurality of data groups according to a preset time window, wherein the data sampling time of each piece of initial data in a data group falls within the time window corresponding to that group;
for any one of the plurality of data groups, performing feature statistics on the number of error reports for each fault type in the data group to obtain an error statistic for each fault type;
performing feature statistics on the memory performance information in the data group to obtain a memory performance statistic;
counting the fault locations in the data group to obtain the failure count of each block in the memory within the time window corresponding to the data group;
wherein the statistical data corresponding to the data group includes the error statistic for each fault type, the memory performance statistic, and the failure count statistic.
The device failure determination apparatus shown in the embodiment of fig. 9 may be an apparatus in a server or an apparatus in a terminal device (e.g., a computer).
An exemplary embodiment of the present application provides an electronic device. Referring to the schematic structural diagram in Fig. 10, the electronic device 30 may include a processor 31 and a memory 32. Illustratively, the processor 31, the memory 32, and the other components are interconnected by a bus 33.
The memory 32 stores computer-executable instructions;
the processor 31 executes the computer-executable instructions stored in the memory 32, so that the processor 31 performs the fault prediction model training method or the device failure determination method shown in the above method embodiments.
The electronic device shown in the embodiment of fig. 10 may be a model training device or a failure prediction device.
Accordingly, the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the fault prediction model training method or the device failure determination method of the above method embodiments.
Accordingly, the present application may further provide a computer program product including a computer program which, when executed by a processor, implements the fault prediction model training method or the device failure determination method shown in the foregoing method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (14)

1. A fault prediction model training method is characterized by comprising the following steps:
obtaining a plurality of pieces of initial data from a plurality of electronic devices, the initial data including device information, memory information of a memory in the electronic device, and a data sampling time, wherein the memory information includes memory fault information and memory performance information;
respectively carrying out characteristic statistics on initial data corresponding to each electronic device according to the device information and the data sampling time to obtain a plurality of pieces of statistical data;
determining labeling results of the plurality of pieces of statistical data according to downtime information of the plurality of electronic devices, wherein each labeling result indicates the time to downtime between the acquisition time of the statistical data and the downtime moment of the electronic device;
and performing model training according to the plurality of statistical data and the labeling result corresponding to each statistical data to obtain a target model, wherein the target model is used for determining whether the electronic equipment is down due to memory failure in a future period.
2. The method according to claim 1, wherein performing feature statistics on the initial data corresponding to each electronic device according to the device information and the data sampling time to obtain a plurality of pieces of statistical data respectively comprises:
determining initial data corresponding to each electronic device according to the electronic device information;
for each electronic device, dividing the initial data corresponding to the electronic device into a plurality of data groups according to the data sampling time and a preset time window, wherein the data sampling time of each piece of initial data in a data group falls within the time window corresponding to that group;
and respectively carrying out characteristic statistics on the initial data in each data group to obtain statistical data corresponding to the electronic equipment, wherein one data group corresponds to one piece of statistical data.
3. The method of claim 2, wherein the memory fault information comprises: the number of error reports and the fault locations corresponding to each fault type;
for any one of the plurality of data groups, performing feature statistics on the initial data in the data group to obtain the statistical data corresponding to the data group comprises:
performing feature statistics on the number of error reports corresponding to each fault type in the data group to obtain an error report statistic for each fault type;
performing feature statistics on the memory performance information in the data group to obtain a memory performance statistic;
counting the fault locations in the data group to obtain a fault count statistic for each block of the memory within the time window corresponding to the data group;
wherein the statistical data corresponding to the data group comprises: the error report statistic for each fault type, the memory performance statistic, and the fault count statistic.
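One possible reading of the per-group statistics in claim 3 is sketched below in Python. The patent does not say which statistic is used for each quantity, so this sketch makes three assumptions:

- error reports are summed per fault type;
- performance metrics are averaged;
- fault locations are counted per memory block.

The field names (`error_counts`, `perf`, `fault_block`) and the fault-type labels `CE`/`UCE` are hypothetical.

```python
from collections import Counter
from statistics import mean


def group_features(group):
    """Turn one data group (one device, one time window) into one statistics record.

    Each record is assumed to look like:
        {"error_counts": {"CE": 3, "UCE": 0},  # error reports per fault type (hypothetical types)
         "fault_block": "bank0",               # memory block where a fault was located, or None
         "perf": {"bw": 12.1}}                 # memory performance metrics (hypothetical names)
    """
    error_stats = Counter()
    for rec in group:
        error_stats.update(rec["error_counts"])  # sum error reports per fault type

    metric_names = sorted({name for rec in group for name in rec["perf"]})
    perf_stats = {
        name: mean(rec["perf"][name] for rec in group if name in rec["perf"])
        for name in metric_names
    }

    block_fault_counts = Counter(
        rec["fault_block"] for rec in group if rec.get("fault_block") is not None
    )
    return {
        "error_stats": dict(error_stats),
        "perf_stats": perf_stats,
        "block_fault_counts": dict(block_fault_counts),
    }
```

Sums, maxima, rates or percentiles would fit the claim language equally well. The choice here only keeps the example short.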
4. The method according to claim 2 or 3, wherein the downtime information comprises a downtime moment, and, for any piece of statistical data corresponding to any electronic device, determining the labeling result of the statistical data according to the downtime information of the electronic device comprises:
determining the acquisition time of the statistical data according to the data sampling times in the initial data corresponding to the statistical data;
obtaining an initial duration between the acquisition time and the downtime moment;
rounding the initial duration according to a preset time unit to obtain the time-to-downtime duration, wherein the time-to-downtime duration is an integer multiple of the preset time unit;
and determining that the labeling result of the statistical data comprises the time-to-downtime duration.
5. The method according to claim 2 or 3, wherein the downtime information comprises a downtime identifier and a downtime moment, or the downtime information comprises a non-downtime identifier;
for any piece of statistical data corresponding to any electronic device, determining the labeling result of the statistical data according to the downtime information of the electronic device comprises: if the downtime information comprises the downtime identifier and the downtime moment, determining an acquisition time according to the data sampling time in the initial data corresponding to the statistical data; obtaining an initial duration between the acquisition time and the downtime moment; rounding the initial duration according to a preset time unit to obtain the time-to-downtime duration, wherein the time-to-downtime duration is an integer multiple of the preset time unit; and determining that the labeling result of the statistical data comprises the time-to-downtime duration;
and if the downtime information comprises the non-downtime identifier, determining that the labeling result of the statistical data is non-downtime, or that the time-to-downtime duration is greater than or equal to a preset duration.
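Claims 4 and 5 describe how each statistics record is labeled. The following is a minimal sketch under three stated assumptions:

- the acquisition time is taken as the latest sampling time in the group;
- "rounding" is read as rounding down to a whole number of time units;
- `None` stands in for the non-downtime identifier.

The constants `TIME_UNIT` and `NON_DOWNTIME_LABEL` are hypothetical values, not taken from the patent.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence

# Hypothetical preset values; the patent gives no concrete numbers.
TIME_UNIT = timedelta(hours=1)
NON_DOWNTIME_LABEL = timedelta(days=7)  # stands in for "greater than or equal to a preset duration"


def label_statistics(sample_times: Sequence[datetime],
                     downtime: Optional[datetime],
                     unit: timedelta = TIME_UNIT) -> timedelta:
    """Return the time-to-downtime label for one statistics record.

    sample_times: data sampling times of the initial data behind this record.
    downtime:     downtime moment, or None to play the role of the non-downtime identifier.
    """
    if downtime is None:
        return NON_DOWNTIME_LABEL
    acquired = max(sample_times)       # assumption: acquisition time = latest sampling time
    initial = downtime - acquired      # initial duration
    return (initial // unit) * unit    # round down to an integer multiple of the time unit
```

For example, samples taken up to 00:50 and a crash at 03:20 give an initial duration of 2 h 30 min, which this sketch labels as 2 hours.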
6. The method according to any one of claims 1 to 5, wherein performing model training according to the plurality of pieces of statistical data and the labeling result corresponding to each piece of statistical data to obtain a target model comprises:
determining, according to the plurality of pieces of statistical data and the labeling result corresponding to each piece, M pieces of positive sample data and N pieces of negative sample data among the plurality of pieces of statistical data, wherein the time-to-downtime duration indicated by the labeling result of each piece of positive sample data is less than or equal to a preset duration, the time-to-downtime duration indicated by the labeling result of each piece of negative sample data is greater than the preset duration, and M and N are both positive integers;
according to M and N, determining a plurality of pieces of first positive sample data among the M pieces of positive sample data and a plurality of pieces of first negative sample data among the N pieces of negative sample data, wherein the difference between the number of pieces of first positive sample data and the number of pieces of first negative sample data is within a preset range;
and performing model training according to the multiple pieces of first positive sample data, the multiple pieces of first negative sample data, the labeling results corresponding to the multiple pieces of first positive sample data and the labeling results corresponding to the multiple pieces of first negative sample data to obtain the target model.
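The positive/negative split in claim 6 compares each label with the preset duration. Here is a short sketch that assumes labels are `timedelta` time-to-downtime durations. The function name `split_by_horizon` is hypothetical.

```python
from datetime import timedelta
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_horizon(records: Sequence[T], labels: Sequence[timedelta],
                     preset: timedelta) -> Tuple[List[T], List[T]]:
    """Split records into positive samples (downtime within `preset`) and negative samples."""
    positives = [r for r, y in zip(records, labels) if y <= preset]  # label <= preset duration
    negatives = [r for r, y in zip(records, labels) if y > preset]   # label > preset duration
    return positives, negatives
```

A call such as `split_by_horizon(records, labels, timedelta(hours=24))` would treat "down within one day" as positive. The 24-hour horizon is an illustrative choice only.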
7. The method of claim 6, wherein performing model training according to the first positive sample data, the first negative sample data, the labeling results corresponding to the first positive sample data, and the labeling results corresponding to the first negative sample data to obtain the target model comprises:
performing a first round of model training according to the first positive sample data, the first negative sample data, the labeling results corresponding to the first positive sample data, and the labeling results corresponding to the first negative sample data to obtain an intermediate model, and determining the importance of each data feature in the statistical data;
ranking the data features in the statistical data in descending order of importance, retaining the feature values of the top K data features in the first positive sample data to obtain a plurality of pieces of second positive sample data, and retaining the feature values of the top K data features in the first negative sample data to obtain a plurality of pieces of second negative sample data;
and performing a second round of model training on the intermediate model according to the plurality of pieces of second positive sample data, the plurality of pieces of second negative sample data, the labeling results corresponding to the plurality of pieces of second positive sample data, and the labeling results corresponding to the plurality of pieces of second negative sample data to obtain the target model.
8. The method according to claim 6 or 7, wherein the first positive sample data and the first negative sample data each comprise a plurality of data features;
performing a first round of model training according to the plurality of pieces of first positive sample data, the plurality of pieces of first negative sample data, the labeling results corresponding to the plurality of pieces of first positive sample data, and the labeling results corresponding to the plurality of pieces of first negative sample data to obtain an intermediate model comprises:
determining a first data feature among the plurality of data features according to the feature value of each data feature in the plurality of pieces of first positive sample data and the feature value of each data feature in the plurality of pieces of first negative sample data, wherein a degree of difference between the feature values of the first data feature in the first positive sample data and the feature values of the first data feature in the first negative sample data is greater than or equal to a second threshold;
updating the plurality of pieces of first positive sample data and the plurality of pieces of first negative sample data according to the first data feature, wherein the updated plurality of pieces of first positive sample data and the updated plurality of pieces of first negative sample data comprise feature values of the first data feature;
and performing model training according to the updated plurality of pieces of first positive sample data, the updated plurality of pieces of first negative sample data, the labeling results corresponding to the updated plurality of pieces of first positive sample data and the labeling results corresponding to the updated plurality of pieces of first negative sample data to obtain the intermediate model.
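Claim 8 keeps only features whose values differ enough between positive and negative samples. The patent leaves the "degree of difference" open. This sketch assumes it is the absolute difference of the class means divided by the standard deviation over both classes combined, a standardized mean difference. Features scoring at or above `second_threshold` are kept. The names `select_discriminative` and `project` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def select_discriminative(pos: Sequence[Sequence[float]], neg: Sequence[Sequence[float]],
                          second_threshold: float) -> List[int]:
    """Return indices of features whose positive/negative difference >= second_threshold."""
    keep = []
    for j in range(len(pos[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        spread = pstdev(p + n) or 1.0  # avoid division by zero for constant features
        if abs(mean(p) - mean(n)) / spread >= second_threshold:
            keep.append(j)
    return keep


def project(rows: Sequence[Sequence[float]], keep: Sequence[int]) -> List[List[float]]:
    """Keep only the selected feature columns (the 'updated' sample data)."""
    return [[row[j] for j in keep] for row in rows]
```

Other separation measures, such as a KS statistic or mutual information, would fit the claim wording equally well.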
9. The method according to any one of claims 6 to 8, wherein determining, according to M and N, a plurality of pieces of first positive sample data among the M pieces of positive sample data and a plurality of pieces of first negative sample data among the N pieces of negative sample data comprises:
if M is greater than N and the difference between M and N is greater than or equal to a first threshold, downsampling the M pieces of positive sample data, determining the downsampled positive sample data as the plurality of pieces of first positive sample data, and determining the N pieces of negative sample data as the plurality of pieces of first negative sample data; or
if N is greater than M and the difference between N and M is greater than or equal to the first threshold, downsampling the N pieces of negative sample data, determining the downsampled negative sample data as the plurality of pieces of first negative sample data, and determining the M pieces of positive sample data as the plurality of pieces of first positive sample data.
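Claim 9's balancing rule is sketched here with Python's standard `random` module. The patent does not say how downsampling is done or to what size. This sketch assumes random sampling without replacement down to the size of the smaller class, with a fixed seed for reproducibility. The function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balance_samples(positives: Sequence[T], negatives: Sequence[T],
                    first_threshold: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Downsample the larger class when |M - N| >= first_threshold; otherwise keep both as-is."""
    rng = random.Random(seed)
    m, n = len(positives), len(negatives)
    if m > n and m - n >= first_threshold:
        return rng.sample(list(positives), n), list(negatives)   # downsample positives to N
    if n > m and n - m >= first_threshold:
        return list(positives), rng.sample(list(negatives), m)   # downsample negatives to M
    return list(positives), list(negatives)
```

In memory-failure data the negative (no imminent downtime) class is usually the larger one, so the second branch is the one most likely to fire.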
10. A device fault determination method, comprising:
obtaining a plurality of pieces of initial data from an electronic device, the initial data including: device information, memory information of a memory in the electronic device, and a data sampling time, wherein the memory information comprises memory fault information and memory performance information;
performing feature statistics on the plurality of pieces of initial data according to the data sampling time to obtain a plurality of pieces of statistical data;
processing the plurality of pieces of statistical data with a target model to determine whether the electronic device will go down due to a memory fault within a future period, wherein the target model is trained according to the method of any one of claims 1 to 9.
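At inference time (claim 10), the same windowing and statistics are applied to a single device, and the target model scores the result. This is a minimal sketch under three assumptions:

- the model is any callable that maps a feature vector to a downtime probability;
- only the most recent window decides;
- the decision threshold is 0.5.

The function name `will_go_down` is hypothetical.

```python
from typing import Callable, Sequence

FeatureVector = Sequence[float]


def will_go_down(window_features: Sequence[FeatureVector],
                 model: Callable[[FeatureVector], float],
                 threshold: float = 0.5) -> bool:
    """Return True if the model predicts a memory-fault downtime within the future period.

    window_features: one feature vector per time window, oldest first.
    model:           any scorer returning a probability-like value in [0, 1].
    """
    if not window_features:
        raise ValueError("need at least one time window of statistics")
    latest = window_features[-1]          # assumption: only the most recent window is scored
    return model(latest) >= threshold
```

Scoring every window and alarming if any score crosses the threshold is an equally plausible reading. The claim does not specify how per-window outputs are combined.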
11. The method of claim 10, wherein the memory fault information comprises: the number of error reports and the fault locations corresponding to each fault type;
performing feature statistics on the plurality of pieces of initial data according to the data sampling time to obtain the plurality of pieces of statistical data comprises:
dividing the plurality of pieces of initial data into a plurality of data groups according to a preset time window, wherein the data sampling times of the initial data in one data group fall within the corresponding time window;
for any one of the plurality of data groups, performing feature statistics on the number of error reports corresponding to each fault type in the data group to obtain an error report statistic for each fault type;
performing feature statistics on the memory performance information in the data group to obtain a memory performance statistic;
counting the fault locations in the data group to obtain a fault count statistic for each block of the memory within the time window corresponding to the data group;
wherein the statistical data corresponding to the data group comprises: the error report statistic for each fault type, the memory performance statistic, and the fault count statistic.
12. An electronic device, comprising: a memory and a processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the fault prediction model training method of any one of claims 1 to 9 or the device fault determination method of any one of claims 10 to 11.
13. A computer-readable storage medium having stored thereon computer-executable instructions for implementing the fault prediction model training method of any one of claims 1 to 9 or the device fault determination method of any one of claims 10 to 11 when executed by a processor.
14. A computer program product comprising a computer program which, when executed by a processor, implements a fault prediction model training method as claimed in any one of claims 1 to 9, or a device fault determination method as claimed in any one of claims 10 to 11.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210880637.9A CN115168173A (en) 2022-07-25 2022-07-25 Fault prediction model training method, equipment fault determination method, device and equipment

Publications (1)

Publication Number Publication Date
CN115168173A 2022-10-11

Family

ID=83497052

Country Status (1)

Country Link
CN (1) CN115168173A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116861798A (en) * 2023-09-01 2023-10-10 华侨大学 Online real-time residual life prediction method for vacuum dry pump based on XGBoost algorithm
CN116861798B (en) * 2023-09-01 2023-12-26 华侨大学 Online real-time residual life prediction method for vacuum dry pump based on XGBoost algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination