CN115564069A - Method for determining server maintenance strategy, method for generating model and device thereof - Google Patents


Info

Publication number
CN115564069A
CN115564069A
Authority
CN
China
Prior art keywords
training
feature
server
characteristic
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211204756.9A
Other languages
Chinese (zh)
Inventor
李晓晨
曾亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211204756.9A
Publication of CN115564069A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a determination method of a server maintenance strategy, a generation method of a model and a device thereof, and relates to the technical field of artificial intelligence, and further relates to the technical field of machine learning. The specific implementation scheme is as follows: acquiring characteristic information of a server to be processed; the characteristic information comprises at least one of attribute characteristics, physical environment characteristics, operation characteristics and fault characteristics; inputting the characteristic information of the server to be processed into a pre-established server risk scoring model to obtain a score corresponding to each characteristic in the characteristic information, wherein the server risk scoring model learns to obtain a mapping relation between each characteristic and the score of the server; obtaining a risk score value of the server to be processed according to the score corresponding to each feature in the feature information; and determining a maintenance strategy of the server to be processed according to the risk score value. The method and the system can realize maintenance of the server to be processed as required and reduce the failure rate of the server.

Description

Method for determining server maintenance strategy, method for generating model and device thereof
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and further relates to the field of machine learning technologies, and in particular, to a method for determining a server maintenance policy, a method for generating a model, and an apparatus thereof.
Background
With the rapid development of internet technology, the number of servers in a network keeps increasing, and the performance and failure rate of servers directly affect service quality. To ensure server stability, regular maintenance is usually performed so that problems are found as early as possible and the online accident rate is reduced.
Disclosure of Invention
The disclosure provides a method for determining a server maintenance strategy, a method for generating a model and a device thereof.
According to an aspect of the present disclosure, there is provided a method for determining a server maintenance policy, including:
acquiring characteristic information of a server to be processed; the characteristic information comprises at least one of attribute characteristics, physical environment characteristics, operation characteristics and fault characteristics;
inputting the characteristic information of the server to be processed into a pre-established server risk scoring model, and obtaining a score corresponding to each characteristic in the characteristic information; the server risk scoring model learns the mapping relation between each characteristic and the score of the server;
obtaining a risk score value of the server to be processed according to the score corresponding to each feature in the feature information;
and determining a maintenance strategy of the server to be processed according to the risk score value.
According to a second aspect of the present disclosure, there is provided a method of generating a model, including:
collecting relevant data required for generating the server risk scoring model; the related data comprises at least one of attribute data, physical environment data, operation data and fault data of the sample server;
selecting features and training targets from the related data, and generating a training data set according to the selected features and the training targets; the selected characteristics comprise at least one of attribute characteristics, physical environment characteristics, operation characteristics and fault characteristics;
training a logistic regression model according to the training data set;
and generating the server risk scoring model according to the model parameters of the logistic regression model obtained after training.
According to a third aspect of the present disclosure, there is provided a server maintenance policy determination apparatus, including:
the first acquisition module is used for acquiring the characteristic information of the server to be processed; the characteristic information comprises at least one of attribute characteristics, physical environment characteristics, operation characteristics and fault characteristics;
the second acquisition module is used for inputting the characteristic information of the server to be processed into a pre-established server risk scoring model and acquiring a score corresponding to each characteristic in the characteristic information; the server risk scoring model learns the mapping relation between each characteristic and the score of the server;
the third acquisition module is used for acquiring the risk score value of the server to be processed according to the score corresponding to each feature in the feature information;
and the determining module is used for determining the maintenance strategy of the server to be processed according to the risk score value.
According to a fourth aspect of the present disclosure, there is provided a generation apparatus of a model, including:
the acquisition module is used for acquiring relevant data required for generating the server risk scoring model; the related data comprises at least one of attribute data, physical environment data, operation data and fault data of the sample server;
the first generation module is used for selecting characteristics and training targets from the related data and generating a training data set according to the selected characteristics and the training targets; the selected characteristics comprise at least one of attribute characteristics, physical environment characteristics, operation characteristics and fault characteristics;
a training module for training a logistic regression model according to the training data set;
and the second generation module is used for generating the server risk scoring model according to the model parameters of the logistic regression model obtained after training.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of determining a server maintenance policy of the first aspect or to perform the method of generating a model of the second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the method of determining a server maintenance policy of the aforementioned first aspect or the method of generating a model of the aforementioned second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of determining a server maintenance policy according to the aforementioned first aspect, or the method of generating a model of the aforementioned second aspect.
According to the method for determining the server maintenance strategy, the risk score value of the server to be processed is obtained based on the characteristic information of the server to be processed and the pre-established server risk score model, and therefore the risk degree of the server to be processed is accurately evaluated. And then according to the risk score value, determining a maintenance strategy of the server to be processed, realizing maintenance according to needs and reducing the possibility of resource mismatching. The method and the system can reduce the failure rate of the server, improve the maintenance efficiency of the server and effectively reduce the waste of maintenance resources.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flowchart of a method for determining a server maintenance policy according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of another method for determining a server maintenance policy provided according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating a method for generating a model according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of an implementation process for generating a training data set based on selected features and training objectives;
FIG. 5 is a schematic diagram illustrating determining a reference time point according to an embodiment of the disclosure;
FIG. 6 is a schematic flow chart diagram of another method for generating a model provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a graphical illustration of evidence weight WOE values for all bins of a feature provided by an embodiment of the present disclosure;
fig. 8 is a block diagram of a device for determining a server maintenance policy according to an embodiment of the present disclosure;
fig. 9 is a block diagram illustrating an architecture of another apparatus for determining a server maintenance policy according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of a model generation apparatus according to an embodiment of the present disclosure;
fig. 11 is a block diagram of an electronic device to implement the method for determining a server maintenance policy or the method for generating a model according to the embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Because the maintenance cycle and maintenance degree of the routine regular-maintenance approach mostly depend on expert experience, and factors such as the operating load, physical operating environment, and operating condition of different servers cannot be considered comprehensively, maintenance on demand is not possible, which causes resource mismatch: servers that do not need maintenance occupy maintenance resources, while servers that need early maintenance are not maintained in time, leading to serious online accidents.
Therefore, the disclosure provides a determination method of a server maintenance strategy, a generation method of a model and a device thereof. Specifically, a method for determining a server maintenance policy, a method for generating a model, and an apparatus thereof according to an embodiment of the present disclosure are described below with reference to the drawings.
Fig. 1 is a flowchart illustrating a method for determining a server maintenance policy according to an embodiment of the present disclosure. It should be noted that the method for determining a server maintenance policy according to the embodiment of the present disclosure may be applied to a device for determining a server maintenance policy according to the embodiment, and the device for determining a server maintenance policy may be configured on an electronic device. As shown in fig. 1, the method for determining the server maintenance policy includes the following steps:
step 101, obtaining characteristic information of a server to be processed. The characteristic information includes at least one of an attribute characteristic, a physical environment characteristic, an operation characteristic, and a failure characteristic.
Optionally, in some embodiments of the present disclosure, the attribute feature may be a service life of the server, a time since last overhaul, a brand, or the like; the physical environment characteristics can be the characteristics of temperature, humidity, noise, illumination and the like of the machine room where the server is located; the operation characteristics can be the disk occupancy rate, the memory occupancy rate, the CPU occupancy rate, the network throughput, the power supply power and other characteristics of the server; the failure characteristics may be characteristics of the time of occurrence of the server failure, the severity of the failure, and the like.
Step 102, inputting the characteristic information of the server to be processed into a pre-established server risk scoring model, and obtaining a score corresponding to each characteristic in the characteristic information. The server risk scoring model has learned the mapping between the server's various features and scores.
It should be noted that, for the generation method of the model in this embodiment, reference may be made to the description in the subsequent embodiments of the disclosure, and details are not described here.
And 103, acquiring a risk score value of the server to be processed according to the score corresponding to each feature in the feature information.
Optionally, a certain operation, for example a summation, may be performed on the score corresponding to each feature in the feature information to obtain a risk score of the server to be processed. Alternatively, a reference score may be set, and an operation such as a summation or a difference may be performed on the reference score and the score corresponding to each feature to obtain the risk score. As an example, the reference score may be added to the score corresponding to each feature, and the resulting sum used as the risk score of the server to be processed. As another example, the score corresponding to each feature may be subtracted from the reference score, and the resulting difference used as the risk score. The reference score can be understood as the baseline risk score of the server to be processed when its features are not considered. Because the score corresponding to a feature may be positive or negative, a reference score can be set to avoid the resulting risk score of the server to be processed being negative; the reference score corresponds to the server risk scoring model, and the scores corresponding to the features in the feature information are combined on top of it.
And step 104, determining a maintenance strategy of the server to be processed according to the risk score value.
Optionally, a mapping relationship may be established between the risk score value and the maintenance policy. For example, in one implementation, a higher risk score value represents a lower risk for the server to be processed, so an excessively high maintenance frequency is not required, while a lower risk score value represents a higher risk, so the maintenance frequency needs to be increased appropriately. In another implementation, the mapping is reversed: a lower risk score value represents a lower risk, so an excessively high maintenance frequency is not required, while a higher risk score value represents a higher risk, so the maintenance frequency needs to be increased appropriately.
According to the method for determining the server maintenance strategy, the risk score value of the server to be processed is obtained based on the characteristic information of the server to be processed and the pre-established server risk score model, and therefore the risk degree of the server to be processed is accurately evaluated. And then according to the risk score value, determining a maintenance strategy of the server to be processed, realizing maintenance according to needs and reducing the possibility of resource mismatching. The method and the system can reduce the failure rate of the server, improve the maintenance efficiency of the server and effectively reduce the waste of maintenance resources.
Fig. 2 is a flowchart illustrating another method for determining a server maintenance policy according to an embodiment of the disclosure. As shown in fig. 2, the method for determining the server maintenance policy includes the following steps:
step 201, obtaining characteristic information of a server to be processed. The characteristic information includes at least one of an attribute characteristic, a physical environment characteristic, an operation characteristic, and a failure characteristic.
Step 202, inputting the characteristic information of the server to be processed into a pre-established server risk scoring model, and obtaining a reference score and a score corresponding to each characteristic in the characteristic information. The server risk scoring model has learned the mapping between the server's various features and scores.
Step 203, obtaining a risk score value of the server to be processed according to the score corresponding to each feature in the feature information.
As a possible implementation manner, a summation operation may be performed on the scores corresponding to each feature in the feature information, and the obtained sum is determined as the risk score of the server to be processed.
For example, assume the feature information of the server to be processed includes a brand feature with a corresponding score of 5, a temperature feature with a corresponding score of 2, and a memory occupancy feature with a corresponding score of 7. Summing these scores yields 14, which is determined as the risk score of the server to be processed.
As another possible implementation manner, a reference score corresponding to the server risk scoring model may be obtained, and the risk scoring value of the server to be processed may be obtained according to the reference score and a score corresponding to each feature in the feature information. And performing certain operation processing, such as summation operation and difference operation, on the reference score and the score corresponding to each feature in the feature information, so as to obtain a risk score of the server to be processed.
As an example, a summation operation may be performed on the reference score and the score corresponding to each feature in the feature information, and the resulting sum may be determined as the risk score of the server to be processed.
It should be noted that, because the score corresponding to a feature may be a positive number or a negative number, in order to avoid that the obtained risk score value of the server to be processed is a negative number, a reference score may be set, and the reference score may correspond to the server risk score model, and on the basis of the reference score, a certain operation process is performed on the score corresponding to each feature in the feature information.
For example, it is assumed that the feature information of the server to be processed includes a brand feature, a temperature feature, and a memory occupancy feature. The reference score corresponding to the server risk scoring model is 100, the score corresponding to the brand feature is 5, the score corresponding to the temperature feature is 2, and the score corresponding to the memory occupancy rate feature is 7. Summing the reference score with the three feature scores gives 114, which is determined as the risk score value of the server to be processed.
As still another example, the reference score and the score corresponding to each feature in the feature information may be subjected to a difference operation, and the obtained difference value may be determined as the risk score value of the server to be processed. For example, it is assumed that the feature information of the server to be processed includes a brand feature, a temperature feature, and a memory occupancy feature. The reference score corresponding to the server risk scoring model is 100, the score corresponding to the brand feature is 5, the score corresponding to the temperature feature is 2, the score corresponding to the memory occupancy rate feature is 7, the score corresponding to the brand feature, the temperature feature and the memory occupancy rate feature is subtracted from the reference score, and the obtained difference 86 is determined as the risk scoring value of the server to be processed.
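The two worked examples above can be expressed as a short sketch. This is illustrative only, not the patent's literal implementation; the feature names, per-feature scores, and the reference score of 100 are taken from the examples above.

```python
# Illustrative sketch of steps 202-203. Feature names, scores, and the
# reference score of 100 come from the worked examples in the text.
def risk_score(feature_scores, base_score=100, mode="add"):
    """Combine per-feature scores with a reference (base) score.

    mode="add": base score plus the sum of feature scores.
    mode="subtract": base score minus the sum of feature scores.
    """
    total = sum(feature_scores.values())
    return base_score + total if mode == "add" else base_score - total

scores = {"brand": 5, "temperature": 2, "memory_occupancy": 7}
print(risk_score(scores))                   # 114, the summation example
print(risk_score(scores, mode="subtract"))  # 86, the difference example
```

Either combination rule works as long as the same rule is used when calibrating the reference score, since the score-to-risk direction simply flips between the two.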
Optionally, in an implementation manner, one base score may correspond to one server risk scoring model, that is, each different server risk scoring model corresponds to a different base score, for example, server risk scoring model a corresponds to base score a, server risk scoring model B corresponds to base score B, server risk scoring model a and server risk scoring model B are different models, and base score a and base score B are different scores. In another implementation, one base score may correspond to multiple different server risk scoring models, for example, server risk scoring model a corresponds to base score a, server risk scoring model C corresponds to base score a, and server risk scoring model a and server risk scoring model C are different models.
And step 204, acquiring the corresponding relation between the maintenance frequency and the risk score value.
As one example, a plurality of maintenance intervals may be pre-partitioned, where each maintenance interval corresponds to a maintenance frequency for a range of risk score values. For example, four maintenance intervals are divided, with a benchmark score of 600: a key maintenance interval, a high-frequency maintenance interval, a medium-frequency maintenance interval, and a low-frequency maintenance interval. When the risk score value is below a first threshold (for example, 300), it falls in the key maintenance interval and the server to be processed needs to be maintained as soon as possible; when the risk score value is above the first threshold (for example, 300) and below a second threshold (for example, 450), it falls in the high-frequency maintenance interval and the maintenance frequency needs to be increased; when the risk score value is above the second threshold (for example, 450) and below a third threshold (for example, 550), it falls in the medium-frequency maintenance interval and the normal maintenance frequency is sufficient; when the risk score value is above the third threshold (for example, 550) and below a fourth threshold (for example, 600), it falls in the low-frequency maintenance interval and a lower maintenance frequency is used. It should be noted that these score-value ranges are only exemplary and are not meant to limit the disclosure.
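The interval lookup above can be sketched as a simple threshold comparison. The thresholds (300/450/550/600) are the document's examples, not fixed values, and the interval names below paraphrase the four example intervals.

```python
# Illustrative mapping from a risk score value to a maintenance interval,
# using the example thresholds from the text (assumed, not fixed values).
def maintenance_interval(score, thresholds=(300, 450, 550, 600)):
    t1, t2, t3, _t4 = thresholds
    if score < t1:
        return "key maintenance"   # maintain as soon as possible
    if score < t2:
        return "high frequency"    # increase maintenance frequency
    if score < t3:
        return "medium frequency"  # normal maintenance frequency
    return "low frequency"         # lower maintenance frequency

print(maintenance_interval(280))  # key maintenance
print(maintenance_interval(500))  # medium frequency
```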
And step 205, determining a maintenance strategy of the server to be processed according to the risk score value and the corresponding relation of the server to be processed.
In the embodiment of the present disclosure, steps 201 to 202 may be implemented in any manner described in the embodiments of the present disclosure, which is not specifically limited and is not described in detail here.
According to the method for determining the server maintenance strategy, the risk score value of the server to be processed is obtained based on the characteristic information of the server to be processed and the pre-established server risk score model, and therefore the risk degree of the server to be processed is accurately evaluated. And then according to the corresponding relation between the risk score value and the maintenance frequency and the risk score value, determining the maintenance strategy of the server to be processed, wherein the obtained maintenance strategy of the server to be processed is more accurate, maintenance according to needs is realized, the possibility of resource mismatching is further reduced, the failure rate of the server can be reduced, the maintenance efficiency of the server is improved, and the waste of maintenance resources is effectively reduced.
The present disclosure further provides a method for generating a model, and fig. 3 is a schematic flow chart of the method for generating a model according to the embodiment of the present disclosure. The method for generating a model according to the embodiment of the present disclosure may be applied to a device for generating a model according to the embodiment, and the device for generating a model may be disposed in an electronic device. As shown in fig. 3, the generation method of the model includes the following steps:
step 301, collecting relevant data required for generating a server risk scoring model. The related data includes at least one of attribute data, physical environment data, operational data, and failure data of the sample server.
Step 302, selecting features and training targets from the related data, and generating a training data set according to the selected features and training targets. The selected characteristics include at least one of attribute characteristics, physical environment characteristics, operational characteristics, and fault characteristics.
Optionally, in some embodiments of the present disclosure, the attribute feature may be a service life of the server, a time since last overhaul, a brand, or the like; the physical environment characteristics can be the characteristics of temperature, humidity, noise, illumination and the like of the machine room where the server is located; the operation characteristics can be the disk occupancy rate, the memory occupancy rate, the CPU occupancy rate, the network throughput, the power supply power and other characteristics of the server; the failure characteristics may be characteristics of the time of occurrence of the server failure, the severity of the failure, and the like.
It should be noted that, for the attribute features, the service life of the server, the time since the last overhaul, the brand, and other such features may be directly selected as features required for generating the training data set. For physical environment features, operation features, and fault features, aggregation calculation within a certain time window needs to be performed according to the nature of these features, and the aggregated values are used as features required for generating the training data set. For example, the average, peak, difference, and count of the feature values within a certain time window may be taken, and the step length of the time window can be one week, one month, three months, one year, and the like.
For example, for the temperature feature among the physical environment features, the average value over the last month, the peak value over the last month, and the maximum temperature difference over the last month may be selected; for the disk occupancy rate feature among the operation features, the average value over the last month and the peak value over the last week may be selected; and for the failure features, the number of serious failures over the last year, the number of minor failures over the last year, and the like may be selected.
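Such window aggregations can be sketched in plain Python; the readings and feature names below are hypothetical, not taken from the disclosure:

```python
from statistics import mean

# hypothetical temperature readings collected over the last month (assumed data)
readings = [24.0, 25.5, 26.0, 27.5, 23.5, 25.0]

# aggregate the raw series into window features, as described above
aggregated = {
    "temp_mean_last_month": mean(readings),                    # average value
    "temp_peak_last_month": max(readings),                     # peak value
    "temp_maxdiff_last_month": max(readings) - min(readings),  # maximum temperature difference
}
print(aggregated)
```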
In addition, the failure condition in a period of time after the time window used when selecting the features may be obtained and used as the training target for generating the training data set. It should be noted that the failure condition may refer to whether a failure occurred in that period, whether a serious failure occurred, and the like. As an example, assuming the step length of the time window used when selecting features from the related data is 6 months and the features from January to June of a certain year are selected, the failure condition in a period of time after June, for example the failure condition of July, can be taken as the training target.
Step 303, training a logistic regression model according to the training data set.
In some embodiments of the present disclosure, the training data is fitted using logistic regression, thereby training a logistic regression model and obtaining its model parameters.
And step 304, generating a server risk scoring model according to the model parameters of the logistic regression model obtained after training.
According to the model generation method disclosed by the embodiment of the disclosure, the characteristics and the training targets are selected from the historical relevant data of the sample server, the training data set is generated, the logistic regression model is trained, and then the server risk scoring model is generated. The server risk scoring model disclosed by the invention can be used for accurately scoring the risk degree of the server. Further, as historical data accumulates, new data is added to the training data, and the accuracy of the server risk scoring model will also improve.
It should be noted that, in order to accurately select the features and the training targets, in some embodiments of the present disclosure, the implementation process of selecting the features and the training targets from the relevant data and generating the training data set according to the selected features and the training targets in step 302 may be as shown in fig. 4, and includes the following steps:
step 401, determining a preset training target aggregation window and a preset feature aggregation window, and determining a reference time point according to the training target aggregation window and the feature aggregation window.
It should be noted that, when selecting the features and the training target, the related data in a period of time after the reference time point is selected as the training target, and the related data in a period of time before the reference time point is selected as the features. For a better understanding of this step, reference is made to fig. 5. As shown in fig. 5, assuming that one year of related data required for establishing the server risk scoring model is collected in step 301, the step length of the feature aggregation window is set to 6 months, and the step length of the training target aggregation window is set to 1 month, then when selecting features and training targets to construct the training data set, the reference time point can only be chosen between the end of June and the end of November.
Step 402, determining an aggregate feature acquisition time interval and a target acquisition time interval based on the reference time point.
Optionally, a first time period before the reference time point is used as the aggregate feature acquisition time interval, and a second time period after the reference time point is used as the target acquisition time interval. The duration of the first time period is the same as the step length of the characteristic aggregation window, and the duration of the second time period is the same as the step length of the training target aggregation window.
For example, if the step length of the feature aggregation window is 6 months and the step length of the training target aggregation window is 1 month, the first time period is 6 months and the second time period is 1 month. Thus, the 6 months before the reference time point are taken as the aggregate feature acquisition time interval and the 1 month after the reference time point as the target acquisition time interval. As shown in fig. 5, if the reference time point is determined to be the end of June, the aggregate feature acquisition time interval is January through June and the target acquisition time interval is July. If the reference time point is determined to be the end of October, the aggregate feature acquisition time interval is May through October and the target acquisition time interval is November. It should be noted that multiple reference time points may be selected within the acquired related data to determine multiple sets of aggregate feature acquisition time intervals and target acquisition time intervals.
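Treating the reference time point as a month index, the two intervals can be derived as follows; the month-as-integer representation is an assumption made here for illustration:

```python
def acquisition_intervals(ref_month, feature_window=6, target_window=1):
    """Return the aggregate-feature interval and the target interval
    (inclusive month ranges) around a reference month."""
    feature_interval = (ref_month - feature_window + 1, ref_month)
    target_interval = (ref_month + 1, ref_month + target_window)
    return feature_interval, target_interval

# reference point June (month 6): features January-June, target July
print(acquisition_intervals(6))   # ((1, 6), (7, 7))
```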
Step 403, selecting features from the relevant data according to the aggregate feature acquisition time interval. The selected features include features to be aggregated and attribute features.
It should be noted that the feature to be aggregated includes at least one of a physical environment feature, an operation feature, and a failure feature.
And step 404, aggregating the features to be aggregated based on the feature aggregation window and a preset aggregation mode to obtain aggregated features.
Optionally, for different features to be aggregated, a corresponding aggregation mode is selected. For example, the aggregation mode may take the average value, peak value, difference, count, and so on of the feature values within the feature aggregation window. As an example, the aggregation mode may be determined according to the nature of the feature to be aggregated: for numerical features (such as temperature, humidity, noise, illuminance, and the server's disk occupancy rate, memory occupancy rate, CPU occupancy rate, network throughput, and power supply power), at least one of the average value, peak value, and difference may be used for aggregation, while for the failure features a counting aggregation mode may be selected.
For example, for the temperature feature among the physical environment features, when the step length of the feature aggregation window is one month, its aggregated features may be the average value of the temperature over the last month, the peak value over the last month, and the maximum temperature difference over the last month; for the disk occupancy rate feature among the operation features, when the step length of the feature aggregation window is one week, its aggregated features may be the average value and the peak value of the disk occupancy rate over the last week; and for the failure features, a counting aggregation mode may be selected, so that when the step length of the feature aggregation window is one year, the aggregated features may be the number of serious failures over the last year, the number of minor failures over the last year, and the like.
Step 405, selecting a training target from the relevant data according to the target acquisition time interval.
As an example, a fault condition within a target acquisition time interval may be obtained as a training target required to generate a training data set. It should be noted that the failure condition may refer to whether a failure occurs in the target acquisition time interval, whether a serious failure occurs, or the like. For example, assuming that the target acquisition time interval is July, a fault condition of July may be taken as a training target.
And 406, generating a training data set according to the aggregation characteristics, the attribute characteristics and the selected training target.
Therefore, through steps 401 to 406, according to the training target aggregation window, the feature aggregation window and the reference time point, the features and the training targets can be accurately selected to generate a training data set required for establishing the server risk score model.
Fig. 6 is a flowchart illustrating another method for generating a model according to an embodiment of the disclosure. As shown in fig. 6, the generation method of the model includes the following steps:
step 601, collecting relevant data required for establishing a server risk scoring model. The related data includes at least one of attribute data, physical environment data, operational data, and failure data of the sample server.
Step 602, selecting features and training targets from the relevant data, and generating a training data set according to the selected features and training targets. The selected characteristics include at least one of attribute characteristics, physical environment characteristics, operational characteristics, and fault characteristics.
Step 603, performing binning processing on the feature information in the training data set to obtain a binning result of each feature.
Optionally, in some embodiments of the present disclosure, discretization may be performed on the numerical features in the feature information, such as the temperature feature. As a possible implementation, the feature information in the training data may be discretized using equal-width binning, equal-frequency binning, or optimal binning.
As an example, optimal binning may be performed based on the CART algorithm: the samples in the current bin are divided into two at each candidate split point, the reduction of the Gini value brought by each candidate split is computed in turn, and the point at which the Gini value decreases the most relative to the unsplit bin is selected as the optimal split point. The two sample subsets obtained by the split are then divided recursively until a termination condition is met. The termination condition may be, for example, that the number of samples in each leaf node must be greater than 3% of the total samples.
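A minimal sketch of one such Gini-based split search (the recursion and the 3% stopping rule are omitted for brevity):

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = bad sample)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Return (gini_reduction, split_point) for the best binary split,
    or None if no split between distinct values exists."""
    pairs = sorted(zip(values, labels))
    base = gini([l for _, l in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal feature values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        reduction = base - weighted  # how much this split lowers the Gini value
        if best is None or reduction > best[0]:
            best = (reduction, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best
```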
And step 604, mapping the binned features into numerical values based on the binning result of each feature to obtain the evidence weight WOE values of all the bins of each feature.
Wherein the evidence weight WOE value of each bin of a feature can be expressed as formula (1):

WOE_i = ln( (Bad_i / Bad_T) / (Good_i / Good_T) )    (1)

where WOE_i is the evidence weight WOE value of the i-th bin, Bad_i is the number of bad samples in the i-th bin, Bad_T is the total number of bad samples, Good_i is the number of good samples in the i-th bin, and Good_T is the total number of good samples.
It should be explained that a good sample refers to a sample in which no accident has occurred, and a bad sample refers to a sample in which an accident has occurred. As an example, the value of the evidence weight WOE for all bins of a feature may be as shown in fig. 7.
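Formula (1) can be implemented directly from the per-bin good/bad counts, for example:

```python
import math

def woe_per_bin(bad_counts, good_counts):
    """Evidence weight (WOE) of each bin, per formula (1):
    WOE_i = ln((Bad_i / Bad_T) / (Good_i / Good_T))."""
    bad_t, good_t = sum(bad_counts), sum(good_counts)
    return [math.log((b / bad_t) / (g / good_t))
            for b, g in zip(bad_counts, good_counts)]

# two bins of one feature: 10 bad / 10 good, and 10 bad / 30 good
print(woe_per_bin([10, 10], [10, 30]))
```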
Step 605, training a logistic regression model according to the WOE values of each feature and all the bins thereof and the training target.
In some embodiments of the present disclosure, the training data is fitted using logistic regression. The logistic regression formula is expressed as formula (2):

p = 1 / (1 + e^(−θ^T x))    (2)

where p is the probability of a bad sample, θ is the parameter vector obtained by logistic regression training, and x is the training data after WOE conversion.
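A minimal gradient-descent fit of formula (2) in pure Python; this is a sketch for illustration, and a production system would typically use a library solver instead:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit p = sigmoid(theta0 + theta . x) by stochastic gradient descent.
    X: rows of WOE-converted features; y: 0/1 training targets."""
    theta = [0.0] * (len(X[0]) + 1)  # theta[0] is the intercept
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = theta[0] + sum(t * v for t, v in zip(theta[1:], xi))
            g = sigmoid(z) - yi  # gradient of the log loss w.r.t. z
            theta[0] -= lr * g
            for j, v in enumerate(xi):
                theta[j + 1] -= lr * g * v
    return theta

def predict_proba(theta, x):
    """Probability of a bad sample for one feature row."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))
```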
Further, to ensure the validity of feature information among the training data, in some embodiments of the present disclosure, an information value IV value of the corresponding feature may be calculated based on the WOE values of all bins of each feature. And screening the characteristic information in the training data set according to the IV value of each characteristic to obtain a first characteristic with the IV value meeting a preset threshold value. And further training a logistic regression model according to the first characteristic, the WOE values of all the bins of the first characteristic and the training target.
As an example, the calculation formula of the IV value is expressed as formula (3):
Figure BDA0003870355530000113
it should be noted that a larger IV value indicates that the feature is more effective. For example, in some embodiments of the present disclosure, a preset threshold may be set to 0.02, a feature with an IV value greater than 0.02 may be used as the first feature, and a logistic regression model may be trained according to the first feature and the WOE values of all its bins and the training target.
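Formula (3) and the 0.02 screening threshold can be sketched as follows; the dictionary layout of per-feature bin counts is an assumption made for illustration:

```python
import math

def iv_value(bad_counts, good_counts):
    """Information value of one feature, per formula (3):
    IV = sum_i (Bad_i/Bad_T - Good_i/Good_T) * WOE_i."""
    bad_t, good_t = sum(bad_counts), sum(good_counts)
    return sum((b / bad_t - g / good_t) * math.log((b / bad_t) / (g / good_t))
               for b, g in zip(bad_counts, good_counts))

def screen_features(features, threshold=0.02):
    """Keep features whose IV exceeds the preset threshold.
    features: {name: (bad_counts, good_counts)} (assumed layout)."""
    return [name for name, (bad, good) in features.items()
            if iv_value(bad, good) > threshold]
```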
And 606, generating a server risk scoring model according to the model parameters of the logistic regression model obtained after training.
From the definition of logistic regression one can obtain:

p / (1 − p) = e^(θ^T x)    (4)
Equation (4) can be converted to:

log(odds) = θ^T x    (5)

where odds = p / (1 − p), p is the probability of a bad sample, and 1 − p is the probability of a good sample.
The score of the server risk scoring model may be defined as a linear expression of the log of ratios:
Score=A-B*log(odds) (6)
where A and B are constants. The minus sign before B ensures that the lower the probability of failure (the lower the odds), the higher the score. Typically, a high score represents a low risk and a low score represents a high risk.
Expanding equation (6) gives:

Score = A − B(β_0 + β_1·x_1 + … + β_p·x_p)    (7)
where β_0 … β_p are elements of the model parameter matrix θ of the logistic regression model. Since x is the training data after WOE conversion, equation (7) can be expanded as:

Score = A − B( β_0 + β_1 Σ_j w_1j δ_1j + … + β_p Σ_j w_pj δ_pj )    (8)
where w_ij is the WOE value of the j-th bin of variable i, and δ_ij equals 1 when variable i takes its j-th bin value and 0 otherwise. Equation (8) can be further rewritten as:

Score = (A − Bβ_0) − (Bβ_1 w_11)δ_11 − (Bβ_1 w_12)δ_12 − … − (Bβ_p w_p1)δ_p1 − (Bβ_p w_p2)δ_p2 − …    (9)
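Per equation (9), the score contribution of every bin can be tabulated once after training, and scoring a server then reduces to summing the entries of the bins it falls into. A sketch with hypothetical parameter values:

```python
def scorecard(A, B, beta0, betas, woe_table):
    """Build the scorecard of equation (9): a base score A - B*beta0 and,
    for each variable i and bin j, a contribution -B * beta_i * w_ij."""
    base = A - B * beta0
    contributions = [[-B * beta_i * w for w in bins]
                     for beta_i, bins in zip(betas, woe_table)]
    return base, contributions

def score(base, contributions, bin_indices):
    """bin_indices[i] = index of the bin that variable i falls into
    (the bin whose delta_ij equals 1)."""
    return base + sum(contributions[i][j] for i, j in enumerate(bin_indices))

# one variable with two bins (hypothetical trained values)
base, table = scorecard(A=600, B=50, beta0=1.0, betas=[0.5],
                        woe_table=[[0.2, -0.4]])
```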
the standard expression for the resulting server risk scoring model is shown in table 1:
TABLE 1

    Base score      A − Bβ_0
    Variable 1      bin 1: −Bβ_1·w_11    bin 2: −Bβ_1·w_12    …
    …
    Variable p      bin 1: −Bβ_p·w_p1    bin 2: −Bβ_p·w_p2    …
Wherein, the values of A and B can be calculated by substituting two known or assumed scores. For example:
Assumption 1: when the odds of failure (the ratio odds) take a particular value θ 0 , the expected score is P 0 .
Assumption 2: the score decreases by a fixed number of points, PDO (points to double the odds), each time the odds double.
From the above assumptions, a system of equations can be obtained:
P_0 = A − B × log(θ_0)    (10)

P_0 − PDO = A − B × log(2θ_0)    (11)

which is solved to obtain:

B = PDO / log(2)    (12)

A = P_0 + B × log(θ_0)    (13)
As an example, the base score P_0 may be taken as 600 points, PDO as 50, and the base odds θ_0 as 1.
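Equations (12) and (13) with the example values P_0 = 600, PDO = 50, θ_0 = 1 give:

```python
import math

def scaling_constants(p0=600.0, pdo=50.0, odds0=1.0):
    """Solve equations (10)-(11): B = PDO / log(2), A = P0 + B*log(odds0)."""
    B = pdo / math.log(2)
    A = p0 + B * math.log(odds0)
    return A, B

A, B = scaling_constants()
# with odds0 = 1, log(odds0) = 0, so A equals the base score P0
```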
According to the model generation method disclosed by the embodiment of the disclosure, the characteristics and the training targets are selected from the historical relevant data of the sample server, the characteristic information in the training data set is subjected to binning processing, the binned characteristics are mapped into numerical values, and the evidence weight WOE values of all the bins of each characteristic are obtained. And training a logistic regression model according to the WOE values of each feature and all the bins of the feature and the training target, and further generating a server risk scoring model. The server risk scoring model disclosed herein can more accurately score the risk level of the server. Further, as historical data accumulates, new data is added to the training data, and the accuracy of the server risk scoring model will also improve.
Fig. 8 is a block diagram illustrating a structure of a device for determining a server maintenance policy according to an embodiment of the present disclosure. As shown in fig. 8, the apparatus for determining the server maintenance policy includes a first obtaining module 801, a second obtaining module 802, a third obtaining module 803, and a determining module 804.
The first obtaining module 801 is configured to obtain feature information of a server to be processed; the characteristic information includes at least one of an attribute characteristic, a physical environment characteristic, an operation characteristic, and a failure characteristic.
A second obtaining module 802, configured to input the feature information of the server to be processed into a server risk scoring model that is established in advance, and obtain a score corresponding to each feature in the feature information; the server risk scoring model has learned the mapping between the various features of the server and the scores.
The third obtaining module 803 is configured to obtain a risk score of the server to be processed according to a score corresponding to each feature in the feature information.
In some embodiments of the present disclosure, the third obtaining module 803 is specifically configured to: acquire a reference score corresponding to the server risk scoring model; and obtain the risk score value of the server to be processed according to the reference score and the score corresponding to each feature in the feature information.
In some embodiments of the present disclosure, the third obtaining module 803 is specifically configured to: and summing the reference score and the score corresponding to each feature in the feature information, and determining the obtained sum as the risk score of the server to be processed.
The determining module 804 is configured to determine a maintenance policy of the to-be-processed server according to the risk score value.
In some embodiments of the present disclosure, the determining module 804 is specifically configured to: acquiring a corresponding relation between the maintenance frequency and the risk score value; and determining the maintenance frequency of the server to be processed according to the risk score value and the corresponding relation of the server to be processed.
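One hypothetical form such a correspondence could take; the score bands and frequencies below are illustrative assumptions, not values from the disclosure:

```python
# assumed bands: a high score means low risk, per the scoring convention above
MAINTENANCE_BANDS = [
    (650, "inspect yearly"),
    (550, "inspect quarterly"),
    (0,   "inspect monthly"),
]

def maintenance_frequency(risk_score):
    """Map a server's risk score to a maintenance frequency via the band table."""
    for threshold, frequency in MAINTENANCE_BANDS:
        if risk_score >= threshold:
            return frequency
    return MAINTENANCE_BANDS[-1][1]  # fall back to the most frequent schedule
```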
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the device for determining the server maintenance strategy, the risk score value of the server to be processed is obtained based on the characteristic information of the server to be processed and the pre-established server risk score model, and therefore the risk degree of the server to be processed is accurately evaluated. And then, according to the risk score value, a maintenance strategy of the server to be processed is accurately determined, maintenance is achieved according to needs, and the possibility of resource mismatching is reduced. The method and the system can reduce the failure rate of the server, improve the maintenance efficiency of the server and effectively reduce the waste of maintenance resources.
Fig. 9 is a block diagram illustrating a structure of another apparatus for determining a server maintenance policy according to an embodiment of the present disclosure. In some embodiments of the present disclosure, on the basis of the embodiment shown in fig. 8, as shown in fig. 9, the apparatus for determining a server maintenance policy further includes a model building module 905 configured to build a server risk score model in advance.
The model building module 905 is specifically configured to:
collecting relevant data required for establishing a server risk scoring model; the related data comprises at least one of attribute data, physical environment data, operation data and fault data of the sample server;
selecting features and training targets from the related data, and generating a training data set according to the selected features and the training targets; the selected characteristics comprise at least one of attribute characteristics, physical environment characteristics, operation characteristics and fault characteristics;
training a logistic regression model according to the training data set;
and generating a server risk scoring model according to the model parameters of the logistic regression model obtained after training.
In some embodiments of the present disclosure, the model building module 905 is specifically configured to:
determining a preset training target aggregation window and a preset characteristic aggregation window, and determining a reference time point according to the training target aggregation window and the preset characteristic aggregation window;
determining an aggregate characteristic acquisition time interval and a target acquisition time interval based on the reference time point;
selecting characteristics from the related data according to the aggregation characteristic acquisition time interval; the selected characteristics comprise characteristics to be aggregated and attribute characteristics;
aggregating the features to be aggregated based on the feature aggregation window and a preset aggregation mode to obtain aggregated features;
selecting a training target from the related data according to the target acquisition time interval;
and generating a training data set according to the aggregation characteristics, the attribute characteristics and the selected training target.
In some embodiments of the present disclosure, the model building module 905 is specifically configured to: performing box separation processing on the feature information in the training data set to obtain a box separation result of each feature; mapping the binned features into numerical values based on the binning result of each feature to obtain the evidence weight WOE values of all the bins of each feature; and training a logistic regression model according to the WOE values of each feature and all the bins of the feature and the training target.
In some embodiments of the present disclosure, the model building module 905 is specifically configured to: and calculating information value IV values of corresponding features based on the WOE values of all the sub-boxes of each feature, screening the feature information in the training data set according to the IV value of each feature to obtain a first feature with the IV value meeting a preset threshold, and training a logistic regression model according to the first feature, the WOE values of all the sub-boxes of the first feature and a training target.
Modules 901 to 904 in fig. 9 have the same functions and structures as modules 801 to 804 in fig. 8.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the device for determining the server maintenance policy of the embodiments of the present disclosure, the risk score value of the server to be processed is obtained based on the feature information of the server to be processed and the pre-established server risk scoring model, so that the risk degree of the server to be processed is accurately evaluated. The maintenance policy of the server to be processed is then accurately determined according to the risk score value, maintenance is carried out as needed, and the possibility of resource mismatching is reduced. This can reduce the failure rate of the server, improve maintenance efficiency, and effectively reduce the waste of maintenance resources.
Fig. 10 is a block diagram of a model generation apparatus according to an embodiment of the present disclosure. As shown in fig. 10, the generation device of the model includes an acquisition module 1001, a first generation module 1002, a training module 1003, and a second generation module 1004.
The acquisition module 1001 is configured to acquire relevant data required for generating a server risk score model; the related data comprises at least one of attribute data, physical environment data, operation data and fault data of the sample server;
a first generating module 1002, configured to select a feature and a training target from the relevant data, and generate a training data set according to the selected feature and the training target; the selected characteristics include at least one of attribute characteristics, physical environment characteristics, operational characteristics, and fault characteristics.
In some embodiments of the present disclosure, the first generating module 1002 is specifically configured to: determining a preset training target aggregation window and a preset characteristic aggregation window, and determining a reference time point according to the training target aggregation window and the preset characteristic aggregation window;
determining an aggregate characteristic acquisition time interval and a target acquisition time interval based on the reference time point;
selecting characteristics from the related data according to the aggregation characteristic acquisition time interval; the selected characteristics comprise characteristics to be aggregated and attribute characteristics;
aggregating the features to be aggregated based on the feature aggregation window and a preset aggregation mode to obtain aggregated features;
selecting a training target from the related data according to the target acquisition time interval;
and generating a training data set according to the aggregation characteristics, the attribute characteristics and the selected training target.
A training module 1003 for training a logistic regression model according to the training data set;
in some embodiments of the present disclosure, the training module 1003 is specifically configured to: performing box separation processing on the characteristic information in the training data set to obtain a box separation result of each characteristic;
mapping the binned features into numerical values based on the binning result of each feature to obtain the evidence weight WOE values of all the bins of each feature;
training a logistic regression model according to the WOE values of each feature and all the bins thereof and the training target.
In some embodiments of the present disclosure, the training module 1003 is specifically configured to: and calculating information value IV values of corresponding features based on the WOE values of all the sub-boxes of each feature, and screening the feature information in the training data set according to the IV value of each feature to obtain a first feature with the IV value meeting a preset threshold. Training a logistic regression model according to the first characteristic and WOE values of all the bins of the first characteristic and the training target.
And a second generating module 1004, configured to generate a server risk scoring model according to the model parameters of the logistic regression model obtained after training.
According to the generation device of the model, the characteristics and the training targets are selected from the historical relevant data of the sample server, the training data set is generated, the logistic regression model is trained, and then the server risk scoring model is generated. The server risk scoring model disclosed by the invention can accurately score the risk degree of the server. Further, as historical data accumulates, new data is added to the training data, and the accuracy of the server risk scoring model will also improve.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
As shown in fig. 11, fig. 11 is a block diagram of an electronic device to implement the method for determining a server maintenance policy or the method for generating a model according to the embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the electronic apparatus includes: one or more processors 1101, a memory 1102, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing some of the necessary operations (e.g., as an array of servers, a group of blade servers, or a multi-processor system). In fig. 11, a processor 1101 is taken as an example.
Memory 1102 is a non-transitory computer readable storage medium provided by the present disclosure. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform a method for determining a server maintenance policy or a method for generating a model provided by the present disclosure. The non-transitory computer-readable storage medium of the present disclosure stores computer instructions for causing a computer to execute the determination method of the server maintenance policy or the generation method of the model provided by the present disclosure.
The memory 1102 may be used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the determination method of the server maintenance policy or the generation method of the model in the embodiment of the present disclosure (for example, a first obtaining module 901, a second obtaining module 902, a third obtaining module 903, a determination module 904, and a model building module 905 shown in fig. 9, and an acquisition module 1001, a first generation module 1002, a training module 1003, and a second generation module 1004 shown in fig. 10). The processor 1101 executes various functional applications of the server and data processing, that is, a method of determining a server maintenance policy or a method of generating a model in the above-described method embodiment, by executing a non-transitory software program, instructions, and modules stored in the memory 1102.
The memory 1102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of the electronic device of the determination method of the server maintenance policy or the generation method of the model, or the like. Further, the memory 1102 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1102 may optionally include a memory remotely located from the processor 1101, and such remote memory may be connected over a network to an electronic device to implement the server maintenance policy determination method or the model generation method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for determining the server maintenance policy or the method for generating the model may further include: an input device 1103 and an output device 1104. The processor 1101, the memory 1102, the input device 1103 and the output device 1104 may be connected by a bus or other means, and are exemplified by being connected by a bus in fig. 11.
The input device 1103 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, or a joystick. The output device 1104 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from a storage system, at least one input device, and at least one output device, and transmits data and instructions to the storage system, the at least one input device, and the at least one output device. The present disclosure also proposes a computer program which, when executed by a processor, implements the method for determining a server maintenance policy or the method for generating a model described in the above embodiments.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server (also called a cloud computing server or cloud host), a host product in the cloud computing service system that addresses the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain. It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (27)

1. A method for determining a server maintenance policy, comprising:
acquiring feature information of a server to be processed; the feature information comprises at least one of an attribute feature, a physical environment feature, an operation feature and a fault feature;
inputting the feature information of the server to be processed into a pre-established server risk scoring model, and obtaining a score corresponding to each feature in the feature information; the server risk scoring model has learned the mapping relationship between each feature of a server and its score;
obtaining a risk score value of the server to be processed according to the score corresponding to each feature in the feature information;
and determining a maintenance policy of the server to be processed according to the risk score value.
2. The method of claim 1, wherein determining the maintenance policy of the server to be processed according to the risk score value comprises:
acquiring a correspondence between maintenance frequency and risk score value;
and determining the maintenance frequency of the server to be processed according to the risk score value of the server to be processed and the correspondence.
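The correspondence between risk score value and maintenance frequency in claim 2 can be sketched as a simple lookup table. A minimal Python illustration follows; the thresholds and frequencies are hypothetical examples, not values specified in the patent:

```python
# Hypothetical correspondence table between risk score value and
# maintenance frequency; thresholds and frequencies are illustrative only.
MAINTENANCE_TABLE = [
    (700, "quarterly"),  # score >= 700: low risk
    (500, "monthly"),    # 500 <= score < 700: medium risk
    (0, "weekly"),       # score < 500: high risk
]

def maintenance_frequency(risk_score_value: float) -> str:
    """Look up the maintenance frequency for a given risk score value."""
    for threshold, frequency in MAINTENANCE_TABLE:
        if risk_score_value >= threshold:
            return frequency
    return MAINTENANCE_TABLE[-1][1]  # scores below 0 fall to highest risk
```

A higher score here means lower risk, following the usual scorecard convention; the patent itself does not fix the direction of the scale.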
3. The method of claim 1, wherein the server risk scoring model is pre-established by:
collecting related data required for establishing the server risk scoring model; the related data comprises at least one of attribute data, physical environment data, operation data and fault data of a sample server;
selecting features and a training target from the related data, and generating a training data set according to the selected features and training target; the selected features comprise at least one of an attribute feature, a physical environment feature, an operation feature and a fault feature;
training a logistic regression model according to the training data set;
and generating the server risk scoring model according to the model parameters of the logistic regression model obtained after training.
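One common way to turn trained logistic-regression parameters into a scoring model of the kind described in claim 3 is "points to double the odds" (PDO) scaling. The patent does not specify a scaling, so the base points, base odds, PDO value, and function name below are illustrative assumptions:

```python
import math

def scorecard_from_lr(intercept, coefs, woe_tables,
                      base_points=600, base_odds=50.0, pdo=20.0):
    """Convert logistic-regression parameters into a points scorecard.

    base_points points correspond to odds of base_odds, and every pdo
    additional points doubles the odds (a common scorecard convention,
    not mandated by the patent).
    """
    factor = pdo / math.log(2)
    offset = base_points - factor * math.log(base_odds)
    # Benchmark (base) score contributed by the intercept alone.
    base_score = offset - factor * intercept
    # Per-feature, per-bin score: proportional to coefficient * WOE.
    score_tables = {
        feat: {b: -factor * coefs[feat] * woe for b, woe in bins.items()}
        for feat, bins in woe_tables.items()
    }
    return base_score, score_tables
```

A server's final score would then be the base score plus the table entry for whichever bin each of its feature values falls into, matching the summation in claims 7 and 8.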
4. The method of claim 3, wherein selecting features and a training target from the related data and generating a training data set according to the selected features and training target comprises:
determining a preset training target aggregation window and a preset feature aggregation window, and determining a reference time point according to the training target aggregation window and the feature aggregation window;
determining an aggregate feature acquisition time interval and a target acquisition time interval based on the reference time point;
selecting features from the related data according to the aggregate feature acquisition time interval; the selected features comprise features to be aggregated and attribute features;
aggregating the features to be aggregated based on the feature aggregation window and a preset aggregation mode to obtain aggregated features;
selecting a training target from the related data according to the target acquisition time interval;
and generating a training data set according to the aggregated features, the attribute features and the selected training target.
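The windowing in claim 4 can be sketched for a single feature: metrics observed in a feature window before the reference time point are aggregated by a preset mode, and failures in a target window after it define a binary training target. The function name, window lengths, and aggregation modes below are hypothetical illustrations:

```python
from datetime import datetime, timedelta

def build_training_row(records, failures, reference,
                       feature_window_days=30, target_window_days=7,
                       agg="mean"):
    """Aggregate metric records observed before `reference` into one
    feature value, and derive a binary target from failures after it.

    records:  list of (timestamp, value) metric observations
    failures: list of failure timestamps
    """
    feat_start = reference - timedelta(days=feature_window_days)
    tgt_end = reference + timedelta(days=target_window_days)
    # Aggregate feature acquisition interval: [reference - N days, reference)
    window = [v for t, v in records if feat_start <= t < reference]
    if agg == "mean":
        feature = sum(window) / len(window) if window else 0.0
    else:  # "max" as an alternative preset aggregation mode
        feature = max(window, default=0.0)
    # Target acquisition interval: [reference, reference + M days)
    target = int(any(reference <= t < tgt_end for t in failures))
    return feature, target
```

Sliding the reference time point over the collection period would yield one training row per sample server per reference point.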
5. The method of claim 3, wherein training a logistic regression model according to the training data set comprises:
performing binning on the feature information in the training data set to obtain a binning result for each feature;
mapping the binned features to numerical values based on the binning result of each feature to obtain Weight of Evidence (WOE) values for all bins of each feature;
and training a logistic regression model according to each feature, the WOE values of all its bins, and the training target.
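A minimal sketch of the binning-and-WOE step in claim 5, assuming fixed bin edges and the usual definition WOE = ln((bad share)/(good share)) per bin. The 0.5 smoothing floor for empty classes is an illustrative choice, not from the patent:

```python
import math

def woe_by_bin(values, labels, bin_edges):
    """Bin one feature on fixed edges and compute per-bin Weight of
    Evidence: WOE = ln((bad_i / bad_total) / (good_i / good_total)).

    labels: 1 = "bad" (e.g., faulty server), 0 = "good".
    A 0.5 floor avoids log(0) when a bin has no bad or no good samples.
    """
    bad_total = sum(labels)
    good_total = len(labels) - bad_total
    woe = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [y for x, y in zip(values, labels) if lo <= x < hi]
        bad = sum(in_bin) or 0.5
        good = (len(in_bin) - sum(in_bin)) or 0.5
        woe[(lo, hi)] = math.log((bad / bad_total) / (good / good_total))
    return woe
```

With this sign convention, bins dominated by faulty samples get positive WOE and safe bins get negative WOE; the opposite convention (good over bad) is equally common.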
6. The method of claim 5, wherein training the logistic regression model according to each feature, the WOE values of all its bins, and the training target comprises:
calculating an Information Value (IV) for each feature based on the WOE values of all its bins;
screening the feature information in the training data set according to the IV value of each feature to obtain first features whose IV values meet a preset threshold;
and training the logistic regression model according to the first features, the WOE values of all their bins, and the training target.
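The IV screening in claim 6 can be sketched with the standard definition IV = Σ (bad% − good%) · WOE over bins, where WOE = ln(bad% / good%). The 0.02 threshold and function names are hypothetical defaults, not values from the patent:

```python
import math

def information_value(bin_stats):
    """IV = sum over bins of (bad_pct - good_pct) * WOE,
    with WOE = ln(bad_pct / good_pct).

    bin_stats: list of (bad_pct, good_pct) pairs, one per bin,
    each percentage taken over that class's total.
    """
    return sum((bad - good) * math.log(bad / good)
               for bad, good in bin_stats)

def screen_features(feature_bin_stats, threshold=0.02):
    """Keep features whose IV meets the preset threshold."""
    return [name for name, stats in feature_bin_stats.items()
            if information_value(stats) >= threshold]
```

Each IV term is non-negative (the difference and the log always share a sign), so IV accumulates discriminative power across bins; features with IV near zero carry almost no signal and are dropped before training.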
7. The method of claim 1, wherein obtaining the risk score value of the server to be processed according to the score corresponding to each feature in the feature information comprises:
acquiring a benchmark score corresponding to the server risk scoring model;
and obtaining the risk score value of the server to be processed according to the benchmark score and the score corresponding to each feature in the feature information.
8. The method of claim 7, wherein obtaining the risk score value of the server to be processed according to the benchmark score and the score corresponding to each feature in the feature information comprises:
and summing the benchmark score and the score corresponding to each feature in the feature information, and determining the obtained sum as the risk score value of the server to be processed.
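The summation in claim 8 is a one-line computation; the function name is hypothetical:

```python
def risk_score(benchmark_score, feature_scores):
    """Risk score value = benchmark score + sum of per-feature scores
    (claim 8). feature_scores maps feature name -> model score."""
    return benchmark_score + sum(feature_scores.values())
```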
9. A method of generating a model, comprising:
collecting related data required for generating a server risk scoring model; the related data comprises at least one of attribute data, physical environment data, operation data and fault data of a sample server;
selecting features and a training target from the related data, and generating a training data set according to the selected features and training target; the selected features comprise at least one of an attribute feature, a physical environment feature, an operation feature and a fault feature;
training a logistic regression model according to the training data set;
and generating the server risk scoring model according to the model parameters of the logistic regression model obtained after training.
10. The method of claim 9, wherein selecting features and a training target from the related data and generating a training data set according to the selected features and training target comprises:
determining a preset training target aggregation window and a preset feature aggregation window, and determining a reference time point according to the training target aggregation window and the feature aggregation window;
determining an aggregate feature acquisition time interval and a target acquisition time interval based on the reference time point;
selecting features from the related data according to the aggregate feature acquisition time interval; the selected features comprise features to be aggregated and attribute features;
aggregating the features to be aggregated based on the feature aggregation window and a preset aggregation mode to obtain aggregated features;
selecting a training target from the related data according to the target acquisition time interval;
and generating a training data set according to the aggregated features, the attribute features and the selected training target.
11. The method of claim 9, wherein training a logistic regression model according to the training data set comprises:
performing binning on the feature information in the training data set to obtain a binning result for each feature;
mapping the binned features to numerical values based on the binning result of each feature to obtain Weight of Evidence (WOE) values for all bins of each feature;
and training a logistic regression model according to each feature, the WOE values of all its bins, and the training target.
12. The method of claim 11, wherein training the logistic regression model according to each feature, the WOE values of all its bins, and the training target comprises:
calculating an Information Value (IV) for each feature based on the WOE values of all its bins;
screening the feature information in the training data set according to the IV value of each feature to obtain first features whose IV values meet a preset threshold;
and training the logistic regression model according to the first features, the WOE values of all their bins, and the training target.
13. A device for determining a server maintenance policy, comprising:
a first obtaining module configured to obtain feature information of a server to be processed; the feature information comprises at least one of an attribute feature, a physical environment feature, an operation feature and a fault feature;
a second obtaining module configured to input the feature information of the server to be processed into a pre-established server risk scoring model and obtain a score corresponding to each feature in the feature information; the server risk scoring model has learned the mapping relationship between each feature of a server and its score;
a third obtaining module configured to obtain a risk score value of the server to be processed according to the score corresponding to each feature in the feature information;
and a determining module configured to determine a maintenance policy of the server to be processed according to the risk score value.
14. The apparatus of claim 13, wherein the determining module is specifically configured to:
acquire a correspondence between maintenance frequency and risk score value;
and determine the maintenance frequency of the server to be processed according to the risk score value of the server to be processed and the correspondence.
15. The apparatus of claim 13, further comprising:
a model building module configured to pre-establish the server risk scoring model, wherein the model building module is specifically configured to:
collect related data required for establishing the server risk scoring model; the related data comprises at least one of attribute data, physical environment data, operation data and fault data of a sample server;
select features and a training target from the related data, and generate a training data set according to the selected features and training target; the selected features comprise at least one of an attribute feature, a physical environment feature, an operation feature and a fault feature;
train a logistic regression model according to the training data set;
and generate the server risk scoring model according to the model parameters of the logistic regression model obtained after training.
16. The apparatus of claim 15, wherein the model building module is specifically configured to:
determine a preset training target aggregation window and a preset feature aggregation window, and determine a reference time point according to the training target aggregation window and the feature aggregation window;
determine an aggregate feature acquisition time interval and a target acquisition time interval based on the reference time point;
select features from the related data according to the aggregate feature acquisition time interval; the selected features comprise features to be aggregated and attribute features;
aggregate the features to be aggregated based on the feature aggregation window and a preset aggregation mode to obtain aggregated features;
select a training target from the related data according to the target acquisition time interval;
and generate a training data set according to the aggregated features, the attribute features and the selected training target.
17. The apparatus of claim 15, wherein the model building module is specifically configured to:
perform binning on the feature information in the training data set to obtain a binning result for each feature;
map the binned features to numerical values based on the binning result of each feature to obtain Weight of Evidence (WOE) values for all bins of each feature;
and train a logistic regression model according to each feature, the WOE values of all its bins, and the training target.
18. The apparatus of claim 17, wherein the model building module is specifically configured to:
calculate an Information Value (IV) for each feature based on the WOE values of all its bins;
screen the feature information in the training data set according to the IV value of each feature to obtain first features whose IV values meet a preset threshold;
and train the logistic regression model according to the first features, the WOE values of all their bins, and the training target.
19. The apparatus of claim 13, wherein the third obtaining module is specifically configured to:
acquire a benchmark score corresponding to the server risk scoring model;
and obtain the risk score value of the server to be processed according to the benchmark score and the score corresponding to each feature in the feature information.
20. The apparatus of claim 19, wherein the third obtaining module is specifically configured to:
and sum the benchmark score and the score corresponding to each feature in the feature information, and determine the obtained sum as the risk score value of the server to be processed.
21. An apparatus for generating a model, comprising:
an acquisition module configured to collect related data required for generating a server risk scoring model; the related data comprises at least one of attribute data, physical environment data, operation data and fault data of a sample server;
a first generation module configured to select features and a training target from the related data and generate a training data set according to the selected features and training target; the selected features comprise at least one of an attribute feature, a physical environment feature, an operation feature and a fault feature;
a training module configured to train a logistic regression model according to the training data set;
and a second generation module configured to generate the server risk scoring model according to the model parameters of the logistic regression model obtained after training.
22. The apparatus of claim 21, wherein the first generation module is specifically configured to:
determine a preset training target aggregation window and a preset feature aggregation window, and determine a reference time point according to the training target aggregation window and the feature aggregation window;
determine an aggregate feature acquisition time interval and a target acquisition time interval based on the reference time point;
select features from the related data according to the aggregate feature acquisition time interval; the selected features comprise features to be aggregated and attribute features;
aggregate the features to be aggregated based on the feature aggregation window and a preset aggregation mode to obtain aggregated features;
select a training target from the related data according to the target acquisition time interval;
and generate a training data set according to the aggregated features, the attribute features and the selected training target.
23. The apparatus of claim 21, wherein the training module is specifically configured to:
perform binning on the feature information in the training data set to obtain a binning result for each feature;
map the binned features to numerical values based on the binning result of each feature to obtain Weight of Evidence (WOE) values for all bins of each feature;
and train a logistic regression model according to each feature, the WOE values of all its bins, and the training target.
24. The apparatus of claim 23, wherein the training module is specifically configured to:
calculate an Information Value (IV) for each feature based on the WOE values of all its bins;
screen the feature information in the training data set according to the IV value of each feature to obtain first features whose IV values meet a preset threshold;
and train the logistic regression model according to the first features, the WOE values of all their bins, and the training target.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8 or to perform the method of any one of claims 9 to 12.
26. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 8 or causing the computer to perform the method of any one of claims 9 to 12.
27. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 8, or implements the steps of the method of any one of claims 9 to 12.
CN202211204756.9A 2022-09-28 2022-09-28 Method for determining server maintenance strategy, method for generating model and device thereof Pending CN115564069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211204756.9A CN115564069A (en) 2022-09-28 2022-09-28 Method for determining server maintenance strategy, method for generating model and device thereof


Publications (1)

Publication Number Publication Date
CN115564069A true CN115564069A (en) 2023-01-03

Family

ID=84743843


Country Status (1)

Country Link
CN (1) CN115564069A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016796A (en) * 2020-07-15 2020-12-01 北京淇瑀信息科技有限公司 Comprehensive risk scoring request processing method and device and electronic equipment
CN112633708A (en) * 2020-12-25 2021-04-09 同方威视科技江苏有限公司 Mechanical equipment fault detection method, device, medium and electronic equipment
CN112766649A (en) * 2020-12-31 2021-05-07 平安科技(深圳)有限公司 Target object evaluation method based on multi-scoring card fusion and related equipment thereof
CN113158947A (en) * 2021-04-29 2021-07-23 重庆长安新能源汽车科技有限公司 Power battery health scoring method, system and storage medium
CN113238908A (en) * 2021-06-18 2021-08-10 浪潮商用机器有限公司 Server performance test data analysis method and related device
CN113988331A (en) * 2021-10-29 2022-01-28 浙江新再灵科技股份有限公司 Elevator maintenance cycle determination method based on score card model
CN114691403A (en) * 2022-03-18 2022-07-01 阿里巴巴(中国)有限公司 Server fault diagnosis method and device, electronic equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LU Ming: "A medical equipment inspection scoring system based on risk analysis" *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230103