CN116225808A - Correction method, device, equipment, storage medium and program product for data label

Info

Publication number: CN116225808A
Application number: CN202310286549.0A
Authority: CN
Inventors: 杨执钧; 陆君杰; 石皓魁
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-03-22
Filing date: 2023-03-22
Publication date: 2023-06-06

Abstract

The application relates to a data tag correction method, device, equipment, storage medium and program product in the field of artificial intelligence. The method comprises the following steps: if the fault prediction model detects that a faulty hard disk exists, determining a first characteristic distance between the two stored data contained in each data pair in the faulty hard disk and a second characteristic distance between each stored data in each data pair and the reference data of that data pair, wherein any two stored data in the faulty hard disk form a data pair and the reference data is any other stored data in the faulty hard disk outside the data pair; determining a tag correction strategy for each stored data according to the first characteristic distance and the second characteristic distance of each data pair and the health degree tag of each stored data; and correcting the health degree tag of each stored data according to its tag correction strategy, wherein each stored data with a corrected health degree tag is used for retraining the fault prediction model. The method can improve the accuracy of the fault prediction model.

Description

Correction method, device, equipment, storage medium and program product for data label

Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular, to a method, apparatus, device, storage medium, and program product for correcting a data tag.

Background

With the development of computer technology in various fields, storage systems have been applied in various fields. Storage systems, such as hard disks, are the most important components of data centers in various fields.

At present, to ensure that data are stored normally, a fault prediction model predicts faults in the data stored on a hard disk in real time while the disk is in use and assigns a health degree label to each stored data. For example, data with an abnormal detection result are labeled as faulty, and data with a normal detection result are labeled as healthy. When stored data with a faulty health degree label are detected in a hard disk, hard disk fault alarm information is output to remind operation and maintenance personnel to replace the hard disk.

Because the timeliness and accuracy of a hard disk fault alarm depend entirely on how accurately the fault prediction model has been trained, improving the accuracy of the fault prediction model is a problem to be solved in the field of hard disk fault detection.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a data tag correction method, apparatus, device, storage medium, and program product that can improve the accuracy of a failure prediction model.

In a first aspect, the present application provides a method for correcting a data tag. The method comprises the following steps:

if the fault prediction model detects that a fault hard disk exists, determining a first characteristic distance between two stored data contained in each data pair in the fault hard disk and a second characteristic distance between each stored data in each data pair and reference data of each data pair; any two storage data in the fault hard disk form a data pair; the reference data of each data pair is any other stored data except the data pair in the fault hard disk;

determining a tag correction strategy of each stored data according to the first characteristic distance and the second characteristic distance of each data pair and the health degree tag of each stored data;

correcting the health degree label of each stored data according to the label correction strategy of each stored data; and each stored data after the health degree label is corrected is used for retraining the fault prediction model.

In one embodiment, determining a first characteristic distance between two stored data included in each data pair in a failed hard disk, and a second characteristic distance between each stored data in each data pair and reference data of each data pair, includes:

mapping each storage data in the fault hard disk to the same space to obtain the spatial characteristics of each storage data in the same space;

determining a first characteristic distance of each data pair according to the spatial characteristics of the two stored data contained in each data pair;

and determining a second characteristic distance between each stored data in each data pair and the reference data of each data pair according to the spatial characteristics of each stored data in each data pair and the spatial characteristics of the reference data of each data pair.

In one embodiment, determining a tag correction policy for each stored data based on the first feature distance and the second feature distance for each data pair and the health tag for each stored data includes:

determining an adjacent relation between two stored data contained in each data pair according to the first characteristic distance and the second characteristic distance of each data pair and the consistency of the health degree labels of the two stored data contained in each data pair;

and determining a tag correction strategy of each stored data according to the adjacency relation between the two stored data contained in each data pair and the health degree tag of each stored data.

In one embodiment, the second feature distance includes a first sub-distance and a second sub-distance; determining an adjacency relationship between two stored data contained in each data pair according to the first characteristic distance and the second characteristic distance of each data pair and consistency of health labels of the two stored data contained in each data pair, wherein the adjacency relationship comprises the following steps:

for each data pair, if the sum of the first sub-distance and the second sub-distance of the data pair is greater than the first characteristic distance and the health labels of the two stored data contained in the data pair are consistent, determining that the adjacency relationship between the two stored data contained in the data pair is adjacency.

In one embodiment, determining a tag correction policy for each stored data based on an adjacency relationship between two stored data included in each data pair and a health tag for each stored data includes:

determining adjacent data corresponding to each storage data according to the adjacent relation between two storage data contained in each data pair;

and determining a tag correction strategy of each stored data according to the health degree tag of each stored data and the health degree tag of the adjacent data corresponding to each stored data.

In one embodiment, determining a tag correction policy for each stored data according to the health tag for each stored data and the health tag for each adjacent data corresponding to each stored data includes:

determining the label accuracy category of each stored data according to the health degree label of each stored data and the health degree label of the adjacent data corresponding to each stored data; wherein, the label accuracy category includes: correct, incorrect and pending;

and determining a tag correction strategy for each stored data according to the tag accuracy class of each stored data.

In one embodiment, determining the tag accuracy class of each stored data according to the health degree tag of each stored data and the health degree tag of the adjacent data corresponding to each stored data includes:

determining the same-label adjacency ratio of each stored data, that is, the proportion of its adjacent data that carry the same health degree label, according to the health degree label of each stored data and the health degree label of the adjacent data corresponding to each stored data;

and determining the label accuracy class of each stored data according to the same-label adjacency ratio of each stored data.

In one embodiment, determining a tag correction policy for each stored data based on a tag accuracy class for each stored data includes:

judging, according to the tag accuracy category of each stored data, whether the proportion of stored data whose tag accuracy category is incorrect is smaller than a preset proportion threshold;

if yes, determining a label correction strategy for each stored data according to the label accuracy category of each stored data.

In a second aspect, the present application further provides a correction device for a data tag. The device comprises:

the distance determining module is used for determining a first characteristic distance between two stored data contained in each data pair in the fault hard disk and a second characteristic distance between each stored data in each data pair and reference data of each data pair if the fault prediction model detects that the fault hard disk exists; any two storage data in the fault hard disk form a data pair; the reference data of each data pair is any other stored data except the data pair in the fault hard disk;

The strategy determining module is used for determining a label correction strategy of each stored data according to the first characteristic distance and the second characteristic distance of each data pair and the health degree label of each stored data;

the tag correction module is used for correcting the health degree tag of each stored data according to the tag correction strategy of each stored data; and each stored data after the health degree label is corrected is used for retraining the fault prediction model.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the steps of the method of the first aspect.

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

With the above correction method, device, equipment, storage medium and program product for a data tag, all stored data in the faulty hard disk output by the fault prediction model are acquired and formed into a plurality of data pairs, each data pair corresponding to one reference data. A first characteristic distance between the two stored data contained in each data pair and a second characteristic distance between each stored data in each data pair and the corresponding reference data are then determined. The tag correction strategy of each stored data is determined not only from the relationship between the first characteristic distance and the second characteristic distance but also from the health degree tag of each stored data, so that the strategy better matches the stored data and provides a sound basis for correcting its health degree tag. The result of correcting the health degree tag of each stored data according to its tag correction strategy is correspondingly more accurate. In addition, each stored data with a corrected health degree tag is used to retrain the fault prediction model. In this way, when the fault prediction model detects a hard disk fault, the health degree tags of the stored data of the predicted faulty hard disk are corrected in real time, and the fault prediction model is retrained in real time on the corrected tags, so that the model is continuously updated while in use and its accuracy in predicting faulty hard disks is improved.

Drawings

FIG. 1 is an application environment diagram of a data tag correction method provided in this embodiment;

FIG. 2 is a flowchart of a first method for correcting a data tag according to the present embodiment;

FIG. 3 is a schematic flowchart of determining a tag correction strategy for each stored data according to the present embodiment;

FIG. 4 is a flowchart of a second method for correcting a data tag according to the present embodiment;

FIG. 5 is a block diagram of a first data tag correction apparatus according to the present embodiment;

FIG. 6 is a block diagram of a second data tag correction apparatus according to the present embodiment;

FIG. 7 is a block diagram of a third data tag correction apparatus according to the present embodiment;

FIG. 8 is an internal structure diagram of a computer device according to the present embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The data tag correction method provided by the embodiments of the application can be applied to the application environment shown in FIG. 1. In one embodiment, a computer device is provided, which may be a server and whose internal structure may be as shown in FIG. 1. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data needed to correct data tags. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for correcting a data tag.

In one embodiment, as shown in FIG. 2, a method for correcting a data tag is provided. The method is described as applied to the computer device in FIG. 1 and includes the following steps:

S201, if the fault prediction model detects that a faulty hard disk exists, determining a first characteristic distance between the two stored data contained in each data pair in the faulty hard disk and a second characteristic distance between each stored data in each data pair and the reference data of each data pair.

Any two storage data in the fault hard disk form a data pair; the reference data of each data pair is any other stored data except the data pair in the fault hard disk; the failure prediction model is a pre-trained model for failure detection of stored data in the hard disk. The first characteristic distance may characterize a distance between two stored data comprised by the pair of data. The second characteristic distance may characterize a distance between each stored data in the respective data pair and the reference data corresponding to the data pair. It can be understood that, since the failed hard disk in this embodiment is the failed hard disk output by the failure prediction model, the failed hard disk in this embodiment must include the stored data whose health label is abnormal. And the health degree label of the data stored in the fault hard disk is predicted and marked by the fault prediction model.

Specifically, in this embodiment, for a failed hard disk output by a failure prediction model, all storage data in the failed hard disk are acquired first, then each data pair is determined from all storage data, reference data corresponding to each data pair is determined, and based on the reference data corresponding to each data pair, a first feature distance between two storage data included in each data pair and a second feature distance between each storage data in each data pair and the reference data of each data pair are determined.

Optionally, in this embodiment, the data pairs may be determined by repeatedly selecting two stored data at random from all the stored data in the faulty hard disk as one data pair, until every two stored data in the faulty hard disk have formed a data pair. For each data pair, any one of the other stored data in the faulty hard disk, outside the data pair, is selected as the reference data of that data pair.
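The pairing step can be illustrated with a minimal Python sketch. It assumes that each stored data is identified by an integer index and that the reference data is drawn uniformly at random; the function name build_pairs_with_reference and the seed parameter are hypothetical and are not part of the application.

```python
import itertools
import random


def build_pairs_with_reference(num_items, seed=0):
    """Return (i, j, r) triples: every unordered pair (i, j) of item
    indices, plus one reference index r that lies outside the pair."""
    if num_items < 3:
        raise ValueError("at least three stored data items are needed")
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    triples = []
    for i, j in itertools.combinations(range(num_items), 2):
        # Any other stored data on the same disk may serve as reference.
        candidates = [k for k in range(num_items) if k not in (i, j)]
        triples.append((i, j, rng.choice(candidates)))
    return triples


# Hypothetical usage: four stored data items on one faulty hard disk.
triples = build_pairs_with_reference(4)
```

Enumerating all combinations produces the same set of data pairs as repeated random selection without repetition, which is why the sketch does not draw the pairs themselves at random.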

Optionally, one implementation of determining the first feature distance between the two stored data contained in each data pair in the faulty hard disk and the second feature distance between each stored data in each data pair and the reference data of the data pair may be as follows. Each stored data is quantized according to a preset quantization rule to obtain a word vector of the stored data. For each data pair, a first Euclidean distance between the word vectors of the two stored data contained in the data pair (for example, the first stored data and the second stored data) is calculated and taken as the first feature distance. Then, from the word vector of the reference data of the data pair and the word vectors of the two stored data, a second Euclidean distance between the first stored data and the reference data and a third Euclidean distance between the second stored data and the reference data are calculated, and the second Euclidean distance and the third Euclidean distance are taken as the second feature distance.
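As an illustration of the Euclidean-distance implementation, the following Python sketch computes the first feature distance and the two values that make up the second feature distance for one data pair. The vectors are made-up numbers and the function name pair_distances is hypothetical; the preset quantization rule that produces the vectors is not specified here and is assumed to have been applied already.

```python
import math


def pair_distances(first_vec, second_vec, reference_vec):
    """Return the first feature distance and the second feature distance
    (as a tuple of two sub-distances) for one data pair."""
    # Distance between the two stored data of the pair.
    first_distance = math.dist(first_vec, second_vec)
    # Distances from each stored data of the pair to the reference data.
    sub_distance_1 = math.dist(first_vec, reference_vec)
    sub_distance_2 = math.dist(second_vec, reference_vec)
    return first_distance, (sub_distance_1, sub_distance_2)


# Hypothetical two-dimensional vectors for illustration only.
d1, (s1, s2) = pair_distances([0.0, 0.0], [3.0, 4.0], [0.0, 4.0])
```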

In addition, another implementation manner of determining the first feature distance and the second feature distance may be that mapping each storage data in the failed hard disk to the same space to obtain a spatial feature of each storage data in the same space; determining a first characteristic distance of each data pair according to the spatial characteristics of the two stored data contained in each data pair; and determining a second characteristic distance between each stored data in each data pair and the reference data of each data pair according to the spatial characteristics of each stored data in each data pair and the spatial characteristics of the reference data of each data pair. Specifically, the mapping of each storage data in the failed hard disk to the unified real space may be performed, or the mapping of each storage data in the failed hard disk to the matrix space may be performed, which is not limited. If each storage data in the fault hard disk is mapped into a unified real space, the spatial characteristics of the corresponding storage data under the space are coordinate points of the storage data under the real space; if each storage data in the fault hard disk is mapped into the matrix space, the space characteristic of each corresponding storage data under the space is the vector of the storage data under the matrix space.

For each data pair, the spatial characteristics of each stored data (such as the first stored data and the second stored data) in the data pair and the spatial characteristics of the reference data of the data pair are obtained in the above manner, and then the distance between the first stored data and the second stored data can be calculated according to the spatial characteristics of the first stored data and the spatial characteristics of the second stored data, and the distance between the first stored data and the second stored data is taken as the first characteristic distance of the data pair. Further, according to the spatial features of the first storage data, the spatial features of the second storage data and the spatial features of the reference data, a first distance between the first storage data and the reference data is calculated, a second distance between the second storage data and the reference data is calculated, and the first distance and the second distance are used as the second feature distance of the data pair. In the above embodiment, before determining the first feature distance and the second feature distance, each storage data in the failed hard disk is mapped to the same space, and the first feature distance and the second feature distance are determined based on the spatial feature of each storage data in the same space, so that the determined first feature distance and second feature distance can more accurately represent the adjacency relationship between the storage data, and a basis is provided for determining the tag correction policy of each storage data according to the first feature distance and the second feature distance of each data pair and the health tag of each storage data.

S202, determining a tag correction strategy of each stored data according to the first characteristic distance and the second characteristic distance of each data pair and the health degree tag of each stored data.

The health degree label of each storage data is a fault prediction result marked for the storage data after the fault prediction model predicts the fault of the storage data. For example, if the failure prediction result is healthy, the health degree label of the stored data is healthy, and if the failure prediction result is failed, the health degree label of the stored data is failed.

Optionally, in this embodiment, the first feature distance and the second feature distance of each data pair may be compared, and the tag correction policy of each stored data may be determined from the comparison result in combination with the health tag of each stored data. For each data pair, if the first feature distance of the data pair is smaller than the second feature distance (note that if the second feature distance consists of two values, the first feature distance must be smaller than the sum of those two values), the two stored data contained in the data pair are connected. Connecting the two stored data of every data pair that meets this condition yields an undirected graph of the stored data. For each stored data in the undirected graph, whether its health tag is correct is determined from its own health tag and the health tags of the stored data connected to it, and the tag correction policy of the stored data is determined accordingly. For example, if the health tag of the stored data is correct, the tag correction policy is to retain the health tag; if the health tag is incorrect, the tag correction policy is to reverse the health tag.

In one embodiment, the tag correction policy of each stored data may be determined according to the first feature distance and the second feature distance of each data pair and the health tag of each stored data as follows: determining the adjacency relationship between the two stored data contained in each data pair according to the first feature distance and the second feature distance of the data pair and the consistency of the health tags of the two stored data; and determining the tag correction policy of each stored data according to the adjacency relationship between the two stored data contained in each data pair and the health tag of each stored data. The second feature distance includes a first sub-distance and a second sub-distance, each of which represents the distance between one stored data in the data pair and the corresponding reference data. In this embodiment, for each data pair, if the sum of the first sub-distance and the second sub-distance is greater than the first feature distance and the health tags of the two stored data contained in the data pair are consistent, the adjacency relationship between the two stored data is determined to be adjacent. For example, whether the two stored data contained in a data pair may be adjacent is first determined from the first feature distance and the second feature distance of the data pair; if so, whether the health tags of the two stored data are consistent is then determined; if they are consistent, the two stored data are determined to be adjacent; otherwise, the two stored data are not adjacent.
Determining adjacency from both the relationship between the first feature distance and the second feature distance and the consistency of the health tags of the two stored data makes the determination more rigorous. Optionally, the two stored data contained in a data pair may be regarded as possibly adjacent if the first feature distance of the data pair is smaller than the second feature distance (if the second feature distance consists of two values, the first feature distance must be smaller than the sum of those two values). Once the adjacency relationships are known, the tag correction policy of each stored data is determined from the adjacency relationships in combination with the health tag of each stored data. For example, it is judged whether the consistency between the health tag of a stored data and the health tags of its adjacent data meets a preset rule; if so, the health tag of the stored data is determined to be correct; if not, the health tag is determined to be incorrect; the tag correction policy of the stored data is then determined accordingly. For example, if the health tag of the stored data is correct, the tag correction policy is to retain the health tag; if it is incorrect, the tag correction policy is to reverse the health tag.
In the above embodiment, the tag correction policy of each stored data is determined according to the adjacency relationship between two stored data included in each data pair and the health degree tag of each stored data, so that the tag correction policy for each stored data is more accurate, and a guarantee is provided for the subsequent correction of the health degree tag of the stored data.
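The adjacency rule of this embodiment can be sketched in Python as follows. The sketch assumes the distances have already been computed and stores adjacency as a dictionary of neighbor sets; the names is_adjacent and build_adjacency, the record layout and the label strings are hypothetical.

```python
from collections import defaultdict


def is_adjacent(first_distance, sub_distances, label_a, label_b):
    """Adjacent when the two sub-distances sum to more than the first
    feature distance and the two health labels are consistent."""
    sub_1, sub_2 = sub_distances
    return (sub_1 + sub_2) > first_distance and label_a == label_b


def build_adjacency(records, labels):
    """records: iterable of (i, j, first_distance, (sub_1, sub_2)).
    Returns a mapping from item index to the set of adjacent indices."""
    neighbors = defaultdict(set)
    for i, j, first_distance, subs in records:
        if is_adjacent(first_distance, subs, labels[i], labels[j]):
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


# Hypothetical labels and distances for three stored data items.
labels = ["fault", "fault", "healthy"]
records = [
    (0, 1, 1.0, (2.0, 2.0)),  # distance condition holds, labels match
    (0, 2, 1.0, (2.0, 2.0)),  # distance condition holds, labels differ
    (1, 2, 5.0, (2.0, 3.0)),  # sum equals first distance, not greater
]
neighbors = build_adjacency(records, labels)
```

In this made-up example only items 0 and 1 end up adjacent, because the other two pairs fail either the label-consistency check or the distance condition.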

S203, correcting the health degree label of each storage data according to the label correction strategy of each storage data; and each stored data after the health degree label is corrected is used for retraining the fault prediction model.

Specifically, in this embodiment, the health degree label of each stored data is corrected according to the label correction policy corresponding to that stored data. For example, if the label correction policy is to retain the health label of the stored data, the health label is left unchanged and the stored data can be used directly to retrain the fault prediction model. If the label correction policy is to reverse the health label of the stored data, a health label of healthy is corrected to faulty, and a health label of faulty is corrected to healthy.
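A minimal Python sketch of applying the two policies described above is given below. The policy names "retain" and "reverse", the label strings and the function name apply_policy are assumptions chosen for illustration only.

```python
HEALTHY = "healthy"
FAULT = "fault"


def apply_policy(label, policy):
    """Return the corrected health label for one stored data item."""
    if policy == "retain":
        # The label is judged correct and is left unchanged.
        return label
    if policy == "reverse":
        # The label is judged incorrect and is flipped.
        return FAULT if label == HEALTHY else HEALTHY
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical labels and policies for three stored data items.
labels = [HEALTHY, FAULT, FAULT]
policies = ["reverse", "reverse", "retain"]
corrected = [apply_policy(l, p) for l, p in zip(labels, policies)]
```

The corrected labels, together with the stored data they belong to, would then form the retraining set for the fault prediction model.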

After the health degree label of each storage data is corrected based on the label correction strategy of each storage data, the failure prediction model is retrained by using each storage data after the health degree label correction. It can be understood that the health degree label corresponding to each piece of storage data after the correction of the health degree label is accurate, and the output precision of the fault prediction model, that is, the accuracy of fault prediction of the fault prediction model can be improved by training the fault prediction model based on the storage data with the accurate health degree label.

In the above data tag correction method, all stored data in the faulty hard disk output by the fault prediction model are acquired and formed into a plurality of data pairs, each data pair corresponding to one reference data. A first characteristic distance between the two stored data contained in each data pair and a second characteristic distance between each stored data in each data pair and the corresponding reference data are then determined. The tag correction strategy of each stored data is determined not only from the relationship between the first characteristic distance and the second characteristic distance but also from the health degree tag of each stored data, so that the strategy better matches the stored data and provides a sound basis for correcting its health degree tag. The result of correcting the health degree tag of each stored data according to its tag correction strategy is correspondingly more accurate. In addition, each stored data with a corrected health degree tag is used to retrain the fault prediction model. In this way, when the fault prediction model detects a hard disk fault, the health degree tags of the stored data of the predicted faulty hard disk are corrected in real time, and the fault prediction model is retrained in real time on the corrected tags, so that the model is continuously updated while in use and its accuracy in predicting faulty hard disks is improved.

Further, in determining the correction policy of the health label of each storage data, in order to make the process of determining the label correction policy more strict, in one embodiment, as shown in fig. 3, determining the label correction policy of each storage data according to the adjacency relationship between two storage data included in each data pair and the health label of each storage data includes:

S301, determining adjacent data corresponding to each stored data according to the adjacency relationship between the two stored data contained in each data pair.

If the two stored data contained in a data pair have an adjacency relationship, the two stored data are adjacent data of each other.

Specifically, in this embodiment, for each data pair, if an adjacency relationship exists between the two stored data contained in the data pair, the two stored data are taken as a group of adjacent data. In this way, it can be determined for every data pair whether its two stored data are adjacent data, and the adjacent data corresponding to each stored data are thereby obtained.
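
As an illustration only, the collection of adjacent data in S301 can be sketched as building a neighbour map from the data pairs already judged adjacent. The function name `neighbours_from_pairs` and the use of integer identifiers for stored data are assumptions made for this sketch and do not come from the application.

```python
from collections import defaultdict


def neighbours_from_pairs(adjacent_pairs):
    """Map each stored-data id to the set of ids it is adjacent to.

    adjacent_pairs: iterable of (a, b) tuples for data pairs whose
    adjacency relationship was determined to be "adjacent".
    """
    neighbours = defaultdict(set)
    for a, b in adjacent_pairs:
        # Adjacency is treated as symmetric: each item of the pair
        # is adjacent data of the other.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return dict(neighbours)
```

A stored data that appears in no adjacent pair is simply absent from the returned map; how such data should be handled is not stated in the text.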

S302, determining a tag correction strategy of each stored data according to the health degree tag of each stored data and the health degree tag of the adjacent data corresponding to each stored data.

Optionally, the tag correction strategy of each stored data may be determined as follows: determining the tag accuracy class of each stored data according to the health degree label of each stored data and the health degree label of the adjacent data corresponding to each stored data, wherein the tag accuracy class includes correct, incorrect and pending; and determining the tag correction strategy of each stored data according to its tag accuracy class. For example, the health degree label of each stored data may be obtained first; the same-label adjacency ratio of each stored data is then determined according to the health degree label of the stored data and the health degree labels of its adjacent data; and the tag accuracy class of each stored data is determined according to its same-label adjacency ratio. The same-label adjacency ratio of a stored data may be the proportion, among all adjacent data of the stored data, of adjacent data whose health degree label is the same as that of the stored data.

Specifically, in this embodiment, for each stored data, the same-label adjacency ratio of the stored data is determined first and then compared with a predetermined adjacency ratio range. If the same-label adjacency ratio of the stored data is greater than the maximum value of the range, the tag accuracy class of the stored data is determined to be correct; if it is smaller than the minimum value of the range, the tag accuracy class is determined to be incorrect; if it falls within the range, the tag accuracy class is determined to be pending. Illustratively, taking an adjacency ratio range of 30% to 70% as an example: if the same-label adjacency ratio is greater than 70%, the tag accuracy class of the stored data is correct; if it is smaller than 30%, the class is incorrect; if it is between 30% and 70%, the class is pending. For example, if the health degree label of a certain stored data is healthy and 90% of its adjacent data also carry the healthy label, the tag accuracy class of the stored data is determined to be correct. Determining the tag accuracy class from the same-label adjacency ratio keeps the classification process simple.
In the above embodiment, the tag accuracy classes of the stored data are divided into three categories, namely correct, incorrect and pending, and corresponding correction processing is performed on the stored data of each class, so that the correction processing is more rigorous and the data remaining after correction are more accurate.
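
The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the string values used for labels and classes, and the treatment of a stored data with no adjacent data (classified here as pending) are assumptions; the 30% and 70% bounds are the illustrative values from the text.

```python
def same_label_ratio(label, neighbour_labels):
    """Share of adjacent data whose health label equals `label`.

    Returns None when there is no adjacent data (the text does not
    cover this case).
    """
    if not neighbour_labels:
        return None
    same = sum(1 for other in neighbour_labels if other == label)
    return same / len(neighbour_labels)


def accuracy_class(ratio, low=0.30, high=0.70):
    """Classify a same-label adjacency ratio against the range [low, high]."""
    if ratio is None:
        return "pending"      # assumption: no neighbours means undecided
    if ratio > high:
        return "correct"
    if ratio < low:
        return "incorrect"
    return "pending"          # ratio lies within the range, bounds included
```

For the example in the text, a healthy stored data whose adjacent data are 90% healthy gives `accuracy_class(same_label_ratio("healthy", ["healthy"] * 9 + ["fault"]))`, which evaluates to `"correct"`.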

In addition, after the tag accuracy class of each stored data is determined and before the tag correction strategy of each stored data is determined, in order to ensure the accuracy of the corrected stored data, in one embodiment it is further determined, according to the tag accuracy class of each stored data, whether the proportion of stored data whose tag accuracy class is incorrect is smaller than a preset ratio threshold; if yes, the tag correction strategy of each stored data is determined according to its tag accuracy class. In this embodiment, if the proportion of stored data whose tag accuracy class is incorrect is greater than the preset ratio threshold, the fault hard disk is considered to contain too much erroneous data, and, to ensure the label accuracy of the corrected stored data, all data in the fault hard disk are deleted; that is, the stored data of this fault hard disk are no longer used for subsequent training of the fault prediction model. If the proportion of stored data in the fault hard disk whose tag accuracy class is incorrect is smaller than the preset ratio threshold, the operation of determining the tag correction strategy of each stored data according to the health degree label of each stored data and the health degree label of its adjacent data is performed. This safeguards the label accuracy of the corrected stored data and can in turn improve the accuracy of the fault prediction model.
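
A minimal sketch of this check is given below. The function name `should_correct` and the default threshold of 0.5 are hypothetical; the application only speaks of a preset ratio threshold and gives no value. Following S408, correction proceeds only when the proportion is strictly smaller than the threshold.

```python
def should_correct(classes, max_incorrect_ratio=0.5):
    """Decide whether the stored data of a fault hard disk are worth correcting.

    classes: list of tag accuracy classes ("correct", "incorrect", "pending"),
    one per stored data. Returns True to go on to label correction, False to
    discard all stored data of the disk.
    """
    if not classes:
        return False  # assumption: nothing to correct on an empty disk
    incorrect = sum(1 for c in classes if c == "incorrect")
    return incorrect / len(classes) < max_incorrect_ratio
```

With one incorrect item out of four the proportion is 25% and correction proceeds; with two out of three the disk's data would be dropped from retraining.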

Further, after the tag accuracy class of each stored data is determined, the health degree label of each stored data is corrected according to the tag correction strategy determined for its class. For example, in this embodiment, no correction is performed on stored data whose tag accuracy class is correct; for stored data whose tag accuracy class is incorrect, the label is corrected, that is, if the health degree label of the stored data is fault it is corrected to healthy, and if it is healthy it is corrected to fault; and stored data whose tag accuracy class is pending may be deleted.
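
The three per-class strategies can be illustrated with the short sketch below. The record layout `(data_id, label, accuracy_class)` and the label strings `"healthy"` and `"fault"` are assumptions for the example only.

```python
FLIPPED = {"healthy": "fault", "fault": "healthy"}


def apply_correction(records):
    """Apply the per-class strategy to (data_id, label, accuracy_class) records.

    Returns (data_id, corrected_label) tuples for the data that remain
    available for retraining.
    """
    corrected = []
    for data_id, label, cls in records:
        if cls == "correct":
            corrected.append((data_id, label))           # keep label unchanged
        elif cls == "incorrect":
            corrected.append((data_id, FLIPPED[label]))  # flip the binary label
        # "pending" data are deleted, i.e. not carried over
    return corrected
```

Flipping works here only because the health degree label is assumed to be binary, as in the example in the text.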

To facilitate understanding by those skilled in the art, the above data tag correction method is described in detail below. As shown in fig. 4, the method may include:

S401, if the fault prediction model detects that a fault hard disk exists, mapping each stored data in the fault hard disk to the same space to obtain the spatial features of each stored data in that space.

S402, determining a first characteristic distance of each data pair according to the spatial characteristics of two stored data contained in each data pair.

Any two storage data in the fault hard disk form a data pair.

S403, determining a second characteristic distance between each storage data in each data pair and the reference data of each data pair according to the spatial characteristics of each storage data in each data pair and the spatial characteristics of the reference data of each data pair.

Wherein the second feature distance comprises a first sub-distance and a second sub-distance; the reference data of each data pair is any other stored data except the data pair in the fault hard disk.

S404, for each data pair, if the sum of the first sub-distance and the second sub-distance of the data pair is greater than the first characteristic distance and the health degree labels of the two stored data contained in the data pair are consistent, determining that the adjacency relationship between the two stored data contained in the data pair is adjacent.

S405, according to the adjacent relation between the two stored data contained in each data pair, determining adjacent data corresponding to each stored data.

S406, determining the same-label adjacency ratio of each stored data according to the health degree label of each stored data and the health degree label of the adjacent data corresponding to each stored data.

S407, determining the tag accuracy class of each stored data according to the same-label adjacency ratio of each stored data.

Wherein, the label accuracy category includes: correct, incorrect, and pending.

S408, judging, according to the tag accuracy class of each stored data, whether the proportion of stored data whose tag accuracy class is incorrect is smaller than a preset ratio threshold; if yes, executing S409; if not, executing S411.

S409, determining a tag correction strategy for each stored data according to the tag accuracy class of each stored data.

S410, correcting the health degree label of each storage data according to the label correction strategy of each storage data.

Each stored data whose health degree label has been corrected is used for retraining the fault prediction model.

S411, deleting all stored data in the failed hard disk.
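
Steps S402 to S404 can be sketched as follows, assuming the spatial features from S401 are already available as numeric vectors and that the fault hard disk holds at least three stored data. The Euclidean distance and the rule for choosing the reference data (`first_other_index`) are assumptions; the text does not name a distance function and only requires the reference to be some other stored data outside the pair.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Assumed distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def first_other_index(n, i, j):
    """Hypothetical reference choice: the first item that is not in the pair."""
    return next(k for k in range(n) if k not in (i, j))


def find_adjacent_pairs(features, labels, distance=euclidean,
                        pick_reference=first_other_index):
    """Return the set of index pairs (i, j), i < j, judged adjacent."""
    n = len(features)
    adjacent = set()
    for i, j in combinations(range(n), 2):
        r = pick_reference(n, i, j)
        first = distance(features[i], features[j])  # first characteristic distance
        sub1 = distance(features[i], features[r])   # first sub-distance
        sub2 = distance(features[j], features[r])   # second sub-distance
        # S404: distance condition plus consistent health labels
        if sub1 + sub2 > first and labels[i] == labels[j]:
            adjacent.add((i, j))
    return adjacent
```

With a true metric such as the Euclidean distance, the triangle inequality means the distance condition fails only when the reference lies exactly on the segment between the two items of the pair, so in this sketch the label consistency check does most of the filtering. A different distance or reference choice, both of which the text leaves open, would change that behaviour.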

It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are likewise not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, an embodiment of the present application further provides a data tag correction device for implementing the above data tag correction method. The solution provided by the device is implemented in a manner similar to that described for the method above, so for the specific limitations in the one or more embodiments of the data tag correction device provided below, reference may be made to the limitations of the data tag correction method above, which are not repeated here.

In one embodiment, as shown in fig. 5, there is provided a correction device 1 for a data tag, including: a distance determination module 10, a policy determination module 11 and a tag correction module 12, wherein:

the distance determining module 10 is configured to determine a first feature distance between two stored data included in each data pair in the failed hard disk and a second feature distance between each stored data in each data pair and reference data of each data pair if the failure prediction model detects that the failed hard disk exists.

Any two storage data in the fault hard disk form a data pair; the reference data of each data pair is any other stored data except the data pair in the fault hard disk.

The policy determining module 11 is configured to determine a tag correction policy of each stored data according to the first feature distance and the second feature distance of each data pair, and the health tag of each stored data.

The tag correction module 12 is configured to correct the health tag of each stored data according to a tag correction policy of each stored data; and each stored data after the health degree label is corrected is used for retraining the fault prediction model.

In one embodiment, as shown in fig. 6, the distance determining module 10 includes a spatial mapping unit 100, a first determining unit 101, and a second determining unit 102. Wherein:

the space mapping unit 100 is configured to map each storage data in the failed hard disk to the same space, so as to obtain a spatial feature of each storage data in the same space.

A first determining unit 101, configured to determine a first feature distance of each data pair according to spatial features of two stored data included in each data pair.

A second determining unit 102, configured to determine a second feature distance between each stored data in each data pair and the reference data of each data pair according to the spatial feature of each stored data in each data pair and the spatial feature of the reference data of each data pair.

In one embodiment, as shown in fig. 7, the policy determination module 11 includes a third determination unit 110 and a fourth determination unit 111. Wherein:

the third determining unit 110 is configured to determine an adjacency relationship between two stored data included in each data pair according to the first feature distance and the second feature distance of each data pair and consistency of the health labels of the two stored data included in each data pair.

A fourth determining unit 111, configured to determine a tag correction policy of each stored data according to an adjacency relationship between two stored data included in each data pair and a health tag of each stored data.

In one embodiment, the second feature distance includes a first sub-distance and a second sub-distance, and the third determining unit 110 is specifically configured to determine, for each data pair, that an adjacency relationship between two stored data included in the data pair is adjacency if a sum of the first sub-distance and the second sub-distance of the data pair is greater than the first feature distance and health labels of the two stored data included in the data pair are identical.

In one embodiment, the fourth determining unit 111 comprises a first determining subunit and a second determining subunit, wherein:

And the first determining subunit is used for determining the adjacent data corresponding to each storage data according to the adjacent relation between the two storage data contained in each data pair.

And the second determining subunit is used for determining the label correction strategy of each storage data according to the health degree label of each storage data and the health degree label of the adjacent data corresponding to each storage data.

In one embodiment, the second determining subunit is specifically configured to determine the tag accuracy class of each stored data according to the health degree tag of each stored data and the health degree tag of the adjacent data corresponding to each stored data.

Wherein, the label accuracy category includes: correct, incorrect and pending; and determining a tag correction strategy for each stored data according to the tag accuracy class of each stored data.

In one embodiment, the second determining subunit is further configured to determine the same-label adjacency ratio of each stored data according to the health degree label of each stored data and the health degree label of the adjacent data corresponding to each stored data; and to determine the tag accuracy class of each stored data according to the same-label adjacency ratio of each stored data.

In one embodiment, the second determining subunit is further configured to determine, according to the tag accuracy class of each stored data, whether the proportion of stored data whose tag accuracy class is incorrect is smaller than a preset ratio threshold; and if yes, to determine the tag correction strategy of each stored data according to the tag accuracy class of each stored data.

The modules in the above data tag correction device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of modifying a data tag. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

In one embodiment, the processor when executing the computer program further performs the steps of:

In one embodiment, the processor when executing the computer program further performs the steps of: determining the same-label adjacency ratio of each stored data according to the health degree label of each stored data and the health degree label of the adjacent data corresponding to each stored data;

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

It should be noted that, information (including but not limited to, failure hard disk information, tag correction policy, etc.) and data (including but not limited to, stored data, etc.) related to the present application are information and data authorized by a user or sufficiently authorized by each party.

Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-transitory computer readable storage medium; when executed, the computer program may carry out the steps of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational databases and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not thereby to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method for correcting a data tag, the method comprising:

if the fault prediction model detects that a fault hard disk exists, determining a first characteristic distance between two stored data contained in each data pair in the fault hard disk and a second characteristic distance between each stored data in each data pair and reference data of each data pair; any two storage data in the fault hard disk form a data pair; the reference data of each data pair is any one of the other stored data except the data pair in the fault hard disk;

2. The method of claim 1, wherein determining a first characteristic distance between two stored data included in each data pair in the failed hard disk and a second characteristic distance between each stored data in each data pair and reference data for each data pair comprises:

mapping each storage data in the fault hard disk to the same space to obtain the space characteristics of each storage data in the same space;

3. The method according to claim 1 or 2, wherein determining the tag correction policy for each stored data based on the first feature distance and the second feature distance for each data pair and the health tag for each stored data comprises:

and determining a tag correction strategy of each storage data according to the adjacent relation between the two storage data contained in each data pair and the health degree tag of each storage data.

4. The method according to claim 3, wherein the second feature distance comprises a first sub-distance and a second sub-distance; and determining the adjacency relationship between the two stored data contained in each data pair according to the first characteristic distance and the second characteristic distance of each data pair and the consistency of the health degree labels of the two stored data contained in each data pair comprises:

for each data pair, if the sum of the first sub-distance and the second sub-distance of the data pair is larger than the first characteristic distance and the health degree labels of two stored data contained in the data pair are consistent, determining that the adjacent relation between the two stored data contained in the data pair is adjacent.

5. The method according to claim 3, wherein determining the tag correction policy for each stored data based on the adjacency relationship between the two stored data contained in each data pair and the health tag for each stored data comprises:

determining adjacent data corresponding to each storage data according to the adjacent relation between the two storage data contained in each data pair;

6. The method of claim 5, wherein determining the tag modification policy for each stored data based on the health tag for each stored data and the health tag for each adjacent data corresponding to each stored data comprises:

determining the label accuracy category of each stored data according to the health degree label of each stored data and the health degree label of the adjacent data corresponding to each stored data; wherein the tag accuracy category comprises: correct, incorrect and pending;

7. The method of claim 6, wherein determining the tag accuracy class for each stored data based on the health tag for each stored data and the health tag for each adjacent data corresponding to each stored data comprises:

8. The method of claim 6, wherein determining a tag correction policy for each stored data based on the tag accuracy class of each stored data comprises:

9. A data tag correction device, the device comprising:

the distance determining module is used for determining a first characteristic distance between two stored data contained in each data pair in the fault hard disk and a second characteristic distance between each stored data in each data pair and reference data of each data pair if the fault prediction model detects that the fault hard disk exists; any two storage data in the fault hard disk form a data pair; the reference data of each data pair is any one of the other stored data except the data pair in the fault hard disk;

10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.

11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 8.

12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.