CN103646114A - Method and device for extracting feature data from SMART data of hard disk - Google Patents

Publication number
CN103646114A
Authority
CN
China
Prior art keywords
data
attribute
smart
hard disk
smart data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310733574.5A
Other languages
Chinese (zh)
Other versions
CN103646114B (en)
Inventor
胡光
胡殿明
杨文君
魏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310733574.5A
Publication of CN103646114A
Application granted
Publication of CN103646114B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Abstract

The invention discloses a method and a device for extracting feature data from the SMART (Self-Monitoring, Analysis and Reporting Technology) data of a hard disk. The method comprises the following steps: obtaining a SMART data set from sample hard disks, wherein the SMART data set comprises Q SMART data items and Q pieces of hard disk type information corresponding to the Q SMART data items respectively; normalizing the Q SMART data items to generate Q normalized SMART data items; correcting the Q normalized SMART data items according to the Q pieces of hard disk type information respectively to generate a corrected SMART data set; and generating the feature data of the hard disk according to the corrected SMART data set. With the method of the embodiments, different hard disks can be tested and analyzed for fault early warning through the same fault early-warning model, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.

Description

Method and device for extracting feature data from SMART data of a hard disk
Technical Field
The present invention relates to the field of storage technology, and in particular to a method and a device for extracting feature data from the SMART (Self-Monitoring, Analysis and Reporting Technology) data of a hard disk.
Background
Because hard disk faults are reflected in a disk's SMART (Self-Monitoring, Analysis and Reporting Technology) data, fault early-warning analysis can use that data to judge whether the disk is likely to fail within a coming period. At present, a fault early-warning model can be trained on a single attribute of the SMART data with a machine learning algorithm, and the model is then applied to a disk's SMART data to predict whether the disk will keep working stably over that period.
However, the feature values of different SMART attributes are not represented in a uniform way and are too widely scattered, so it is difficult to predict the joint effect of several attributes on a disk. In addition, some attributes have missing feature values during model training, which makes the SMART data harder to analyze and the model's predictions inaccurate. Furthermore, vendors compute the feature values of their disks' data differently, which prevents a unified numerical representation. A separate fault early-warning model therefore has to be trained for the SMART data of each vendor's disks, and this repeated training greatly increases the cost of analysis.
Summary of the Invention
The present invention aims to solve the above technical problems at least to some extent.
To this end, a first object of the present invention is to propose a method for extracting feature data from hard disk SMART data. The method does not need multiple fault early-warning models: a single model suffices for the fault early-warning testing and analysis of different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
A second object of the present invention is to propose a device for extracting feature data from hard disk SMART data.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a method for extracting feature data from hard disk SMART data, comprising the following steps: obtaining a SMART data set of sample hard disks, wherein the SMART data set comprises Q SMART data items and Q pieces of hard disk type information corresponding to the Q SMART data items respectively; normalizing the Q SMART data items to generate Q normalized SMART data items; correcting the Q normalized SMART data items according to the Q pieces of hard disk type information respectively to generate a corrected SMART data set; and generating hard disk feature data according to the corrected SMART data set.
The method of this embodiment normalizes the SMART data of the sample hard disks so that the data share the same value range, and then corrects the normalized data according to the hard disk type information so that the data are partitioned by disk type. As a result, a single fault early-warning model can be used for the fault early-warning testing and analysis of different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
To achieve the above objects, an embodiment of the second aspect of the present invention provides a device for extracting feature data from hard disk SMART data, comprising: a first acquisition module, configured to obtain a SMART data set of sample hard disks, wherein the SMART data set comprises Q SMART data items and Q pieces of hard disk type information corresponding to the Q SMART data items respectively; a first generation module, configured to normalize the Q SMART data items to generate Q normalized SMART data items; a correcting module, configured to correct the Q normalized SMART data items according to the Q pieces of hard disk type information respectively to generate a corrected SMART data set; and a second generation module, configured to generate hard disk feature data according to the corrected SMART data set.
The device of this embodiment normalizes the SMART data of the sample hard disks so that the data share the same value range, and then corrects the normalized data according to the hard disk type information so that the data are partitioned by disk type. As a result, a single fault early-warning model can be used for the fault early-warning testing and analysis of different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a method for extracting feature data from hard disk SMART data according to one embodiment of the present invention;
Fig. 2 is a flowchart of a method for extracting feature data from hard disk SMART data according to another embodiment of the present invention;
Fig. 3 is a schematic diagram of the normalization result for gradient data in hard disk SMART data according to a specific embodiment of the present invention;
Fig. 4 is a schematic diagram of the normalization result for attribute data in hard disk SMART data according to a specific embodiment of the present invention;
Fig. 5 is a structural diagram of a device for extracting feature data from hard disk SMART data according to one embodiment of the present invention;
Fig. 6 is a structural diagram of a device for extracting feature data from hard disk SMART data according to another embodiment of the present invention; and
Fig. 7 is a structural diagram of a device for extracting feature data from hard disk SMART data according to yet another embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
In the description of the present invention, it should be understood that terms indicating orientation or positional relationships, such as "center", "longitudinal", "lateral", "up", "down", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer", are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the invention. In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "connected" and "coupled" are to be understood broadly: a connection may, for example, be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediary, or an internal connection between two elements. Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
At present, a fault early-warning model can be trained on a single attribute of the SMART data with a machine learning algorithm to predict whether a hard disk will keep working stably over a coming period. However, existing fault early-warning models cannot capture the joint effect of several SMART attributes on a disk, nor can the SMART data of several disk models be fed into one fault early-warning model for training. Fault early warning is therefore inaccurate, and disks of different models have to be analyzed with different fault early-warning models, which makes analysis costly. If the attribute values in the SMART data of different disk models could be processed so that every attribute value is represented uniformly, the data of several disk models could be trained in a single fault early-warning model, reducing the number of training runs and the cost of analysis. To this end, the present invention proposes a method for extracting feature data from hard disk SMART data.
Fig. 1 is a flowchart of a method for extracting feature data from hard disk SMART data according to one embodiment of the present invention.
As shown in Fig. 1, the method for extracting feature data from hard disk SMART data comprises the following steps.
S11: obtain the SMART data set of the sample hard disks.
In one embodiment of the present invention, the SMART data set comprises Q SMART data items and Q pieces of hard disk type information corresponding to them respectively. The SMART data set collects the disk-related attribute data, such as seek error rate and disk temperature, recorded in the SMART data of Q hard disks of the same type and/or of different types, together with the corresponding hard disk type information. The hard disk type information is disk-related information supplied by the disk vendor, such as the disk model and the disk ID (Identity). For example, when machine learning is performed, the SMART data set includes attribute data such as the seek error rate, power-on count and temperature of several different hard disks, together with information such as each disk's model and ID.
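As an illustration only, the SMART data set described in S11 could be held in a structure like the following. The class name `SmartRecord`, its field names, and the attribute key `seek_error_rate` are hypothetical and do not come from the patent; the sketch simply pairs each of the Q SMART data items with its hard disk type information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SmartRecord:
    """One of the Q SMART data items plus its hard disk type information."""
    disk_id: str   # hypothetical field: disk ID supplied by the vendor
    model: str     # hypothetical field: disk model supplied by the vendor
    # attribute name -> list of (detection_time, value) samples
    attributes: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)


def build_data_set(records):
    """Return the Q records together with the sorted list of disk models seen."""
    models = sorted({r.model for r in records})
    return list(records), models


sample = [
    SmartRecord("disk-0", "model-A", {"seek_error_rate": [(0.0, 12.0), (1.0, 15.0)]}),
    SmartRecord("disk-1", "model-B", {"seek_error_rate": [(0.0, 300.0), (1.0, 310.0)]}),
]
data_set, models = build_data_set(sample)
```

Keeping the detection time next to each value is a design choice made here because the gradient step in the later embodiment fits values against detection times.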
S12: normalize the Q SMART data items to generate Q normalized SMART data items.
In one embodiment of the present invention, each attribute data item in the SMART data of the Q hard disks of the same type and/or of different types can be normalized separately, so that attribute data with different value ranges are mapped into the same value range. This allows the SMART data of different types of hard disk to be analyzed and processed uniformly.
S13: correct the Q normalized SMART data items according to the Q pieces of hard disk type information respectively, to generate a corrected SMART data set.
So that the test results of individual hard disks can be told apart within the test results of the fault early-warning model, in one embodiment of the present invention a data offset can be set for each hard disk according to its type information, and the Q normalized SMART data items can be corrected with the corresponding offsets, thereby partitioning the SMART data set.
S14: generate hard disk feature data according to the corrected SMART data set.
The method of this embodiment normalizes the SMART data of the sample hard disks so that the data share the same value range, and then corrects the normalized data according to the hard disk type information so that the data are partitioned by disk type. As a result, a single fault early-warning model can be used for the fault early-warning testing and analysis of different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
To make the trained fault early-warning model perform better, the gradient value corresponding to each attribute data item in each SMART data item can also be obtained by the least squares method before the SMART data items are normalized. Specifically, Fig. 2 is a flowchart of a method for extracting feature data from hard disk SMART data according to another embodiment of the present invention.
As shown in Fig. 2, the method for extracting feature data from hard disk SMART data comprises the following steps.
S21: obtain the SMART data set of the sample hard disks.
In one embodiment of the present invention, the SMART data set comprises Q SMART data items and Q pieces of hard disk type information corresponding to them respectively. The SMART data set collects the disk-related attribute data, such as seek error rate and disk temperature, recorded in the SMART data of Q hard disks of the same type and/or of different types, together with the corresponding hard disk type information. The hard disk type information is disk-related information supplied by the disk vendor, such as the disk model and the disk ID (Identity). For example, when machine learning is performed, the SMART data set includes attribute data such as the seek error rate, power-on count and temperature of several different hard disks, together with information such as each disk's model and ID.
S22: obtain, for each SMART data item, the S gradient data sets corresponding to its S first attribute data subsets.
In an embodiment of the present invention, the SMART data set further comprises S attribute data sets corresponding to S attributes respectively, and each SMART data item comprises S first attribute data subsets corresponding to the S attributes respectively. A first attribute data subset is the set of data for one attribute in the SMART data; for example, it may be the set of data for the seek error rate attribute.
In an embodiment of the present invention, M consecutive attribute data items are first selected in turn from the first attribute data subset corresponding to the s-th attribute of each SMART data item, generating P second attribute data subsets. Each second attribute data subset contains M attribute data items, P = N - M + 1, N is the total number of attribute data items in the first attribute data subset corresponding to attribute s, and s = 1, ..., S.
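The window selection in this step can be sketched as follows. This is a minimal illustration that assumes the M items of each second subset are consecutive, which is what P = N - M + 1 suggests; the function name `sliding_windows` is hypothetical.

```python
def sliding_windows(values, m):
    """Split N samples into P = N - M + 1 consecutive windows of length M."""
    n = len(values)
    if m < 1 or m > n:
        raise ValueError("window length M must satisfy 1 <= M <= N")
    # Window i (0-based) covers samples i .. i + M - 1.
    return [values[i:i + m] for i in range(n - m + 1)]


windows = sliding_windows([10, 12, 15, 15, 20], m=3)
```

With N = 5 and M = 3 this produces P = 3 windows, each sharing M - 1 samples with its neighbour.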
Then, the P gradient data items corresponding to the P second attribute data subsets are calculated, and the gradient data set corresponding to attribute s is generated from them. Specifically, once the P second attribute data subsets have been obtained, the P fitting coefficients corresponding to them can be obtained by the weighted least squares method. The i-th of the P fitting coefficients is $k_i = (Z - b \cdot Y)/X$, where i = 1, ..., P, and X, Y, Z and b are given by:
$$X = \sum_{j=i}^{M+i-1} w_j \, x_j^2,$$
$$Y = \sum_{j=i}^{M+i-1} w_j \, x_j,$$
$$Z = \sum_{j=i}^{M+i-1} w_j \, x_j \, y_j,$$
$$b = \frac{Z \cdot Y - X \sum_{j=i}^{M+i-1} w_j \, y_j}{Y \cdot Y - X \sum_{j=i}^{M+i-1} w_j},$$
where $w_j$ is the preset weight of the j-th attribute data item in the first attribute data subset corresponding to attribute s, $x_j$ is the detection time of that item, and $y_j$ is the j-th attribute data item itself.
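As a check on these expressions, the following sketch derives them from the normal equations of a weighted straight-line fit $y \approx kx + b$ over one window. It assumes that Y is the weighted sum of the detection times $x_j$, and it introduces the shorthand $W$ and $S_y$, which do not appear in the patent.

```latex
\begin{aligned}
&\min_{k,\,b}\ \sum_{j=i}^{M+i-1} w_j\,(y_j - k\,x_j - b)^2,
\qquad W = \sum_{j=i}^{M+i-1} w_j,\quad S_y = \sum_{j=i}^{M+i-1} w_j\,y_j \\
&\frac{\partial}{\partial k} = 0:\quad k\,X + b\,Y = Z
\ \Longrightarrow\ k = \frac{Z - b\,Y}{X} \\
&\frac{\partial}{\partial b} = 0:\quad k\,Y + b\,W = S_y \\
&\text{Substituting } k:\quad Z\,Y - b\,Y^2 + b\,W\,X = S_y\,X
\ \Longrightarrow\ b = \frac{Z\,Y - S_y\,X}{Y^2 - X\,W}
\end{aligned}
```

Under that assumption, both the slope $k_i$ and the intercept $b$ agree with the expressions given for the i-th window.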
After the i-th fitting coefficient $k_i$ has been obtained, the i-th of the P gradient data items, $\mathrm{Grad}_i$, can be obtained by the following formula:
$$\mathrm{Grad}_i = k_i \cdot (M - 1) \cdot y_{M+i-1}$$
Here $k_i \cdot (M - 1)$ is the drop between two attribute data items on the straight line fitted by weighted least squares, and its sign gives the direction of the line's trend. Multiplying it by the y value of the last attribute data item gives the final gradient value, whose magnitude expresses both the overall trend and the strength of that trend.
After the P gradient data items have been obtained, the gradient data set corresponding to attribute s can be generated from them.
It should be understood that the above steps finally yield the S gradient data sets corresponding to the S first attribute data subsets of each SMART data item.
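The gradient computation for one attribute can be sketched as below. This is an illustration under stated assumptions rather than the patented implementation: it reads Y as the weighted sum of the detection times, uses 0-based indices, and the function names `weighted_fit` and `window_gradients` are hypothetical.

```python
def weighted_fit(xs, ys, ws):
    """Weighted least-squares line y = k*x + b using the sums X, Y, Z of the text."""
    X = sum(w * x * x for w, x in zip(ws, xs))
    Y = sum(w * x for w, x in zip(ws, xs))          # assumption: sum of w_j * x_j
    Z = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    Sy = sum(w * y for w, y in zip(ws, ys))         # sum of w_j * y_j
    W = sum(ws)                                     # sum of w_j
    denom = Y * Y - X * W
    if denom == 0 or X == 0:
        raise ValueError("degenerate window: the line is not determined")
    b = (Z * Y - Sy * X) / denom
    k = (Z - b * Y) / X
    return k, b


def window_gradients(times, values, weights, m):
    """Grad_i = k_i * (M - 1) * (last value in window i), one per sliding window."""
    grads = []
    for i in range(len(values) - m + 1):
        k, _ = weighted_fit(times[i:i + m], values[i:i + m], weights[i:i + m])
        grads.append(k * (m - 1) * values[i + m - 1])
    return grads


# Samples on the line y = 2x: every window has slope 2.
grads = window_gradients([1, 2, 3, 4], [2, 4, 6, 8], [1, 1, 1, 1], m=3)
```

For the sample data the two windows end at values 6 and 8, so the gradients are 2 * 2 * 6 and 2 * 2 * 8.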
S23: add the S gradient data sets obtained for each SMART data item to that SMART data item as S new first attribute data subsets.
S24: normalize the Q SMART data items to generate Q normalized SMART data items.
In an embodiment of the present invention, the SMART data items can be normalized by the following formula:
$$g(x) = \operatorname{sign}(x) \times \log_y |x|,$$
where x is an attribute data item in the SMART data, g(x) is the normalized attribute data item corresponding to x, and the base y can be calculated from:
$$y^z \le \mathit{Value} < (y + \Delta y)^z$$
where z is a preset threshold, Value is the factory default value of the attribute to which x belongs, and $\Delta y$ is a preset precision.
For example, when the maximum gradient values in the interval (1900, 2000) account for more than 70% of the total, the calculation may give Grad = 1.078 and y = 1.071, yielding the normalized gradient plot shown in Fig. 3 and the normalized attribute data plot shown in Fig. 4.
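The normalization step can be sketched as follows. The search for the base y is one possible reading of the inequality above, namely the multiple of the precision Δy that satisfies it; the function names, the default precision, and the choice to map x = 0 to 0 are assumptions, since the text does not define g(0).

```python
import math


def find_base(value, z, dy=0.001):
    """Return y (a multiple of dy, up to rounding) with y**z <= value < (y + dy)**z."""
    if value <= 1 or z <= 0 or dy <= 0:
        raise ValueError("expected value > 1, z > 0 and dy > 0")
    y = math.floor(value ** (1.0 / z) / dy) * dy
    # Repair floating-point rounding so the inequality holds exactly.
    while y ** z > value:
        y -= dy
    while (y + dy) ** z <= value:
        y += dy
    if y <= 1:
        raise ValueError("base must exceed 1 for the logarithm to be usable")
    return y


def normalize(x, base):
    """g(x) = sign(x) * log_base(|x|); x == 0 is mapped to 0 by assumption."""
    if x == 0:
        return 0.0
    sign = 1.0 if x > 0 else -1.0
    return sign * math.log(abs(x), base)


base = find_base(100.0, z=2)       # factory default 100 should map to about z = 2
g_default = normalize(100.0, base)
```

Choosing the base this way means an attribute sitting at its factory default normalizes to roughly z, whatever the attribute's raw scale.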
S25: correct the Q normalized SMART data items according to the Q pieces of hard disk type information respectively, to generate a corrected SMART data set.
So that the test results of individual hard disks can be told apart within the test results of the fault early-warning model, in an embodiment of the present invention Q correction values corresponding to the Q pieces of hard disk type information can be obtained from that information, and each normalized SMART data item can be corrected with its corresponding correction value. For example, a data offset can be set for each hard disk according to its type information, and the Q normalized SMART data items corrected with the corresponding offsets, thereby partitioning the SMART data set.
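A minimal sketch of the offset-based correction follows. The text does not say how offsets are chosen, so the even `spacing` between disk models, and the function names, are assumptions made only for illustration; a spacing wider than the normalized value range keeps the partitions from overlapping.

```python
def assign_offsets(models, spacing):
    """Give each distinct disk model its own offset: 0, spacing, 2 * spacing, ..."""
    return {model: idx * spacing for idx, model in enumerate(sorted(set(models)))}


def apply_offsets(normalized, models, offsets):
    """Shift each normalized value by the offset of the disk model it came from."""
    return [value + offsets[model] for value, model in zip(normalized, models)]


models = ["model-B", "model-A", "model-B"]
offsets = assign_offsets(models, spacing=100.0)
corrected = apply_offsets([2.0, 1.5, 3.0], models, offsets)
```

After the shift, values from model-A stay near the origin while values from model-B sit around 100, so a single model sees both populations in disjoint ranges.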
S26: generate hard disk feature data according to the corrected SMART data set.
In an embodiment of the present invention, each corrected SMART data set comprises S corrected attribute data sets corresponding to the S attributes respectively. Specifically, S training feature data sets corresponding to the S attributes can be obtained, and the training feature values in each can be sorted to generate S feature sequences $(V_i)$ corresponding to the S attributes. The feature value f(v) of each attribute value v in the corrected attribute data set of each attribute can then be obtained by the following preset mapping rule:
$$f(v) = \begin{cases} V_i, & V_i \le v \le [(V_{i+1} - V_i)/2] \\ V_{i+1}, & V_i + [(V_{i+1} - V_i)/2] + 1 < v < V_{i+1} \end{cases}$$
After the feature value of each attribute value has been obtained, the hard disk feature data can be generated from the feature values of the attribute values in the corrected attribute data set of each attribute. Through the mapping rule, every training data item in the corrected SMART data set thus has a corresponding feature value that can be used to train the fault early-warning model, which avoids the problem of missing feature values during training and improves the accuracy of the fault prediction model.
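The mapping rule can be sketched as a snap-to-neighbour lookup in the sorted feature sequence. Several details here are assumptions rather than statements from the patent: the upper bound of the first branch is read as V_i plus half the gap (rounded down), the boundary value the rule leaves uncovered is sent to V_{i+1}, and values outside the sequence are clamped to its ends. The function name `map_to_feature` is hypothetical.

```python
from bisect import bisect_right


def map_to_feature(v, sequence):
    """Map attribute value v onto a value of the sorted feature sequence (V_i)."""
    if not sequence:
        raise ValueError("feature sequence must not be empty")
    # Assumption: clamp values that fall outside the training sequence.
    if v <= sequence[0]:
        return sequence[0]
    if v >= sequence[-1]:
        return sequence[-1]
    i = bisect_right(sequence, v) - 1        # sequence[i] <= v < sequence[i + 1]
    lower, upper = sequence[i], sequence[i + 1]
    half = (upper - lower) // 2              # [(V_{i+1} - V_i) / 2], rounded down
    return lower if v <= lower + half else upper


features = [map_to_feature(v, [10, 20, 40]) for v in (5, 12, 16, 30, 31, 100)]
```

Because every input lands on a value seen in training, the model never meets an attribute value for which it has no feature.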
In the method of this embodiment, before the SMART data items are normalized, the gradient data sets corresponding to the attribute data in each SMART data item are obtained by the least squares method and used to update the corresponding first attribute data subsets. This makes the trends in the SMART data stand out and, combined with a machine learning algorithm, lets the trained fault early-warning model perform better.
To implement the above embodiments, the present invention also proposes a device for extracting feature data from hard disk SMART data.
A device for extracting feature data from hard disk SMART data comprises: a first acquisition module, configured to obtain a SMART data set of sample hard disks, wherein the SMART data set comprises Q SMART data items and Q pieces of hard disk type information corresponding to the Q SMART data items respectively; a first generation module, configured to normalize the Q SMART data items to generate Q normalized SMART data items; a correcting module, configured to correct the Q normalized SMART data items according to the Q pieces of hard disk type information respectively to generate a corrected SMART data set; and a second generation module, configured to generate hard disk feature data according to the corrected SMART data set.
Fig. 5 is a structural diagram of a device for extracting feature data from hard disk SMART data according to one embodiment of the present invention.
As shown in Fig. 5, the device comprises: a first acquisition module 100, a first generation module 200, a correcting module 300 and a second generation module 400.
Specifically, the first acquisition module 100 is configured to obtain the SMART data set of the sample hard disks. The SMART data set comprises Q SMART data items and Q pieces of hard disk type information corresponding to them respectively. In other words, the SMART data set collects the disk-related attribute data, such as seek error rate and disk temperature, recorded in the SMART data of Q hard disks of the same type and/or of different types, together with the corresponding hard disk type information. The hard disk type information is disk-related information supplied by the disk vendor, such as the disk model and the disk ID (Identity). For example, when machine learning is performed, the first acquisition module 100 can obtain from the SMART data set attribute data such as the seek error rate, power-on count and temperature of several different hard disks, together with information such as each disk's model and ID.
The first generation module 200 is configured to normalize the Q SMART data items to generate Q normalized SMART data items. Specifically, in one embodiment of the present invention, the first generation module 200 can normalize each attribute data item in the SMART data of the Q hard disks of the same type and/or of different types separately, so that attribute data with different value ranges are mapped into the same value range. This allows the SMART data of different types of hard disk to be analyzed and processed uniformly.
The correcting module 300 is configured to correct the Q normalized SMART data items according to the Q pieces of hard disk type information respectively, to generate a corrected SMART data set. Specifically, in one embodiment of the present invention, the correcting module 300 can set a data offset for each hard disk according to its type information and correct the Q normalized SMART data items with the corresponding offsets, thereby partitioning the SMART data set. The test results of individual hard disks can then be told apart within the test results of the fault early-warning model.
The second generation module 400 is configured to generate hard disk feature data according to the corrected SMART data set.
The device of this embodiment normalizes the SMART data of the sample hard disks so that the data share the same value range, and then corrects the normalized data according to the hard disk type information so that the data are partitioned by disk type. As a result, a single fault early-warning model can be used for the fault early-warning testing and analysis of different hard disks, which improves the accuracy of the model and reduces the cost of model training, testing and analysis.
Fig. 6 is a structural diagram of a device for extracting feature data from hard disk SMART data according to another embodiment of the present invention.
As shown in Fig. 6, the device comprises: a first acquisition module 100, a first generation module 200, a correcting module 300, a second generation module 400, a second acquisition module 500 and an adding module 600.
In an embodiment of the present invention, the SMART data set comprises S attribute data sets corresponding to S attributes respectively, and each SMART data item comprises S first attribute data subsets corresponding to the S attributes respectively. A first attribute data subset is the set of data for one attribute in the SMART data; for example, it may be the set of data for the seek error rate attribute.
Specifically, the second acquisition module 500 is configured to obtain, for each SMART data item, the S gradient data sets corresponding to its S first attribute data subsets. The adding module 600 is configured to add the S gradient data sets obtained for each SMART data item to that SMART data item as S new first attribute data subsets.
Fig. 7 is a structural diagram of a device for extracting feature data from hard disk SMART data according to yet another embodiment of the present invention.
As shown in Fig. 7, the device comprises: a first acquisition module 100, a first generation module 200, a correcting module 300, a second generation module 400, a second acquisition module 500 and an adding module 600. The second acquisition module 500 comprises a first generation unit 510 and a second generation unit 520; the first generation module 200 comprises a processing unit 210; the correcting module 300 comprises a first acquiring unit 310 and a correcting unit 320; and the second generation module 400 comprises a second acquiring unit 410, a sorting unit 420 and a third acquiring unit 430. The second generation unit 520 comprises a first acquiring subunit 521 and a second acquiring subunit 522.
Particularly, the first generation unit 510 is for choosing successively M attribute data the first attribute data subclass corresponding to s the attribute from each SMART data, to generate P the second attribute data subclass, wherein, each second attribute data subclass comprises M attribute data, P=N-M+1, N is the sum of attribute data in the first attribute data subclass that attribute s is corresponding, s=1 ... S.
The second generation unit 520 is for calculating respectively P P gradient data corresponding to the second attribute data subclass, and according to gradient data set corresponding to P gradient data generation attribute s.
Processing unit 210 is for being normalized a plurality of SMART data by following formula:
g(x)=sign(x)×log y|x|,
Wherein, x is an attribute data in a plurality of SMART data, and g (x) is the attribute data after the normalization that attribute data x is corresponding, and y can calculate by following formula:
y z≤Value<(y+Δy) z
Wherein, z is predetermined threshold value, and Value is the factory-default of the attribute that attribute data x is corresponding, and Δ y is default precision.
The first acquiring unit 310 is configured to obtain, according to the Q pieces of hard disk type information, Q correction values respectively corresponding to the Q pieces of hard disk type information.
The correction unit 320 is configured to correct the corresponding normalized SMART data according to the Q correction values respectively corresponding to the Q pieces of hard disk type information.
The second acquiring unit 410 is configured to obtain S training feature data respectively corresponding to the S attributes.
The sorting unit 420 is configured to sort the training feature values in each training feature data, so as to generate S feature sequences $(V_i)$ respectively corresponding to the S attributes.
The third acquiring unit 430 is configured to obtain the feature value f(v) of each attribute value v in the corrected attribute data set corresponding to each attribute by the following preset mapping rule:
$$f(v) = \begin{cases} V_i, & V_i \le v \le [(V_{i+1} - V_i)/2] \\ V_{i+1}, & V_i + [(V_{i+1} - V_i)/2] + 1 < v < V_{i+1} \end{cases}$$
The third generation unit 440 is configured to generate the hard disk feature data from the obtained feature values of the attribute values in the corrected attribute data set corresponding to each attribute.
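The mapping rule can be sketched as snapping each attribute value to one of its two neighbouring training feature values. Several points are assumptions rather than statements of the source: values are taken to be integers, the square brackets are read as rounding down, the upper bound of the first case is read as V_i + [(V_{i+1} - V_i)/2], every value above that midpoint (including the boundary point the rule leaves open) maps to V_{i+1}, and values outside the sequence are clamped to its ends. The function name `map_to_feature` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def map_to_feature(v: int, features: Sequence[int]) -> int:
    """Snap v to a neighbouring value of the sorted feature sequence (V_i).

    `features` must be sorted in ascending order without duplicates.
    """
    if not features:
        raise ValueError("feature sequence must not be empty")
    if v <= features[0]:
        return features[0]   # assumption: clamp below the sequence
    if v >= features[-1]:
        return features[-1]  # assumption: clamp above the sequence
    i = bisect_right(features, v) - 1  # features[i] <= v < features[i + 1]
    lower, upper = features[i], features[i + 1]
    half = (upper - lower) // 2        # [(V_{i+1} - V_i) / 2], rounded down
    return lower if v <= lower + half else upper


# Hypothetical sorted training feature values for one attribute.
sequence = [0, 10, 20]
print([map_to_feature(v, sequence) for v in (3, 5, 6, 10, 25)])
```

Mapping every observed value onto a value already seen in training keeps the feature space of new disks identical to that of the training set.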
The first acquiring subunit 521 is configured to obtain, by the weighted least squares method, P fitting coefficients respectively corresponding to the P second attribute data subsets, where the i-th of the P fitting coefficients is
$$k_i = (Z - b \cdot Y)/X,$$
where $i = 1, \ldots, P$,
$$X = \sum_{j=i}^{M+i-1} w_j \cdot x_j^{2},$$
$$Y = \sum_{j=i}^{M+i-1} w_j \cdot x_j,$$
$$Z = \sum_{j=i}^{M+i-1} w_j \cdot x_j \cdot y_j,$$
$$b = \frac{Z \cdot Y - \left(\sum_{j=i}^{M+i-1} w_j \cdot y_j\right) \cdot X}{Y \cdot Y - X \cdot \sum_{j=i}^{M+i-1} w_j},$$
$w_j$ is the preset weight corresponding to the j-th attribute data item in the first attribute data subset corresponding to attribute s, $x_j$ is the detection time of the j-th attribute data item in the first attribute data subset corresponding to attribute s, and $y_j$ is the j-th attribute data item in the first attribute data subset corresponding to attribute s.
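For reference, the expressions for $k_i$ and b are what the normal equations of a weighted straight-line fit $y \approx k_i x + b$ over one window give, taking Y as the weighted sum of the detection times $x_j$. The shorthand $S_y$ and $W$ is introduced here for illustration only and does not appear in the source; all sums run over $j = i, \ldots, M+i-1$.

```latex
\begin{aligned}
&\text{Minimise } \sum_j w_j \,(y_j - k_i x_j - b)^2 . \\
&\text{Setting the derivatives with respect to } k_i \text{ and } b \text{ to zero:} \\
&\qquad Z = k_i X + b\,Y, \qquad S_y = k_i Y + b\,W,
  \qquad S_y = \sum_j w_j y_j,\quad W = \sum_j w_j . \\
&\text{The first equation gives } k_i = (Z - b\,Y)/X . \\
&\text{Substituting into the second and multiplying by } X: \\
&\qquad S_y X = Z\,Y - b\,Y^2 + b\,W X
  \;\Longrightarrow\;
  b = \frac{Z\,Y - S_y X}{Y^2 - X\,W} .
\end{aligned}
```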
The second acquiring subunit 522 is configured to obtain the i-th gradient data $Grad_i$ of the P gradient data by the following formula:
$$Grad_i = k_i \cdot (M-1) \cdot y_{M+i-1}$$
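Putting the two subunits together, the gradient data for one attribute could be computed roughly as follows. This is a sketch under stated assumptions, not the patented implementation: Y is taken as the weighted sum of the detection times x_j, each window is a run of M consecutive samples, and the weights are indexed per sample. The function names `window_gradient` and `gradient_series` and the sample data are hypothetical.

```python
from typing import List, Sequence


def window_gradient(x: Sequence[float], y: Sequence[float],
                    w: Sequence[float]) -> float:
    """Grad = k * (M - 1) * y_last for one window of M samples."""
    X = sum(wj * xj * xj for wj, xj in zip(w, x))
    Y = sum(wj * xj for wj, xj in zip(w, x))
    Z = sum(wj * xj * yj for wj, xj, yj in zip(w, x, y))
    Sy = sum(wj * yj for wj, yj in zip(w, y))  # weighted sum of the y_j
    W = sum(w)                                  # sum of the weights
    denom = Y * Y - X * W
    if X == 0 or denom == 0:
        raise ValueError("degenerate window: a line cannot be fitted")
    b = (Z * Y - Sy * X) / denom   # intercept of the weighted fit
    k = (Z - b * Y) / X            # fitting coefficient k_i (slope)
    return k * (len(x) - 1) * y[-1]


def gradient_series(times: Sequence[float], values: Sequence[float],
                    weights: Sequence[float], m: int) -> List[float]:
    """Return the P = N - M + 1 gradient data of one attribute."""
    n = len(values)
    if not (len(times) == len(weights) == n) or m < 2 or m > n:
        raise ValueError("inputs must have equal length N and 2 <= M <= N")
    return [window_gradient(times[i:i + m], values[i:i + m], weights[i:i + m])
            for i in range(n - m + 1)]


# Hypothetical attribute rising by 2 per time step, equal weights, M = 3.
grads = gradient_series([1, 2, 3, 4], [2, 4, 6, 8], [1, 1, 1, 1], m=3)
print(grads)
```

In this example each window has slope 2, so the two gradient values differ only through the last reading of the window, which is how the formula ties the trend to the current magnitude of the attribute.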
The device for extracting feature data from hard disk SMART data according to this embodiment of the present invention obtains, by the least squares method, the gradient data set corresponding to the attribute data in each SMART data, and updates the corresponding first attribute data set with the gradient data set. The trend of change in the SMART data is thereby highlighted, and, combined with a machine learning algorithm, the trained fault early-warning model can perform better.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing a specific logical function or step of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be executed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the present invention, the scope of which is defined by the claims and their equivalents.

Claims (14)

1. A method for extracting feature data from hard disk SMART (Self-Monitoring, Analysis and Reporting Technology) data, characterized by comprising:
obtaining a SMART data set of sample hard disks, wherein the SMART data set comprises Q pieces of SMART data and Q pieces of hard disk type information respectively corresponding to the Q pieces of SMART data;
normalizing the Q pieces of SMART data to generate Q pieces of normalized SMART data;
correcting the Q pieces of normalized SMART data respectively according to the Q pieces of hard disk type information, to generate a corrected SMART data set;
generating hard disk feature data according to the corrected SMART data set.
2. The method of claim 1, characterized in that the SMART data set comprises S attribute data sets respectively corresponding to S attributes, each of the SMART data comprises S first attribute data subsets respectively corresponding to the S attributes, and before the plurality of SMART data are normalized, the method further comprises:
obtaining S gradient data sets respectively corresponding to the S first attribute data subsets in each of the SMART data;
adding the obtained S gradient data sets of each of the SMART data to that SMART data as S new first attribute data subsets.
3. The method of claim 2, characterized in that obtaining the S gradient data sets respectively corresponding to the S first attribute data subsets in each of the SMART data specifically comprises:
successively selecting M attribute data items from the first attribute data subset corresponding to the s-th attribute in each of the SMART data, to generate P second attribute data subsets, wherein P = N - M + 1, N is the total number of attribute data items in the first attribute data subset corresponding to the attribute s, and s = 1, ..., S;
calculating P gradient data respectively corresponding to the P second attribute data subsets, and generating the gradient data set corresponding to the attribute s according to the P gradient data.
4. The method of claim 3, characterized in that calculating the P gradient data respectively corresponding to the P second attribute data subsets specifically comprises:
obtaining, by the weighted least squares method, P fitting coefficients respectively corresponding to the P second attribute data subsets, wherein the i-th of the P fitting coefficients is
$$k_i = (Z - b \cdot Y)/X,$$
where $i = 1, \ldots, P$,
$$X = \sum_{j=i}^{M+i-1} w_j \cdot x_j^{2},$$
$$Y = \sum_{j=i}^{M+i-1} w_j \cdot x_j,$$
$$Z = \sum_{j=i}^{M+i-1} w_j \cdot x_j \cdot y_j,$$
$$b = \frac{Z \cdot Y - \left(\sum_{j=i}^{M+i-1} w_j \cdot y_j\right) \cdot X}{Y \cdot Y - X \cdot \sum_{j=i}^{M+i-1} w_j},$$
$w_j$ is the preset weight corresponding to the j-th attribute data item in the first attribute data subset corresponding to the attribute s, $x_j$ is the detection time of the j-th attribute data item in the first attribute data subset corresponding to the attribute s, and $y_j$ is the j-th attribute data item in the first attribute data subset corresponding to the attribute s;
obtaining the i-th gradient data $Grad_i$ of the P gradient data by the following formula:
$$Grad_i = k_i \cdot (M-1) \cdot y_{M+i-1}$$
5. The method of any one of claims 2 to 4, characterized in that normalizing the plurality of SMART data specifically comprises:
normalizing the plurality of SMART data by the following formula:
$$g(x) = \operatorname{sign}(x) \times \log_{y}|x|,$$
where x is one attribute data item in the plurality of SMART data, g(x) is the normalized attribute data corresponding to attribute data x, and y can be calculated from the following formula:
$$y^{z} \le Value < (y + \Delta y)^{z},$$
where z is a preset threshold, Value is the factory default value of the attribute to which the attribute data x corresponds, and Δy is a preset precision.
6. The method of any one of claims 2 to 4, characterized in that correcting the Q pieces of normalized SMART data respectively according to the Q pieces of hard disk type information specifically comprises:
obtaining, according to the Q pieces of hard disk type information, Q correction values respectively corresponding to the Q pieces of hard disk type information;
correcting the corresponding normalized SMART data according to the Q correction values respectively corresponding to the Q pieces of hard disk type information.
7. The method of any one of claims 2 to 4, characterized in that the corrected SMART data set comprises S corrected attribute data sets respectively corresponding to the S attributes, and generating the hard disk feature data according to the corrected SMART data set specifically comprises:
obtaining S training feature data respectively corresponding to the S attributes;
sorting the training feature values in each training feature data, to generate S feature sequences $(V_i)$ respectively corresponding to the S attributes;
obtaining the feature value f(v) of each attribute value v in the corrected attribute data set corresponding to each attribute by the following preset mapping rule:
$$f(v) = \begin{cases} V_i, & V_i \le v \le [(V_{i+1} - V_i)/2] \\ V_{i+1}, & V_i + [(V_{i+1} - V_i)/2] + 1 < v < V_{i+1} \end{cases};$$
generating the hard disk feature data according to the obtained feature values of the attribute values in the corrected attribute data set corresponding to each attribute.
8. A device for extracting feature data from hard disk SMART data, characterized by comprising:
a first acquisition module, configured to obtain a SMART data set of sample hard disks, wherein the SMART data set comprises Q pieces of SMART data and Q pieces of hard disk type information respectively corresponding to the Q pieces of SMART data;
a first generation module, configured to normalize the Q pieces of SMART data to generate Q pieces of normalized SMART data;
a correction module, configured to correct the Q pieces of normalized SMART data respectively according to the Q pieces of hard disk type information, to generate a corrected SMART data set;
a second generation module, configured to generate hard disk feature data according to the corrected SMART data set.
9. The device of claim 8, characterized in that the SMART data set comprises S attribute data sets respectively corresponding to S attributes, each of the SMART data comprises S first attribute data subsets respectively corresponding to the S attributes, and the device further comprises, before the first generation module:
a second acquisition module, configured to obtain S gradient data sets respectively corresponding to the S first attribute data subsets of each of the SMART data;
an adding module, configured to add the obtained S gradient data sets of each of the SMART data to that SMART data as S new first attribute data subsets.
10. The device of claim 9, characterized in that the second acquisition module specifically comprises:
a first generation unit, configured to successively select M attribute data items from the first attribute data subset corresponding to the s-th attribute of each of the SMART data, to generate P second attribute data subsets, wherein P = N - M + 1, N is the total number of attribute data items in the first attribute data subset corresponding to the attribute s, and s = 1, ..., S;
a second generation unit, configured to calculate P gradient data respectively corresponding to the P second attribute data subsets, and to generate the gradient data set corresponding to the attribute s according to the P gradient data.
11. The device of claim 10, characterized in that the second generation unit specifically comprises:
a first acquiring subunit, configured to obtain, by the weighted least squares method, P fitting coefficients respectively corresponding to the P second attribute data subsets, wherein the i-th of the P fitting coefficients is
$$k_i = (Z - b \cdot Y)/X,$$
where $i = 1, \ldots, P$,
$$X = \sum_{j=i}^{M+i-1} w_j \cdot x_j^{2},$$
$$Y = \sum_{j=i}^{M+i-1} w_j \cdot x_j,$$
$$Z = \sum_{j=i}^{M+i-1} w_j \cdot x_j \cdot y_j,$$
$$b = \frac{Z \cdot Y - \left(\sum_{j=i}^{M+i-1} w_j \cdot y_j\right) \cdot X}{Y \cdot Y - X \cdot \sum_{j=i}^{M+i-1} w_j},$$
$w_j$ is the preset weight corresponding to the j-th attribute data item in the first attribute data subset corresponding to the attribute s, $x_j$ is the detection time of the j-th attribute data item in the first attribute data subset corresponding to the attribute s, and $y_j$ is the j-th attribute data item in the first attribute data subset corresponding to the attribute s;
a second acquiring subunit, configured to obtain the i-th gradient data $Grad_i$ of the P gradient data by the following formula:
$$Grad_i = k_i \cdot (M-1) \cdot y_{M+i-1}$$
12. The device of any one of claims 9 to 11, characterized in that the first generation module specifically comprises:
a processing unit, configured to normalize the plurality of SMART data by the following formula:
$$g(x) = \operatorname{sign}(x) \times \log_{y}|x|,$$
where x is one attribute data item in the plurality of SMART data, g(x) is the normalized attribute data corresponding to attribute data x, and y can be calculated from the following formula:
$$y^{z} \le Value < (y + \Delta y)^{z},$$
where z is a preset threshold, Value is the factory default value of the attribute to which the attribute data x corresponds, and Δy is a preset precision.
13. The device of any one of claims 9 to 11, characterized in that the correction module specifically comprises:
a first acquiring unit, configured to obtain, according to the Q pieces of hard disk type information, Q correction values respectively corresponding to the Q pieces of hard disk type information;
a correction unit, configured to correct the corresponding normalized SMART data according to the Q correction values respectively corresponding to the Q pieces of hard disk type information.
14. The device of any one of claims 9 to 11, characterized in that the corrected SMART data set comprises S corrected attribute data sets respectively corresponding to the S attributes, and the second generation module specifically comprises:
a second acquiring unit, configured to obtain S training feature data respectively corresponding to the S attributes;
a sorting unit, configured to sort the training feature values in each training feature data, to generate S feature sequences $(V_i)$ respectively corresponding to the S attributes;
a third acquiring unit, configured to obtain the feature value f(v) of each attribute value v in the corrected attribute data set corresponding to each attribute by the following preset mapping rule:
$$f(v) = \begin{cases} V_i, & V_i \le v \le [(V_{i+1} - V_i)/2] \\ V_{i+1}, & V_i + [(V_{i+1} - V_i)/2] + 1 < v < V_{i+1} \end{cases};$$
a third generation unit, configured to generate the hard disk feature data according to the obtained feature values of the attribute values in the corrected attribute data set corresponding to each attribute.
CN201310733574.5A 2013-12-26 2013-12-26 Characteristic extracting method and device in hard disk SMART data Active CN103646114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310733574.5A CN103646114B (en) 2013-12-26 2013-12-26 Characteristic extracting method and device in hard disk SMART data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310733574.5A CN103646114B (en) 2013-12-26 2013-12-26 Characteristic extracting method and device in hard disk SMART data

Publications (2)

Publication Number Publication Date
CN103646114A true CN103646114A (en) 2014-03-19
CN103646114B CN103646114B (en) 2017-04-05

Family

ID=50251327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310733574.5A Active CN103646114B (en) 2013-12-26 2013-12-26 Characteristic extracting method and device in hard disk SMART data

Country Status (1)

Country Link
CN (1) CN103646114B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1627277A (en) * 2003-12-13 2005-06-15 张国飙 Intelligent hard disk
CN101764846A (en) * 2009-12-18 2010-06-30 西南交通大学 Remote centralized disk array operation monitoring system and implement method thereof
US20130293981A1 (en) * 2010-11-05 2013-11-07 International Business Machines Corporation Smart optimization of tracks for cloud computing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PCFAN评测室: "固态硬盘当缓存 Intel Smart Response技术实战", 《电脑迷 》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105589795A (en) * 2014-12-31 2016-05-18 中国银联股份有限公司 Disk failure prediction method and device based on prediction model
WO2016107402A1 (en) * 2014-12-31 2016-07-07 中国银联股份有限公司 Magnetic disk fault prediction method and device based on prediction model
CN105260279A (en) * 2015-11-04 2016-01-20 四川效率源信息安全技术股份有限公司 Method and device of dynamically diagnosing hard disk failure based on S.M.A.R.T (Self-Monitoring Analysis and Reporting Technology) data
CN105260279B (en) * 2015-11-04 2019-01-01 四川效率源信息安全技术股份有限公司 Method and apparatus based on SMART data dynamic diagnosis hard disk failure
CN107025153A (en) * 2016-01-29 2017-08-08 阿里巴巴集团控股有限公司 The failure prediction method and device of disk
CN110399238A (en) * 2019-06-27 2019-11-01 浪潮电子信息产业股份有限公司 A kind of disk failure method for early warning, device, equipment and readable storage medium storing program for executing
CN110399238B (en) * 2019-06-27 2023-09-22 浪潮电子信息产业股份有限公司 Disk fault early warning method, device, equipment and readable storage medium
CN110929305A (en) * 2019-08-08 2020-03-27 北京盛赞科技有限公司 Hard disk protection method, device, equipment and computer readable storage medium
CN113380316A (en) * 2020-02-25 2021-09-10 深信服科技股份有限公司 Disk information mining method, device, equipment and storage medium
CN111611117A (en) * 2020-05-22 2020-09-01 浪潮电子信息产业股份有限公司 Hard disk fault prediction method, device, equipment and computer readable storage medium
CN111611117B (en) * 2020-05-22 2022-06-10 浪潮电子信息产业股份有限公司 Hard disk fault prediction method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN103646114B (en) 2017-04-05

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant