CN109992569A - Cluster log feature extracting method, device and storage medium - Google Patents

Cluster log feature extracting method, device and storage medium

Info

Publication number
CN109992569A
CN109992569A
Authority
CN
China
Prior art keywords
data
value
log
log data
acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910123928.1A
Other languages
Chinese (zh)
Inventor
吴超勇
陈仕财
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910123928.1A priority Critical patent/CN109992569A/en
Publication of CN109992569A publication Critical patent/CN109992569A/en
Priority to PCT/CN2019/118288 priority patent/WO2020168756A1/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Abstract

The present invention relates to infrastructure operations and maintenance (O&M), and provides a cluster log feature extraction method, device, and storage medium. The logs of a server cluster are collected by a Flume client and sent to a database; data cleansing is performed on the log data to filter out raw data; feature extraction is performed on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index; the Pearson correlation coefficient between each extracted feature value and the raw data is computed and compared against a correlation threshold, where features above the threshold are considered valid data and features below the threshold are considered invalid and rejected. The present invention can effectively screen out the useful information in the production data of each host in the server cluster and extract feature values of the production data from it, facilitating fault prediction and fault classification for the production system and reducing the occurrence of production accidents.

Description

Cluster log feature extracting method, device and storage medium
Technical field
The present invention relates to infrastructure operations and maintenance (O&M), and specifically to a cluster log feature extraction method, device, and storage medium.
Background technique
In an era of explosive information growth, file sizes and data volumes at the TB or even PB scale have become a reality, and cluster storage systems have grown to 64 nodes or more; managing such a large cluster system has become a severe challenge for data centers. Tracking the operating status of cluster nodes in time and accurately locating node error messages has therefore become particularly important. In the actual operation of cluster storage systems, a commonly used cluster storage log management method sends system logs periodically or in real time, achieving centralized log transmission; however, the logs are not analyzed or managed, so the operating condition of the whole cluster storage system cannot be understood globally and error messages cannot be located quickly. Moreover, as the number of cluster nodes grows, cluster system management becomes increasingly complex. Extracting, from massive server data, the features that reflect server performance, accurately locating incipient faults of cluster nodes, and carrying out corresponding performance checks in advance is particularly important.
Summary of the invention
In order to solve the above problems, the present invention provides a cluster log feature extraction method applied to an electronic device, comprising the following steps: collecting the logs of a server cluster through a Flume client and sending them to an HBase database, wherein the Flume client collects the log of each server in the cluster through a corresponding Agent process, and each Agent periodically collects the log data on its server and sends it to the HBase database through an API interface; performing data cleansing on the log data using Hadoop to filter out raw data, wherein the raw data includes at least server disk usage, memory usage, CPU usage, and business interface call volume; performing feature extraction on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index; and screening out valid features with the Pearson correlation coefficient, by computing the Pearson correlation coefficient between each extracted feature value and the raw data and comparing the calculated coefficient with a correlation threshold: features above the threshold are considered valid data, while features below the threshold are considered invalid and rejected.
Preferably, data with gross errors are rejected during data cleansing using the Pauta (3σ) criterion, comprising the following steps: for the log data x1, x2, ..., xn, calculate the arithmetic mean

x̄ = (x1 + x2 + ... + xn)/n

and the residuals

vi = xi - x̄,

where xi is the log data collected by a single Agent;

calculate the standard deviation Sx:

Sx = √( Σvi² / (n - 1) )

If the residual vb (1 ≤ b ≤ n) of a data point xb satisfies

|vb| > 3Sx

then xb is considered an outlier containing a gross error, and the outlier is rejected.
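As a concrete illustration, the Pauta (3σ) rejection described above can be sketched in Python as follows; this is a minimal sketch only, and the function name and sample readings are hypothetical, not from the patent.

```python
import math

def pauta_reject(samples):
    """Drop values whose residual exceeds 3 standard deviations (Pauta / 3-sigma)."""
    n = len(samples)
    mean = sum(samples) / n                        # arithmetic mean of the log data
    residuals = [x - mean for x in samples]        # v_i = x_i - mean
    s_x = math.sqrt(sum(v * v for v in residuals) / (n - 1))  # standard deviation S_x
    # keep x_b only while |v_b| <= 3 * S_x
    return [x for x, v in zip(samples, residuals) if abs(v) <= 3 * s_x]

# 20 CPU-usage readings (hypothetical) with one gross error at 99.0; note that
# with only a handful of samples a single outlier cannot exceed 3 sigma, so a
# realistically sized collection window is needed for the criterion to fire.
readings = [41.0, 42.5, 40.8, 43.1, 41.9, 42.2, 41.5, 42.8, 41.2, 42.0,
            41.7, 42.4, 41.3, 42.6, 41.1, 42.9, 41.8, 42.3, 41.6, 99.0]
cleaned = pauta_reject(readings)  # 99.0 is rejected, 19 values remain
```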
Preferably, the outliers in the log data are replaced with the median, wherein the median refers to the value in the middle position when the log data x1, x2, ..., xn are arranged in order of magnitude.
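The median-substitution variant can be sketched as follows; the helper name is hypothetical, and Python's `statistics.stdev` matches the n - 1 denominator used by the criterion.

```python
import statistics

def replace_outliers_with_median(samples, k=3.0):
    """Replace Pauta-criterion outliers with the median instead of leaving gaps."""
    mean = sum(samples) / len(samples)
    s_x = statistics.stdev(samples)       # sample standard deviation (n - 1 denominator)
    med = statistics.median(samples)      # middle value after arranging by size
    return [med if abs(x - mean) > k * s_x else x for x in samples]

readings = [41.0, 42.5, 40.8, 43.1, 41.9, 42.2, 41.5, 42.8, 41.2, 42.0,
            41.7, 42.4, 41.3, 42.6, 41.1, 42.9, 41.8, 42.3, 41.6, 99.0]
smoothed = replace_outliers_with_median(readings)  # 99.0 becomes the median
```

Substituting the median rather than simply deleting keeps the series length unchanged, which avoids the null values the patent notes would otherwise appear after rejection.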
Preferably, feature extraction is performed on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index, wherein

the RMS value is calculated using the following formula:

Xrms = √( (1/n) Σ xi² )

the peak value is calculated using the following formula:

Xp = max(xi)

the root amplitude is calculated using the following formula:

Xr = ( (1/n) Σ √|xi| )²

the waveform index is calculated using the following formula:

Xws = Xrms / |x̄|

the pulse index is calculated using the following formula:

Xif = Xp / |x̄|

the kurtosis index is calculated using the following formula:

Xkv = ( (1/n) Σ xi⁴ ) / Xrms⁴
where xi is the log data collected by a single Agent;
n is the number of data acquisitions;
x̄ is the arithmetic mean of the collected log data;
Xrms is the RMS value of the collected log data;
Xp is the peak value of the collected log data;
Xr is the root amplitude of the collected log data;
Xws is the waveform index of the collected log data;
Xif is the pulse index of the collected log data;
Xkv is the kurtosis index of the collected log data.
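The features listed above are standard time-domain statistics; a minimal Python sketch, assuming the textbook definitions of these features (the function and dictionary key names are illustrative, not from the patent), is:

```python
import math

def extract_features(x):
    """Time-domain features of one series of collected log data."""
    n = len(x)
    mean = sum(x) / n                                     # arithmetic mean
    x_rms = math.sqrt(sum(v * v for v in x) / n)          # RMS value
    x_p = max(x)                                          # peak value
    x_r = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2    # root amplitude
    return {
        "mean": mean,
        "rms": x_rms,
        "peak": x_p,
        "root_amp": x_r,
        "waveform": x_rms / abs(mean),                    # waveform index
        "pulse": x_p / abs(mean),                         # pulse index
        "kurtosis": sum(v ** 4 for v in x) / n / x_rms ** 4,  # kurtosis index
    }

feats = extract_features([1.0, 2.0, 3.0, 4.0])
```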
Preferably, the formula of the Pearson correlation coefficient is as follows:

r = Σ (xi - x̄)(yi - ȳ) / ( √( Σ (xi - x̄)² ) · √( Σ (yi - ȳ)² ) )

where xi is the log data collected by a single Agent;
yi is a certain feature value extracted from the data collected by a single Agent;
x̄ is the arithmetic mean of the log data x1, x2, ..., xn;
ȳ is the arithmetic mean of y1, y2, ..., yn;
n is the number of log data acquisitions.
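The coefficient itself is straightforward to compute directly from this definition; a minimal sketch (the function name is illustrative):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between a data series and a feature series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))   # numerator
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r_up = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])    # perfectly correlated
r_down = pearson([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])  # perfectly anti-correlated
```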
Preferably, Flume includes multiple first-level Agents and a second-level Agent; each first-level Agent collects the log data of one server, the log data collected by the multiple first-level Agents is aggregated to the second-level Agent, and the second-level Agent transmits it to HDFS.
The present invention also provides an electronic device comprising a memory and a processor. A cluster log feature extraction program is stored in the memory, and when the cluster log feature extraction program is executed by the processor it implements the following steps: collecting the logs of a server cluster through a Flume client and sending them to an HBase database, wherein the Flume client collects the log of each server in the cluster through a corresponding Agent process, and each Agent periodically collects the log data on its server and sends it to the HBase database through an API interface; performing data cleansing on the log data using Hadoop to filter out raw data, wherein the raw data includes at least server disk usage, memory usage, CPU usage, and business interface call volume; performing feature extraction on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index; and screening out valid features with the Pearson correlation coefficient, by computing the Pearson correlation coefficient between each extracted feature value and the raw data and comparing the calculated coefficient with a correlation threshold: features above the threshold are considered valid data, while features below the threshold are considered invalid and rejected.
Preferably, data with gross errors are rejected during data cleansing using the Pauta (3σ) criterion, comprising the following steps: for the log data x1, x2, ..., xn, calculate the arithmetic mean x̄ = (x1 + x2 + ... + xn)/n and the residuals vi = xi - x̄, where xi is the data value collected by a single Agent;

calculate the standard deviation Sx = √( Σvi² / (n - 1) );

if the residual vb (1 ≤ b ≤ n) of a data point xb satisfies |vb| > 3Sx,

then xb is considered an outlier containing a gross error, and the outlier is rejected.
Preferably, the outliers in the log data are replaced with the median, wherein the median refers to the value in the middle position when the log data x1, x2, ..., xn are arranged in order of magnitude.
The present invention also provides a computer-readable storage medium storing a computer program. The computer program includes program instructions which, when executed by a processor, implement the cluster log feature extraction method described above.
The present invention can effectively screen out the useful information in the production data of each host in the server cluster and extract feature values of the production data from it, facilitating fault prediction and fault classification for the production system and reducing the occurrence of production accidents.
Detailed description of the invention
By describing the embodiments in conjunction with the following accompanying drawings, the above features and technical advantages of the present invention will become clearer and easier to understand.
Fig. 1 is the flow diagram of the cluster log feature extracting method of the embodiment of the present invention;
Fig. 2 is the hardware structure schematic diagram of the electronic device of the embodiment of the present invention;
Fig. 3 is the module structure diagram of the cluster log feature extraction program of the embodiment of the present invention;
Fig. 4 is the unit composition figure of the log acquisition module of the embodiment of the present invention;
Fig. 5 is the unit composition figure of the characteristic extracting module of the embodiment of the present invention;
Fig. 6 is the unit composition figure of the data cleansing module of the embodiment of the present invention;
Fig. 7 is a schematic diagram of the Agent process of Flume reading data.
Specific embodiment
Embodiments of the cluster log feature extraction method, device, and storage medium of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will recognize that the described embodiments can be modified in a variety of different ways, or combinations thereof, without departing from the spirit and scope of the present invention. Therefore, the drawings and description are illustrative in nature and are not intended to limit the scope of the claims. In addition, in the present specification, the drawings are not drawn to scale, and identical reference numerals indicate identical parts.
As shown in Fig. 1, the cluster log feature extraction method of the present embodiment includes the following steps:
Step S10: collect the logs of the server cluster through a Flume (a distributed system for massive log collection, aggregation, and transmission) client, and send them to the HBase database server. The smallest independently running unit of Flume is the Agent process; one Agent process is a complete data collection tool. As shown in Fig. 7, an Agent consists of a Source (data collection component), a Channel (temporary transfer store), and a Sink, and the three together make up an Agent. The Source collects data from the server and passes it to the Channel; the Channel saves the Events (data units) passed over by the Source component; the Sink reads and removes Events from the Channel and transmits them to the backend. Flume collects log data from each server through multiple corresponding Agents: one Agent is set up for each server, and it periodically collects the log data on its server and sends it to the backend through an API interface.
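As an illustration of this Source-Channel-Sink anatomy, a single-Agent Flume configuration might look like the following sketch; the patent gives no configuration, so the agent name, log path, table name, and capacity values here are hypothetical.

```properties
# One Agent per server: the Source tails the server log, the Channel buffers
# Events in memory, and the Sink delivers them to HBase.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: collect data from the server (hypothetical log path)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/server.log
a1.sources.r1.channels = c1

# Channel: temporary in-memory store for Events
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: read Events off the Channel and write them to HBase
a1.sinks.k1.type = hbase
a1.sinks.k1.table = cluster_logs
a1.sinks.k1.columnFamily = raw
a1.sinks.k1.channel = c1
```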
Step S30: perform data cleansing on the log data using Hadoop (a distributed system infrastructure), and filter out the raw data, wherein the raw data includes at least server disk usage, memory usage, CPU usage, and business interface call volume.
Step S50: perform feature extraction on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index.
Step S70: screen out valid features with the Pearson correlation coefficient: compute the Pearson correlation coefficient between each extracted feature value and the raw data, and compare the calculated coefficient with a correlation threshold; features above the threshold are considered valid data, while features below the threshold are considered invalid and rejected.
Further, data with gross errors are rejected during data cleansing using the Pauta (3σ) criterion, comprising the following steps:

For the log data x1, x2, ..., xn, calculate the arithmetic mean x̄ = (x1 + x2 + ... + xn)/n and the residuals vi = xi - x̄, where xi is the log data collected by a single Agent;

calculate the standard deviation Sx = √( Σvi² / (n - 1) );

if the residual vb (1 ≤ b ≤ n) of a data point xb in the log data satisfies |vb| > 3Sx,

then xb is considered an outlier containing a gross error, and the outlier is rejected.
Further, the Pauta (3σ) criterion can efficiently identify outliers in the production data, but the rejected data points leave null values. Therefore, the identified outliers in the log data are replaced with the median, realizing preprocessing of the production data. Here, the median refers to the value in the middle position when the variable values x1, x2, ..., xn are arranged in order of magnitude to form a sequence.
In one alternative embodiment, feature extraction is performed on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index, wherein

the RMS value is calculated using the following formula:

Xrms = √( (1/n) Σ xi² )

the peak value is calculated using the following formula:

Xp = max(xi)

the root amplitude is calculated using the following formula:

Xr = ( (1/n) Σ √|xi| )²

the waveform index is calculated using the following formula:

Xws = Xrms / |x̄|

the pulse index is calculated using the following formula:

Xif = Xp / |x̄|

the kurtosis index is calculated using the following formula:

Xkv = ( (1/n) Σ xi⁴ ) / Xrms⁴
where xi is the log data collected by a single Agent;
n is the number of log data acquisitions;
x̄ is the arithmetic mean of the collected log data;
Xrms is the RMS value of the collected log data;
Xp is the peak value of the collected log data;
Xr is the root amplitude of the collected log data;
Xws is the waveform index of the collected log data;
Xif is the pulse index of the collected log data;
Xkv is the kurtosis index of the collected log data.
The valid features are screened out with the Pearson correlation coefficient. Specifically, the Pearson correlation coefficient between each of the above feature values and the raw data is computed, and the calculated coefficient is compared with a correlation threshold; features above the threshold are considered valid data, and features below the threshold are considered invalid data and need to be rejected, so that valid data can be screened out. For example, if the correlation threshold is 0.7 and the correlation coefficient between the root amplitude and the raw data is 0.2, the root amplitude is invalid data; if the correlation coefficient between the kurtosis index and the raw data is 0.85, the kurtosis index is considered valid data. The formula of the Pearson correlation coefficient is as follows:

r = Σ (xi - x̄)(yi - ȳ) / ( √( Σ (xi - x̄)² ) · √( Σ (yi - ȳ)² ) )

where xi is the data value collected by a single Agent;
yi is a certain feature value extracted from the data collected by a single Agent;
x̄ is the arithmetic mean of the log data x1, x2, ..., xn;
ȳ is the arithmetic mean of y1, y2, ..., yn;
n is the number of log data acquisitions.
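The worked example above (threshold 0.7; root amplitude at 0.2 rejected, kurtosis index at 0.85 kept) corresponds to a screening step that can be sketched as follows; the function and feature names are illustrative.

```python
def screen_features(correlations, threshold=0.7):
    """Split features into valid and invalid by their correlation with the raw data."""
    valid = {name: r for name, r in correlations.items() if r > threshold}
    invalid = {name: r for name, r in correlations.items() if r <= threshold}
    return valid, invalid

# correlation coefficients of each extracted feature with the raw data,
# using the values from the example in the text
corrs = {"root_amplitude": 0.2, "kurtosis_index": 0.85}
valid, invalid = screen_features(corrs)  # kurtosis_index kept, root_amplitude rejected
```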
In one alternative embodiment, Flume includes multiple first-level Agents and a second-level Agent; each first-level Agent collects the log data of one server, the log data collected by the multiple first-level Agents is aggregated to the second-level Agent, and the second-level Agent transmits it to HDFS (a distributed file system).
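This two-tier topology is conventionally wired with Avro between the tiers; a configuration fragment might look like the following sketch (the patent gives no configuration, so the agent names, host, port, and HDFS path are hypothetical).

```properties
# First-level Agent (one per server): forward collected Events to the
# second-level Agent over Avro.
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = collector.example.com
agent1.sinks.k1.port = 4545

# Second-level Agent: receive Events from all first-level Agents and
# write them to HDFS.
agent2.sources.r1.type = avro
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 4545
agent2.sinks.k1.type = hdfs
agent2.sinks.k1.hdfs.path = hdfs://namenode/flume/cluster-logs/%Y%m%d
```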
As shown in Fig. 2, which is the hardware structure schematic diagram of an embodiment of the electronic device of the present invention, the electronic device 2 of the present embodiment is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smartphone, a tablet computer, a laptop, a desktop computer, a rack-mount server, a blade server, a tower server, or a cabinet server (including an independent server or a cluster composed of multiple servers). As shown in Fig. 2, the electronic device 2 includes at least, but is not limited to, a memory 21, a processor 22, and a network interface 23 that can communicate with each other through a system bus. The memory 21 includes at least one type of computer-readable storage medium, which includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic storage, magnetic disk, optical disc, etc. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as the hard disk or memory of the electronic device 2.

In other embodiments, the memory 21 may also be an external storage device of the electronic device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 2. Of course, the memory 21 may also include both the internal storage unit of the electronic device 2 and its external storage device. In the present embodiment, the memory 21 is generally used to store the operating system and various application software installed on the electronic device 2, such as the cluster log feature extraction program code. In addition, the memory 21 can also be used to temporarily store various data that has been output or will be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 2, for example, to execute control and processing related to data interaction or communication with the electronic device 2. In the present embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example, to run the cluster log feature extraction program.
The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 is used to connect the electronic device 2 with a push platform through a network, and to establish a data transmission channel and communication connection between the electronic device 2 and the push platform. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
Optionally, the electronic device 2 may also include a display, which may also be called a display screen or display unit. In some embodiments it may be an LED display, a liquid crystal display, a touch liquid crystal display, an organic light-emitting diode (OLED) display, etc. The display is used to show the information processed in the electronic device 2 and to display a visual user interface.
It should be pointed out that Fig. 2 shows only the electronic device 2 with components 21-23; it should be understood that not all of the shown components are required to be implemented, and more or fewer components may be implemented instead.
The memory 21, which includes a readable storage medium, may contain an operating system, a cluster log feature extraction program 50, etc. The processor 22 implements the following steps when executing the cluster log feature extraction program 50 in the memory 21:
Step S10: collect the logs of the server cluster through a Flume (a distributed system for massive log collection, aggregation, and transmission) client, and send them to the HBase database server. The smallest independently running unit of Flume is the Agent component; one Agent component is a complete data collection tool. Flume collects log data from each server through multiple corresponding Agents: one Agent is set up for each server, and it periodically collects the log data on its server and sends it to the backend through an API interface.
Step S30: perform data cleansing on the log data using Hadoop (a distributed system infrastructure), and filter out the raw data, wherein the raw data includes at least server disk usage, memory usage, CPU usage, and business interface call volume.
Step S50: perform feature extraction on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index.
Step S70: screen out valid features with the Pearson correlation coefficient: compute the Pearson correlation coefficient between each extracted feature value and the raw data, and compare the calculated coefficient with a correlation threshold; features above the threshold are considered valid data, while features below the threshold are considered invalid and rejected.
In the present embodiment, the cluster log feature extraction program stored in the memory 21 can be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in the present embodiment) to complete the present invention. For example, Fig. 3 shows a program module schematic diagram of the cluster log feature extraction program; in this embodiment, the cluster log feature extraction program 50 can be divided into a log acquisition module 501, a data cleansing module 502, a feature extraction module 503, and a valid feature screening module 504. The program modules referred to in the present invention are series of computer program instruction segments capable of completing specific functions, and are more suitable than a whole program for describing the execution process of the cluster log feature extraction program in the electronic device 2. The specific functions of the program modules are described below.
The log acquisition module 501 is used to collect the logs of the server cluster through a Flume (a distributed system for massive log collection, aggregation, and transmission) client and send them to the HBase database server. The smallest independently running unit of Flume is the Agent component; one Agent component is a complete data collection tool. Flume collects log data from each server through multiple corresponding Agents: one Agent is set up for each server, and it periodically collects the log data on its server and sends it to the backend through an API interface.
The data cleansing module 502 is used to perform data cleansing on the log data using Hadoop (a distributed system infrastructure) and filter out the raw data, wherein the raw data includes at least server disk usage, memory usage, CPU usage, and business interface call volume.
The feature extraction module 503 is used to perform feature extraction on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index.
The valid feature screening module 504 screens out valid features with the Pearson correlation coefficient: the Pearson correlation coefficient between each extracted feature value and the raw data is computed and compared with a correlation threshold; features above the threshold are considered valid data, while features below the threshold are considered invalid and rejected.
In one alternative embodiment, as shown in Fig. 6, the data cleansing module 502 includes a Pauta criterion judging unit 5021, which rejects data with gross errors using the Pauta (3σ) criterion, comprising the following steps:

For the log data x1, x2, ..., xn, calculate the arithmetic mean x̄ = (x1 + x2 + ... + xn)/n and the residuals vi = xi - x̄, where xi is the data value collected by a single Agent;

calculate the standard deviation Sx = √( Σvi² / (n - 1) );

if the residual vb (1 ≤ b ≤ n) of a data point xb satisfies |vb| > 3Sx,

then xb is considered an outlier containing a gross error, and the outlier is rejected.
Further, the data cleansing module 502 also includes an outlier replacement unit 5022. The Pauta (3σ) criterion can effectively identify outliers in the production data, but the rejected data points leave null values. The outlier replacement unit 5022 replaces the identified outliers in the log data with the median, realizing preprocessing of the production data. Here, the median refers to the value in the middle position when the variable values x1, x2, ..., xn are arranged in order of magnitude to form a sequence.
In one alternative embodiment, as shown in Fig. 5, the feature extraction module 503 includes a mean extraction unit 5031, an RMS value extraction unit 5032, a peak extraction unit 5033, a root amplitude extraction unit 5034, a waveform index extraction unit 5035, a pulse index extraction unit 5036, and a kurtosis index extraction unit 5037, which respectively perform feature extraction on the raw data, including the mean, RMS value, peak value, root amplitude, waveform index, pulse index, and kurtosis index, wherein

the RMS value is calculated using the following formula:

Xrms = √( (1/n) Σ xi² )

the peak value is calculated using the following formula:

Xp = max(xi)

the root amplitude is calculated using the following formula:

Xr = ( (1/n) Σ √|xi| )²

the waveform index is calculated using the following formula:

Xws = Xrms / |x̄|

the pulse index is calculated using the following formula:

Xif = Xp / |x̄|

the kurtosis index is calculated using the following formula:

Xkv = ( (1/n) Σ xi⁴ ) / Xrms⁴

where xi is the log data collected by a single Agent;
n is the number of log data acquisitions;
x̄ is the arithmetic mean of the collected log data;
Xrms is the RMS value of the collected log data;
Xp is the peak value of the collected log data;
Xr is the root amplitude of the collected log data;
Xws is the waveform index of the collected log data;
Xif is the pulse index of the collected log data;
Xkv is the kurtosis index of the collected log data.
Filter out validity feature with Pearson correlation coefficient, specifically, be by features above value respectively with initial data The operation for carrying out Pearson correlation coefficient, according to calculated related coefficient with relevance threshold come compared with, higher than degree of correlation threshold Value is then considered valid data, is then considered invalid data lower than relevance threshold, needs to be rejected, so as to filter out The data of effect.For example, relevance threshold is 0.7, the related coefficient of root amplitude and initial data is 0.2, then shows root width Being worth is invalid data, and the related coefficient of kurtosis index and initial data is 0.85, then assert that kurtosis index is valid data.Its In, the formula of Pearson correlation coefficient is as follows:
Wherein, xi is the data value collected by a single Agent;
yi is a feature value extracted from the data collected by a single Agent;
x̄ is the arithmetic mean of the log data x1, x2, ..., xn;
ȳ is the arithmetic mean of y1, y2, ..., yn;
n is the number of data samples collected.
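The screening step above can be illustrated with a short Python sketch. The function and threshold are assumptions for illustration: each candidate feature is assumed to have been computed over successive collection windows so that it forms a series of the same length as the raw series, and the 0.7 threshold follows the example above; the comparison r ≥ threshold mirrors the patent's rule, which discards negatively correlated features as well.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def filter_features(raw, feature_series, threshold=0.7):
    """Keep only the features whose correlation with the raw series meets the threshold."""
    return {name: series for name, series in feature_series.items()
            if pearson(raw, series) >= threshold}
```

A perfectly correlated feature series (r = 1) is kept, while an anti-correlated one (r = −1) falls below the 0.7 threshold and is rejected.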
In one alternate embodiment, as shown in figure 4, the log acquisition module 501 further includes an Agent setting unit 5011, which configures Flume with multiple first-level Agents and one second-level Agent. Each first-level Agent collects the log data of one server; the log data collected by the multiple first-level Agents is aggregated at the second-level Agent, which transmits it to HDFS.
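A two-tier topology of this kind can be expressed in a standard Flume 1.x properties file. The following is only a sketch of the pattern the embodiment describes, not the patent's actual configuration: the agent names (a1, a2), the tailed log path, the collector hostname and the port are illustrative assumptions.

```properties
# First-level agent (one per application server): tail the local log
# and forward events to the collector over Avro.
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

# Second-level (collector) agent: fan-in from all first-level agents
# via an Avro source, then write to HDFS.
a2.sources = r2
a2.channels = c2
a2.sinks = k2
a2.sources.r2.type = avro
a2.sources.r2.bind = 0.0.0.0
a2.sources.r2.port = 4545
a2.sources.r2.channels = c2
a2.channels.c2.type = memory
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://namenode/flume/logs/%Y-%m-%d
a2.sinks.k2.channel = c2
```

Each server runs an a1-style agent; a single collector host runs a2, so all first-level traffic converges on one Avro port before landing in HDFS.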
In addition, an embodiment of the present invention also proposes a computer-readable storage medium, which may be any one of, or any combination of, a hard disk, a multimedia card, an SD card, a flash card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium includes a cluster log feature extraction program; the following operations are realized when the cluster log feature extraction program 50 is executed by the processor 22:
Step S10: collect the logs of the server cluster through the Flume client and send them to the HBase database server. Flume takes the Agent component as its smallest independent operating unit; an Agent component is a complete data gathering tool. Flume collects log data from the servers through multiple Agents, one Agent per server; each Agent periodically collects the log data on its corresponding server and sends it to the back end through an API interface.
Step S30: perform data cleansing on the log data using Hadoop to filter out the raw data, wherein the raw data includes at least the server disk usage, memory usage, CPU usage and service interface call volume.
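The cleansing step is elaborated later in claims 2 and 3 as the Pauta (3σ) criterion with median substitution. A minimal Python sketch under that reading follows; it is an illustration of the criterion, not the patent's Hadoop implementation.

```python
import statistics

def pauta_clean(x):
    """Reject gross-error samples by the Pauta (3-sigma) criterion and
    substitute the median for each rejected sample, per claims 2-3."""
    mean = statistics.mean(x)
    std = statistics.stdev(x)        # sample standard deviation Sx
    median = statistics.median(x)    # middle value of the sorted samples
    # A sample whose residual |xi - mean| exceeds 3*Sx is treated as a
    # singular value containing a gross error and replaced by the median.
    return [v if abs(v - mean) <= 3 * std else median for v in x]
```

On a series of twenty samples near 10.0 with a single sample of 1000.0, the outlier's residual exceeds 3σ and it is replaced by the median, 10.0.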
Step S50: extract from the raw data feature values including the mean, RMS (effective) value, peak value, root amplitude, waveform index, pulse index and kurtosis index.
Step S70: screen out valid features with the Pearson correlation coefficient. Each extracted feature value is correlated with the raw data, and the resulting correlation coefficient is compared with a correlation threshold: a feature above the threshold is treated as valid data, while one below the threshold is treated as invalid data and rejected.
The specific embodiments of the computer-readable storage medium of the present invention are substantially the same as those of the above cluster log feature extraction method and of the electronic device 2, and are not repeated here.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A cluster log feature extraction method applied to an electronic device, characterized by comprising the following steps:
collecting the logs of a server cluster through a Flume client and sending them to an HBase database, wherein the Flume client collects the log of each server in the server cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface;
performing data cleansing on the log data using Hadoop to filter out raw data, wherein the raw data includes at least the server disk usage, memory usage, CPU usage and service interface call volume;
extracting from the raw data feature values including the mean, RMS (effective) value, peak value, root amplitude, waveform index, pulse index and kurtosis index;
screening out valid features with the Pearson correlation coefficient: each extracted feature value is correlated with the raw data, and the resulting correlation coefficient is compared with a correlation threshold; a feature above the threshold is valid data, while one below the threshold is invalid data and is rejected.
2. cluster log feature extracting method according to claim 1, which is characterized in that
during data cleansing, data with gross errors are rejected using the Pauta criterion, comprising the following steps:
for the log data x1, x2, ..., xn, calculating the arithmetic mean x̄ = (1/n) · Σ xi and the residuals vi = xi − x̄, wherein xi is the log data collected by a single Agent;
calculating the standard deviation Sx = sqrt( (1/(n−1)) · Σ vi² );
if the residual vb (1 ≤ b ≤ n) of xb in the log data satisfies |vb| > 3·Sx,
then determining that xb is a singular value containing a gross error, and rejecting the abnormal value.
3. cluster log feature extracting method according to claim 2, which is characterized in that
substituting the median for the singular value of the log data, wherein the median refers to the value in the middle position when the log data x1, x2, ..., xn are arranged in order of magnitude.
4. cluster log feature extracting method according to claim 2, which is characterized in that
feature values including the mean, RMS (effective) value, peak value, root amplitude, waveform index, pulse index and kurtosis index are extracted from the raw data, wherein
the RMS value is calculated using the following formula:
Xrms = sqrt( (1/N) · Σ xi² )
the peak value is calculated using the following formula:
Xp = max(xi)
the root amplitude is calculated using the following formula:
Xr = ( (1/N) · Σ sqrt(|xi|) )²
the waveform index is calculated using the following formula:
Xws = Xrms / x̄
the pulse index is calculated using the following formula:
Xif = Xp / x̄
the kurtosis index is calculated using the following formula:
Xkv = ( (1/N) · Σ xi⁴ ) / Xrms⁴
wherein, xi is the log data collected by a single Agent;
N is the number of data samples collected;
x̄ is the arithmetic mean of the collected log data;
Xrms is the RMS (effective) value of the collected log data;
Xp is the peak value of the collected log data;
Xr is the root amplitude of the collected log data;
Xws is the waveform index of the collected log data;
Xif is the pulse index of the collected log data;
Xkv is the kurtosis index of the collected log data.
5. The cluster log feature extraction method according to claim 2, characterized in that the Pearson correlation coefficient is given by:
r = Σ (xi − x̄)(yi − ȳ) / sqrt( Σ (xi − x̄)² · Σ (yi − ȳ)² )
wherein, xi is the log data collected by a single Agent;
yi is a feature value extracted from the data collected by a single Agent;
x̄ is the arithmetic mean of the log data x1, x2, ..., xn;
ȳ is the arithmetic mean of y1, y2, ..., yn;
n is the number of log data samples collected.
6. cluster log feature extracting method according to claim 1, which is characterized in that
Flume includes multiple first-level Agents and one second-level Agent; each first-level Agent collects the log data of one server, the log data collected by the multiple first-level Agents is aggregated at the second-level Agent, and the second-level Agent transmits it to HDFS.
7. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory stores a cluster log feature extraction program, and the following steps are realized when the cluster log feature extraction program is executed by the processor:
collecting the logs of a server cluster through a Flume client and sending them to an HBase database, wherein the Flume client collects the log of each server in the server cluster through multiple Agent processes, and each Agent periodically collects the log data on its corresponding server and sends it to the HBase database through an API interface;
performing data cleansing on the log data using Hadoop to filter out raw data, wherein the raw data includes at least the server disk usage, memory usage, CPU usage and service interface call volume;
extracting from the raw data feature values including the mean, RMS (effective) value, peak value, root amplitude, waveform index, pulse index and kurtosis index;
screening out valid features with the Pearson correlation coefficient: each extracted feature value is correlated with the raw data, and the resulting correlation coefficient is compared with a correlation threshold; a feature above the threshold is valid data, while one below the threshold is invalid data and is rejected.
8. electronic device according to claim 7, which is characterized in that
data with gross errors are rejected during data cleansing using the Pauta criterion, comprising the following steps:
for the log data x1, x2, ..., xn, calculating the arithmetic mean x̄ = (1/n) · Σ xi and the residuals vi = xi − x̄, wherein xi is the data value collected by a single Agent;
calculating the standard deviation Sx = sqrt( (1/(n−1)) · Σ vi² );
if the residual vb (1 ≤ b ≤ n) of xb in the log data satisfies |vb| > 3·Sx,
then xb is considered a singular value containing a gross error, and the singular value is rejected.
9. electronic device according to claim 8, which is characterized in that
substituting the median for the singular values in the log data, wherein the median refers to the value in the middle position when the log data x1, x2, ..., xn are arranged in order of magnitude.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the cluster log feature extraction method of any one of claims 1 to 6 is realized when the program instructions are executed by a processor.
CN201910123928.1A 2019-02-19 2019-02-19 Cluster log feature extracting method, device and storage medium Pending CN109992569A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910123928.1A CN109992569A (en) 2019-02-19 2019-02-19 Cluster log feature extracting method, device and storage medium
PCT/CN2019/118288 WO2020168756A1 (en) 2019-02-19 2019-11-14 Cluster log feature extraction method, and apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910123928.1A CN109992569A (en) 2019-02-19 2019-02-19 Cluster log feature extracting method, device and storage medium

Publications (1)

Publication Number Publication Date
CN109992569A true CN109992569A (en) 2019-07-09

Family

ID=67129790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910123928.1A Pending CN109992569A (en) 2019-02-19 2019-02-19 Cluster log feature extracting method, device and storage medium

Country Status (2)

Country Link
CN (1) CN109992569A (en)
WO (1) WO2020168756A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110737648A (en) * 2019-09-17 2020-01-31 平安科技(深圳)有限公司 Performance characteristic dimension reduction method and device, electronic equipment and storage medium
CN111290916A (en) * 2020-02-18 2020-06-16 深圳前海微众银行股份有限公司 Big data monitoring method, device and equipment and computer readable storage medium
WO2020168756A1 (en) * 2019-02-19 2020-08-27 平安科技(深圳)有限公司 Cluster log feature extraction method, and apparatus, device and storage medium
CN111984499A (en) * 2020-08-04 2020-11-24 中国建设银行股份有限公司 Fault detection method and device for big data cluster
CN112069036A (en) * 2020-11-10 2020-12-11 南京信易达计算技术有限公司 Management and monitoring system based on cluster computing
CN113945684A (en) * 2021-10-14 2022-01-18 中国计量科学研究院 Big data-based micro air station self-calibration method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105353644A (en) * 2015-09-29 2016-02-24 中国人民解放军63892部队 Radar target track derivative system on the basis of information mining of real-equipment data and method thereof
US20170116330A1 (en) * 2015-10-23 2017-04-27 International Business Machines Corporation Generating Important Values from a Variety of Server Log Files
CN106769032A (en) * 2016-11-28 2017-05-31 南京工业大学 A kind of Forecasting Methodology of pivoting support service life
US20170169360A1 (en) * 2013-04-02 2017-06-15 Patternex, Inc. Method and system for training a big data machine to defend
CN108399199A (en) * 2018-01-30 2018-08-14 武汉大学 A kind of collection of the application software running log based on Spark and service processing system and method
CN109032910A (en) * 2018-07-24 2018-12-18 北京百度网讯科技有限公司 Log collection method, device and storage medium
CN109033404A (en) * 2018-08-03 2018-12-18 北京百度网讯科技有限公司 Daily record data processing method, device and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7356550B1 (en) * 2001-06-25 2008-04-08 Taiwan Semiconductor Manufacturing Company Method for real time data replication
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-base mass log collection system
CN106570151A (en) * 2016-10-28 2017-04-19 上海斐讯数据通信技术有限公司 Data collection processing method and system for mass files
CN106845799B (en) * 2016-12-29 2023-12-19 中国电力科学研究院 Evaluation method for typical working condition of battery energy storage system
CN107092592B (en) * 2017-04-10 2020-06-05 浙江鸿程计算机系统有限公司 Site personalized semantic recognition method based on multi-situation data and cost-sensitive integrated model
CN109992569A (en) * 2019-02-19 2019-07-09 平安科技(深圳)有限公司 Cluster log feature extracting method, device and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170169360A1 (en) * 2013-04-02 2017-06-15 Patternex, Inc. Method and system for training a big data machine to defend
CN105353644A (en) * 2015-09-29 2016-02-24 中国人民解放军63892部队 Radar target track derivative system on the basis of information mining of real-equipment data and method thereof
US20170116330A1 (en) * 2015-10-23 2017-04-27 International Business Machines Corporation Generating Important Values from a Variety of Server Log Files
CN106769032A (en) * 2016-11-28 2017-05-31 南京工业大学 A kind of Forecasting Methodology of pivoting support service life
CN108399199A (en) * 2018-01-30 2018-08-14 武汉大学 A kind of collection of the application software running log based on Spark and service processing system and method
CN109032910A (en) * 2018-07-24 2018-12-18 北京百度网讯科技有限公司 Log collection method, device and storage medium
CN109033404A (en) * 2018-08-03 2018-12-18 北京百度网讯科技有限公司 Daily record data processing method, device and system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020168756A1 (en) * 2019-02-19 2020-08-27 平安科技(深圳)有限公司 Cluster log feature extraction method, and apparatus, device and storage medium
CN110737648A (en) * 2019-09-17 2020-01-31 平安科技(深圳)有限公司 Performance characteristic dimension reduction method and device, electronic equipment and storage medium
WO2021051578A1 (en) * 2019-09-17 2021-03-25 平安科技(深圳)有限公司 Method and device for performance feature dimensionality reduction, electronic device, and storage medium
CN111290916A (en) * 2020-02-18 2020-06-16 深圳前海微众银行股份有限公司 Big data monitoring method, device and equipment and computer readable storage medium
CN111984499A (en) * 2020-08-04 2020-11-24 中国建设银行股份有限公司 Fault detection method and device for big data cluster
CN112069036A (en) * 2020-11-10 2020-12-11 南京信易达计算技术有限公司 Management and monitoring system based on cluster computing
CN112069036B (en) * 2020-11-10 2021-09-03 南京信易达计算技术有限公司 Management and monitoring system based on cluster computing
CN113945684A (en) * 2021-10-14 2022-01-18 中国计量科学研究院 Big data-based micro air station self-calibration method

Also Published As

Publication number Publication date
WO2020168756A1 (en) 2020-08-27

Similar Documents

Publication Publication Date Title
CN109992569A (en) Cluster log feature extracting method, device and storage medium
CN107800591B (en) Unified log data analysis method
EP3031216A1 (en) Dynamic collection analysis and reporting of telemetry data
CN108415845A (en) AB tests computational methods, device and the server of system index confidence interval
CN103294592A (en) Leveraging user-to-tool interactions to automatically analyze defects in it services delivery
CN105930527A (en) Searching method and device
CN102541884B (en) Method and device for database optimization
CN108875091A (en) A kind of distributed network crawler system of unified management
CN108650684A (en) A kind of correlation rule determines method and device
CN104765689A (en) Method and device for conducting real-time supervision to interface performance data
CN107832291A (en) Client service method, electronic installation and the storage medium of man-machine collaboration
CN112463859B (en) User data processing method and server based on big data and business analysis
CN111800292B (en) Early warning method and device based on historical flow, computer equipment and storage medium
CN110519263A (en) Anti- brush amount method, apparatus, equipment and computer readable storage medium
CN111242430A (en) Power equipment supplier evaluation method and device
CN111970151A (en) Flow fault positioning method and system for virtual and container network
CN104199850A (en) Method and device for processing essential data
CN108664322A (en) Data processing method and system
CN113360313B (en) Behavior analysis method based on massive system logs
CN114598731B (en) Cluster log acquisition method, device, equipment and storage medium
CN112416800B (en) Intelligent contract testing method, device, equipment and storage medium
CN104426708A (en) Method and system for executing security detection service
CN107147542A (en) A kind of information generating method and device
KR101718599B1 (en) System for analyzing social media data and method for analyzing social media data using the same
CN112685376A (en) Massive log data analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination