CN108922140A

CN108922140A - Industrial alarm flood prediction method based on an N-gram model

Info

Publication number: CN108922140A
Application number: CN201810889499.4A
Authority: CN
Inventors: Wang Jiandong (王建东); Xu Zhou (徐洲); Xu Yizhou (徐一洲)
Original assignee: Shandong University of Science and Technology
Current assignee: Shandong University of Science and Technology
Priority date: 2018-08-07
Filing date: 2018-08-07
Publication date: 2018-11-30
Anticipated expiration: 2038-08-07
Also published as: CN108922140B

Abstract

The invention belongs to the field of signal processing, and in particular relates to an industrial alarm flood prediction method based on an N-gram model. The method comprises the following steps: (1) obtain a historical alarm flood data set, enumerate the alarm variables in it, calculate the discriminability of each alarm variable, and remove the alarm variables whose discriminability is 0; (2) compare each sequence in the processed data set with the emerging alarm sequence for similarity, and sort the sequences by similarity score from high to low; (3) segment the re-processed data set with a set time window, count the occurrences of each data segment, and, using the data set as the sample, find the alarm variables that may appear next and their corresponding probabilities; (4) use a Bayesian probability model to find the probability of the predicted next alarm and its confidence interval; (5) iterate steps (3) and (4). The invention solves the problem of inaccurate prediction in current alarm flood prediction.

Description

Industrial alarm flood prediction method based on an N-gram model

Technical field

The invention belongs to the field of signal processing, and in particular relates to an industrial alarm flood prediction method based on an N-gram model.

Background art

In industry today, alarm systems are widely used to monitor and announce abnormal conditions in industrial processes. However, current industrial alarm systems still suffer from many problems, such as nuisance alarms, standing alarms and alarm floods. Nuisance alarms are large numbers of meaningless alarms raised in a short time that do not require an operator response; their presence reduces the operator's ability to respond to genuine alarms. Standing alarms are alarms that remain active for a long time after they occur; they usually do not clear even after the operator has acted, and they impair the operator's judgment of the system's operating state. An alarm flood is a large number of alarms raised in a short time, usually triggered by a single event; such floods generally exceed the operator's handling capacity and are complex to resolve.

Current research on alarm floods focuses mainly on similarity analysis of alarm flood sequences, while research on promptly anticipating and handling alarm floods as they occur in real time in industrial processes is still lacking.

Alarm flood prediction means that, for an emerging alarm flood, the system predicts the alarm most likely to occur next, so that the operator can act in advance. Current approaches to alarm flood prediction mainly have the following problems: 1) they predict directly without first classifying the historical alarm floods, so historical floods that were caused by different events but share some alarms mislead the prediction; 2) they do not consider how the number of floods in the historical alarm flood database affects the prediction result. Practical experience shows that the more historical data is available, the more accurate the prediction; the traditional n-gram prediction method cannot reflect the effect of the amount of historical data on the result, which can lead to erroneous predictions that affect the operator's judgment.

These two problems undermine the reliability of alarm prediction. If they are not solved, they may cause wrong alarm predictions, affect the operator's judgment, and lead to safety incidents and economic losses in industrial production.

Summary of the invention

In view of the above shortcomings of the prior art, the present invention provides an industrial alarm flood prediction method based on an N-gram model, which solves the problem of inaccurate prediction that arises when alarm flood prediction ignores both the classification of historical alarm floods and the number of floods in the historical alarm flood database.

The technical solution adopted by the present invention to solve this technical problem comprises the following steps:

(1) Obtain the historical alarm flood data set $X_{AF}$, enumerate the alarm variables $x_i$ it contains, calculate the discriminability $D_i$ of each alarm variable, and, as preprocessing, remove the alarm variables whose discriminability is 0, forming the first data set $X'_{AF}$;

(2) Compare each historical alarm flood sequence $x'_{AF(m)}$ in the first data set $X'_{AF}$ with the emerging alarm sequence $x_{AS}$ for similarity, one sequence at a time, and sort the compared sequences by similarity score from high to low, forming the second data set $X''_{AF}$;

(3) Set the time window length and the sliding step of the window, segment the second data set $X''_{AF}$, count the occurrences of each data segment, and, taking $X''_{AF}$ as the sample, find the alarm variables that may appear next and their corresponding probabilities;

(4) Use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;

(5) Iterate steps (3) and (4) N times; the sequences in the data set whose result has the highest lower confidence limit $Z_1$ form the optimal prediction data set, and the corresponding prediction result is output as the final prediction result.

Further, step (1) is implemented as follows:

The discriminability of alarm variable $x_i$ is $D_i = \log \dfrac{N}{|\{m : x_i \in x_{AF(m)}\}|}$, where $N$ is the number of historical alarm flood sequences in the data set $X_{AF}$ and $|\{m : x_i \in x_{AF(m)}\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable. Removing the alarm variables whose discriminability is 0 yields the first data set $X'_{AF}$. A discriminability of 0 means that the alarm occurs in every flood in $X_{AF}$, whichever fault caused the flood; such an alarm variable can therefore be regarded as a meaningless nuisance alarm and is removed directly from the alarm flood data set.
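A minimal Python sketch of step (1), assuming each alarm flood is given as a list of alarm tag strings and that `log` is the natural logarithm (the base does not change which variables have $D_i = 0$). The names `discriminability` and `remove_nuisance_tags` are illustrative, not taken from the source.

```python
import math
from collections import Counter


def discriminability(floods):
    """Return D_i = log(N / |{m : x_i in x_AF(m)}|) for every alarm tag.

    floods: list of alarm floods, each a list of alarm tags (strings).
    """
    n_floods = len(floods)
    # Number of floods in which each tag appears at least once.
    flood_freq = Counter(tag for flood in floods for tag in set(flood))
    return {tag: math.log(n_floods / c) for tag, c in flood_freq.items()}


def remove_nuisance_tags(floods):
    """Drop tags with D_i == 0 (present in every flood) to form the first data set."""
    scores = discriminability(floods)
    keep = {tag for tag, d in scores.items() if d > 0}
    return [[tag for tag in flood if tag in keep] for flood in floods]
```

For example, `remove_nuisance_tags([["x1", "x2"], ["x1", "x3"]])` returns `[["x2"], ["x3"]]`, because `x1` appears in every flood.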

Further, step (2) is implemented as follows:

The similarity score is calculated with the Smith-Waterman algorithm. When the emerging alarm sequence $x_{AS}$ is compared with the m-th sequence $x'_{AF(m)}$ of the first data set $X'_{AF}$, a similarity score matrix $H(m)$ of size $j \times (m_l + 1)$ is constructed, where $m_l$ is the length of $x'_{AF(m)}$. Its first row and first column are initialized to 0, and the remaining elements are computed as

$$H(m)_{i',j'} = \max\Big\{ H(m)_{i'-1,j'-1} + s\big(x_{i'}, x'_{AF(m)}(j')\big),\ \max_{1 \le k' \le i'}\big[H(m)_{i'-k',j'} - W_{k'}\big],\ \max_{1 \le l' \le j'}\big[H(m)_{i',j'-l'} - W_{l'}\big],\ 0 \Big\}$$

where $1 \le i' \le j-1$ and $1 \le j' \le m_l$; $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$, and $W_1$ is the unit gap penalty; $s(x_{i'}, x'_{AF(m)}(j'))$ is the score for directly matching $x_{i'}$ with $x'_{AF(m)}(j')$, the $j'$-th alarm of $x'_{AF(m)}$; $H(m)_{i'-k',j'} - W_{k'}$ is the score with a gap of length $k'$ before $x_{i'}$; $H(m)_{i',j'-l'} - W_{l'}$ is the score with a gap of length $l'$ before $x'_{AF(m)}(j')$; and 0 indicates that the time series matched up to $x_{i'}$ and $x'_{AF(m)}(j')$ have no similarity. The largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF(m)}$. Sorting the alarm flood sequences in $X'_{AF}$ by similarity score from high to low gives the second data set $X''_{AF}$.
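The Smith-Waterman scoring above can be sketched in Python as a direct, unoptimized transcription of the recurrence; it scans every gap length, so it runs in roughly cubic time. The match and mismatch scores (+1 and -1) and the unit gap penalty $W_1 = 1$ are assumptions, since the source gives no values, and `sw_similarity` and `rank_by_similarity` are hypothetical names.

```python
def sw_similarity(query, ref, match=1.0, mismatch=-1.0, gap_unit=1.0):
    """Smith-Waterman local alignment score between two alarm sequences.

    query: emerging alarm sequence x_AS (length j - 1)
    ref:   historical flood x'_AF(m) (length m_l)
    Linear gap penalty W_k = k * gap_unit; match/mismatch values are assumptions.
    """
    rows, cols = len(query) + 1, len(ref) + 1
    # j x (m_l + 1) matrix; the first row and first column stay 0.
    H = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            diag = H[i - 1][j - 1] + s
            # Gap of length k along the query: H[i-k][j] - W_k.
            up = max(H[i - k][j] - k * gap_unit for k in range(1, i + 1))
            # Gap of length g along the reference: H[i][j-g] - W_g.
            left = max(H[i][j - g] - g * gap_unit for g in range(1, j + 1))
            H[i][j] = max(diag, up, left, 0.0)
            best = max(best, H[i][j])
    return best


def rank_by_similarity(query, floods, **scoring):
    """Sort floods by descending similarity to the query (the second data set)."""
    return sorted(floods, key=lambda f: sw_similarity(query, f, **scoring),
                  reverse=True)
```

The floor of 0 in each cell is what makes the alignment local: a poorly matching prefix of a historical flood does not drag down the score of a well-matching stretch later in it.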

Further, in step (3), the lengths of the time windows used to segment the second data set $X''_{AF}$ are set to $n$ and $n-1$, and the sliding step of the windows is 1. The alarm variables that may appear next, and their probabilities, are found with $X''_{AF}$ as the sample as follows. The data in $X''_{AF}$ are segmented with windows of length $n$ and $n-1$ respectively, and the number of occurrences of each data segment is counted and recorded in the data set $X_{DS}$. For the emerging alarm sequence $x_{AS} = (x_1, \dots, x_{j-1})$: because alarm flood sequences caused by the same fault are highly similar, the alarm at the next time $t_j$ can be predicted by referring to the alarm variables that enter the alarm state after the segment $(x_{j-n+1}, \dots, x_{j-1})$ in $X''_{AF}$. Every alarm variable that enters the alarm state after this segment is regarded as a candidate next alarm, which avoids the misleading effect of historical alarm floods that were caused by different faults but share some alarms. Accordingly, query $X_{DS}$ for the count $C(x_{j-n+1}, \dots, x_{j-1})$ and for all alarms $x_j$ that follow this segment at the next time, query the count $C(x_{j-n+1}, \dots, x_{j-1}, x_j)$, and approximate the conditional probability with the maximum likelihood rule:

$$P(x_j \mid x_{j-n+1}, \dots, x_{j-1}) \approx \frac{C(x_{j-n+1}, \dots, x_{j-1}, x_j)}{C(x_{j-n+1}, \dots, x_{j-1})}$$

The element that maximizes this conditional probability is the output result, denoted $x_{j0} = \arg\max_{x_j} P(x_j \mid x_{j-n+1}, \dots, x_{j-1})$.
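A sketch of the counting and maximum-likelihood step, assuming alarm floods are lists of tags and that occurrences are counted over every window of every flood in the second data set; the `Counter` plays the role of $X_{DS}$. `count_segments` and `predict_next` are illustrative names, and $n \ge 2$ is assumed.

```python
from collections import Counter


def count_segments(floods, n):
    """Count every window of length n-1 and n (slide 1) over all floods (X_DS)."""
    counts = Counter()
    for flood in floods:
        for size in (n - 1, n):
            for start in range(len(flood) - size + 1):
                counts[tuple(flood[start:start + size])] += 1
    return counts


def predict_next(query, floods, n=3):
    """Return (x_j0, {x_j: P(x_j | context)}) using C(context + x_j) / C(context)."""
    counts = count_segments(floods, n)
    context = tuple(query[-(n - 1):])       # last n-1 alarms of x_AS
    denom = counts.get(context, 0)
    if denom == 0:                          # context never seen in the sample
        return None, {}
    probs = {gram[-1]: c / denom for gram, c in counts.items()
             if len(gram) == n and gram[:-1] == context}
    if not probs:
        return None, {}
    return max(probs, key=probs.get), probs
```

Because a context segment that ends a flood has no successor, the returned probabilities need not sum to 1; the counting follows the maximum-likelihood formula as written.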

Further, considering both prediction accuracy and computational speed, the preferred value of $n$ is 3.

Further, in step (4), it is taken into account that in practice the size of the historical alarm flood data set used for prediction affects the reliability of the prediction result. For example, suppose data sets consisting of 10 and of 100 alarm flood sequences both predict the next alarm to be alarm variable $x_{j0}$ with probability 80%; clearly the prediction from the data set of 100 sequences is more reliable. Step (3) does not reflect the effect of data set size on the result, so a Bayesian probability model is introduced to improve the algorithm.
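A worked calculation makes the 10-versus-100 example concrete. It assumes the uniform prior and binomial likelihood used in step (4), under which the continuous posterior of $p$ is the Beta$(l+1,\,k-l+1)$ distribution (a standard result; the method itself evaluates a discretized PMF). With an observed frequency of 80% in both cases:

$$\alpha = l+1,\quad \beta = k-l+1,\quad \mathrm{E}[p] = \frac{\alpha}{\alpha+\beta},\quad \mathrm{Var}[p] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

$$k=10,\ l=8:\quad \mathrm{E}[p] = \frac{9}{12} = 0.75,\quad \mathrm{Var}[p] = \frac{27}{144 \cdot 13} \approx 0.0144,\quad \sigma \approx 0.120$$

$$k=100,\ l=80:\quad \mathrm{E}[p] = \frac{81}{102} \approx 0.794,\quad \mathrm{Var}[p] = \frac{1701}{10404 \cdot 103} \approx 0.00159,\quad \sigma \approx 0.040$$

The maximum-likelihood estimate is 0.8 in both cases, but the posterior standard deviation with 100 sequences is about one third of that with 10, so the confidence interval is correspondingly narrower.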

Let event $A$ be that the alarm variable entering the alarm state at the next time after the ongoing sequence $x_{AS}$ is $x_{j0}$, and let its probability of occurrence be $p$. Assume that $p$ is uniformly distributed on $[0, 1]$, and define this distribution as the prior probability $P(A)$;

Let event $B$ be that, with the second data set $X''_{AF}$ as the sample, the next alarm is predicted to be alarm variable $x_{j0}$;

Suppose that $k$ alarm flood sequences in the second data set $X''_{AF}$ contain the segment $(x_{j-n+1}, \dots, x_{j-1})$ and that $l$ of them contain $(x_{j-n+1}, \dots, x_{j-1}, x_{j0})$. The conditional probability $P(B \mid A)$, the probability that event $B$ occurs given event $A$, is then

$$P(B \mid A) = \binom{k}{l} p^{l} (1-p)^{k-l}$$

Then, given event $B$, the probability that the alarm variable entering the alarm state at the next time after the emerging alarm sequence $x_{AS}$ is $x_{j0}$ is

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

which is defined as the posterior probability;

Because the posterior probability $P(A \mid B)$ varies with the value of the prior probability $P(A)$, and $p$ follows a uniform distribution, the posterior is computed as a probability mass function (PMF), giving the PMF $f_X$ of the posterior probability $P(A \mid B)$;

Summing $p\,f_X(p)$ over all values of $p$ gives the final predicted probability that the next alarm is alarm variable $x_{j0}$;

Given a confidence level $1-\alpha$, sum the probability mass function $f_X$ to find the interval $[Z_1, Z_2]$ such that $\sum_{Z_1 \le p \le Z_2} f_X(p) = 1-\alpha$; this interval is the confidence interval corresponding to confidence level $1-\alpha$.
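A Python sketch of the discretized posterior, under three stated assumptions: $p$ is evaluated on a grid of 10,001 equally spaced points, the predicted probability is the posterior mean $\sum_p p\,f_X(p)$, and $[Z_1, Z_2]$ is taken as the highest-mass interval (the source fixes only its total mass $1-\alpha$). Weights are computed in log space to avoid underflow for large $k$. `posterior_pmf` and `predict_with_interval` are hypothetical names.

```python
import math


def posterior_pmf(k, l, steps=10000):
    """Discretised posterior f_X of p given l of k matching floods and a uniform prior.

    The uniform prior is constant, so f_X(p) is proportional to p**l * (1-p)**(k-l).
    """
    grid = [i / steps for i in range(steps + 1)]
    log_w = []
    for p in grid:
        if (p == 0.0 and l > 0) or (p == 1.0 and l < k):
            log_w.append(-math.inf)          # zero likelihood at these endpoints
            continue
        a = l * math.log(p) if l > 0 else 0.0
        b = (k - l) * math.log(1.0 - p) if l < k else 0.0
        log_w.append(a + b)
    top = max(log_w)
    weights = [math.exp(v - top) for v in log_w]   # shift by the max to avoid underflow
    total = sum(weights)
    return grid, [w / total for w in weights]


def predict_with_interval(k, l, alpha=0.05, steps=10000):
    """Return (posterior mean, (Z1, Z2)) with at least 1 - alpha of the mass inside."""
    grid, pmf = posterior_pmf(k, l, steps)
    prob = sum(p * f for p, f in zip(grid, pmf))
    # Highest-mass interval: take grid points in order of decreasing mass.
    order = sorted(range(len(grid)), key=lambda i: pmf[i], reverse=True)
    mass, chosen = 0.0, []
    for i in order:
        chosen.append(i)
        mass += pmf[i]
        if mass >= 1.0 - alpha:
            break
    return prob, (grid[min(chosen)], grid[max(chosen)])
```

For the 10-versus-100 example, `predict_with_interval(10, 8)` and `predict_with_interval(100, 80)` give means close to $9/12$ and $81/102$, and the second interval is noticeably narrower. When $l = k$ the PMF increases monotonically, so the interval extends up to $Z_2 = 1$.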

The confidence interval is introduced to express the credibility of the predicted probability. It reflects how closely the probability that the next alarm actually occurs falls around the predicted probability. For example, at a confidence level of 95%, the lower limit of the confidence interval $[Z_1, Z_2]$ means that we are 95% confident that the probability of the next alarm being the predicted one is at least $Z_1$.

Further, step (5) is implemented as follows:

Steps (3) and (4) are iterated N times. The K-th iteration $(0 \le K \le N)$ deletes the K sequences with the lowest similarity scores from the data set, and the remaining data set $X''_{AF(K)}$ is used for prediction. Finally, the lower limits $Z_1$ of the confidence intervals $[Z_1, Z_2]$ of the N results are compared; the result with the highest $Z_1$ is the optimal prediction result, and the corresponding data set $X''_{AF(K)}$ is the optimal prediction data set. Because the lower confidence limit reflects the minimum assurance that the prediction result is the target alarm, the data set whose result has the highest lower confidence limit is selected as the optimal prediction data set.
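A sketch of the selection loop in step (5). The `predict` argument is a hypothetical stand-in for steps (3) and (4), assumed to return `(alarm, probability, (z1, z2))`, or `None` when the context segment is not found. The floods are assumed to be in descending order of similarity, so slicing off the tail removes the lowest-scoring ones.

```python
def select_best_dataset(ranked_floods, predict):
    """Step (5): drop the K least similar floods (K = 0, 1, ...) and keep the best Z1.

    ranked_floods: floods sorted by descending similarity (second data set).
    predict: callable(floods) -> (alarm, probability, (z1, z2)) or None;
             a hypothetical stand-in for steps (3) and (4).
    Returns (alarm, probability, (z1, z2), floods_used) or None.
    """
    best = None
    total = len(ranked_floods)
    for k in range(total + 1):
        subset = ranked_floods[:total - k]     # remove the k lowest-ranked floods
        if not subset:                         # stop once the data set is empty
            break
        result = predict(subset)
        if result is None:                     # context segment not found
            continue
        alarm, prob, (z1, z2) = result
        if best is None or z1 > best[2][0]:
            best = (alarm, prob, (z1, z2), subset)
    return best
```

Ranking on $Z_1$ rather than on the point probability favours data sets that are both similar to the emerging flood and large enough to give a tight interval.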

The beneficial effects of the invention are as follows: (1) The invention compares the data set used for prediction with the alarm flood to be predicted, re-sorts it by similarity score from high to low, and iteratively removes the sequences with low similarity scores until the optimal prediction result is obtained. Because historical alarm floods caused by the same fault improve prediction accuracy, this avoids the misleading effect of historical floods that were caused by different faults but share some alarms. (2) The proposed N-gram prediction method based on a Bayesian probability model accounts for the effect of the size of the historical alarm flood data set on the prediction result, expresses the credibility of the prediction result, and avoids predictions misled by an insufficient sample size.

Description of the drawings

Fig. 1 is a flow chart of the N-gram model based industrial alarm flood prediction method according to an embodiment of the present invention;

Fig. 2 shows the probability mass function of the prediction result in a specific application scenario of the present invention;

Fig. 3 is a schematic diagram of how the predicted probability and the confidence interval vary with the number of historical alarm floods used for prediction in a specific application scenario of the present invention.

Specific embodiment

The invention will be further described below in conjunction with the accompanying drawings.

Embodiment 1:

As shown in Fig. 1, the N-gram model based industrial alarm flood prediction method of the present invention comprises the following steps:

S1: obtain the historical alarm flood data set $X_{AF}$, enumerate the alarm variables $x_i$ it contains, calculate the discriminability $D_i$ of each alarm variable, and, as preprocessing, remove the alarm variables whose discriminability is 0, forming the first data set $X'_{AF}$;

S2: compare each historical alarm flood sequence $x'_{AF(m)}$ in the first data set $X'_{AF}$ with the emerging alarm sequence $x_{AS}$ for similarity, one sequence at a time, and sort the compared sequences by similarity score from high to low, forming the second data set $X''_{AF}$;

S3: set the time window length and the sliding step of the window, segment the second data set $X''_{AF}$, count the occurrences of each data segment, and, taking $X''_{AF}$ as the sample, find the alarm variables that may appear next and their corresponding probabilities;

S4: use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;

S5: iterate steps S3 and S4 N times; the sequences in the data set whose result has the highest lower confidence limit $Z_1$ form the optimal prediction data set, and the corresponding prediction result is output as the final prediction result.

In step S1, the historical alarm flood data set $X_{AF}$ is obtained, the alarm variables $x_i$ contained in it are enumerated, and the discriminability $D_i$ of each alarm variable is calculated as $D_i = \log \dfrac{N}{|\{m : x_i \in x_{AF(m)}\}|}$, where $N$ is the number of historical alarm flood sequences in $X_{AF}$ and $|\{m : x_i \in x_{AF(m)}\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable. Removing the alarm variables whose discriminability is 0 yields the first data set $X'_{AF}$. A discriminability of 0 means that the alarm occurs in every flood in $X_{AF}$, whichever fault caused the flood; such an alarm variable can therefore be regarded as a meaningless nuisance alarm and is removed directly from the alarm flood data set.

In step S2, the Smith-Waterman algorithm is used. When the emerging alarm sequence $x_{AS}$ is compared with the m-th sequence $x'_{AF(m)}$ of the first data set $X'_{AF}$, a similarity score matrix $H(m)$ of size $j \times (m_l + 1)$ is constructed, where $m_l$ is the length of $x'_{AF(m)}$. Its first row and first column are initialized to 0, and the remaining elements are computed as:

$$H(m)_{i',j'} = \max\Big\{ H(m)_{i'-1,j'-1} + s\big(x_{i'}, x'_{AF(m)}(j')\big),\ \max_{1 \le k' \le i'}\big[H(m)_{i'-k',j'} - W_{k'}\big],\ \max_{1 \le l' \le j'}\big[H(m)_{i',j'-l'} - W_{l'}\big],\ 0 \Big\}$$

where $1 \le i' \le j-1$ and $1 \le j' \le m_l$; $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$, and $W_1$ is the unit gap penalty;

$s(x_{i'}, x'_{AF(m)}(j'))$ is the score for directly matching $x_{i'}$ with $x'_{AF(m)}(j')$, the $j'$-th alarm of $x'_{AF(m)}$;

$H(m)_{i'-k',j'} - W_{k'}$ is the score with a gap of length $k'$ before $x_{i'}$;

$H(m)_{i',j'-l'} - W_{l'}$ is the score with a gap of length $l'$ before $x'_{AF(m)}(j')$;

0 indicates that the time series matched up to $x_{i'}$ and $x'_{AF(m)}(j')$ have no similarity.

The matrix is filled according to the above rule, and the largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF(m)}$. The sequences in the first data set are then sorted by similarity score from high to low, giving the second data set, denoted $X''_{AF}$.

In step S3, time windows of length $n$ and $n-1$ may be used, with a sliding step of $T = 1$ s. The data in the second data set $X''_{AF}$ are segmented with each window, and the number of occurrences of each data segment is counted and recorded in the data set $X_{DS}$. For the emerging alarm sequence $x_{AS} = (x_1, \dots, x_{j-1})$: because alarm flood sequences caused by the same fault are highly similar, the alarm at the next time $t_j$ can be predicted by referring to the alarm variables that enter the alarm state after the segment $(x_{j-n+1}, \dots, x_{j-1})$ in $X''_{AF}$. Every alarm variable that enters the alarm state after this segment can be regarded as a candidate next alarm, which avoids the misleading effect of historical alarm floods that were caused by different faults but share some alarms. Specifically, query $X_{DS}$ for the count $C(x_{j-n+1}, \dots, x_{j-1})$ and for all alarms $x_j$ that follow this segment at the next time, query the count $C(x_{j-n+1}, \dots, x_{j-1}, x_j)$, and approximate the conditional probability with the maximum likelihood rule:

$$P(x_j \mid x_{j-n+1}, \dots, x_{j-1}) \approx \frac{C(x_{j-n+1}, \dots, x_{j-1}, x_j)}{C(x_{j-n+1}, \dots, x_{j-1})}$$

Clearly, the alarm variable $x_j$ that maximizes the conditional probability $P(x_j \mid x_{j-n+1}, \dots, x_{j-1})$ is the alarm variable most likely to enter the alarm state at the next time; it is denoted $x_{j0}$.

In addition, considering both prediction accuracy and computational speed, the present invention sets $n = 3$.

In step S4, it is taken into account that in practice the size of the historical alarm flood data set used for prediction affects the reliability of the prediction result. For example, suppose data sets consisting of 10 and of 100 alarm flood sequences both predict the next alarm to be alarm variable $x_{j0}$ with probability 80%; clearly the prediction from the data set of 100 sequences is more reliable. Step S3 does not reflect the effect of data set size on the result, so a Bayesian probability model is introduced to improve the algorithm.

It is implemented as follows:

Then, given event $B$, the probability that the alarm variable entering the alarm state at the next time after the ongoing alarm sequence $x_{AS}$ is $x_{j0}$ is $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$, which is defined as the posterior probability. Because the posterior probability $P(A \mid B)$ varies with the value of the prior probability $P(A)$, and $p$ follows a uniform distribution, the posterior is computed as a probability mass function (PMF), giving the PMF $f_X$ of $P(A \mid B)$;

Summing $p\,f_X(p)$ over all values of $p$ gives the final predicted probability that the next alarm is $x_{j0}$;

In step S5, steps S3 and S4 are iterated N times. The K-th iteration $(0 \le K \le N)$ deletes the K historical alarm flood sequences with the lowest similarity scores from the data set, continuing until the data set is empty, and the remaining data set $X''_{AF(K)}$ is used for prediction. Finally, the lower confidence limits $Z_1$ of the N results are compared; the result with the highest lower confidence limit is the optimal prediction result, and the corresponding data set $X''_{AF(K)}$ is the optimal prediction data set. Because the lower confidence limit reflects the minimum assurance that the prediction result is the target alarm, the data set whose result has the highest lower confidence limit is selected as the optimal prediction data set.

The invention compares the data set used for prediction with the alarm flood to be predicted for similarity, re-sorts it by similarity score from high to low, and iteratively removes the sequences with low similarity scores until the optimal prediction result is obtained. Because historical alarm floods caused by the same fault improve prediction accuracy, this avoids the misleading effect of historical floods that were caused by different faults but share some alarms. The invention also proposes an N-gram prediction method based on a Bayesian probability model, which accounts for the effect of the size of the historical alarm flood data set on the prediction result, expresses the credibility of the prediction result, and avoids predictions misled by an insufficient sample size.

The following describes the application of the method of the invention in a specific application scenario.

A set of historical alarm flood data was generated by simulating the TE (Tennessee Eastman) process. The data set contains 100 sequences, and $n$ is set to 3.

Step (1): calculate the discriminability of each alarm variable in the data set and remove the alarm variables whose discriminability is 0.

Step (2): using the sequence similarity scoring method, compare every alarm flood sequence in the data set with the emerging alarm sequence one by one, and re-sort them in descending order of similarity score.

Step (3): segment the sequences in the data set, count the occurrences of each data segment, search the data set for historical data segments matching the alarms that appeared at the two most recent times, and approximate the corresponding conditional probability with the maximum likelihood rule.

Step (4): use the Bayesian probability model to find the probability mass function of the prediction result, sum over it to obtain the predicted probability, and find the confidence interval at a 95% confidence level. As shown in Fig. 2, the confidence interval is (47.17%, 55.81%).

Step (5): iterate over the data set, repeating steps (3) and (4), and take the result with the highest lower confidence limit of the predicted probability as the final output. As shown in Fig. 3, the optimal prediction result is x₁₁, with a predicted probability of 98.82% and a confidence interval of [97.33%, 100%] (confidence level 90%). This means the system is 90% confident that the probability that the next alarm is x₁₁ is at least 97.33%.

Claims

1. An industrial alarm flood prediction method based on an N-gram model, characterized by comprising the following steps:

(1) Obtain the historical alarm flood data set $X_{AF}$, enumerate the alarm variables $x_i$ it contains, calculate the discriminability $D_i$ of each alarm variable, and, as preprocessing, remove the alarm variables whose discriminability is 0, forming the first data set $X'_{AF}$;

(2) Compare each historical alarm flood sequence $x'_{AF(m)}$ in the first data set $X'_{AF}$ with the emerging alarm sequence $x_{AS}$ for similarity, one sequence at a time, and sort the compared sequences by similarity score from high to low, forming the second data set $X''_{AF}$;

(3) Set the time window length and the sliding step of the window, segment the second data set $X''_{AF}$, count the occurrences of each data segment, and, taking $X''_{AF}$ as the sample, find the alarm variables that may appear next and their corresponding probabilities;

(4) Use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;

(5) Iterate steps (3) and (4) N times; the sequences in the data set whose result has the highest lower confidence limit $Z_1$ form the optimal prediction data set, and the corresponding prediction result is output as the final prediction result.

2. The N-gram model based industrial alarm flood prediction method according to claim 1, characterized in that step (1) is implemented as follows:

The discriminability of alarm variable $x_i$ is $D_i = \log \dfrac{N}{|\{m : x_i \in x_{AF(m)}\}|}$, where $N$ is the number of historical alarm flood sequences in the data set $X_{AF}$ and $|\{m : x_i \in x_{AF(m)}\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable.

3. The N-gram model based industrial alarm flood prediction method according to claim 1, characterized in that step (2) is implemented as follows:

The similarity score is calculated with the Smith-Waterman algorithm. When the emerging alarm sequence $x_{AS}$ is compared with the m-th sequence $x'_{AF(m)}$ of the first data set $X'_{AF}$, a similarity score matrix $H(m)$ of size $j \times (m_l + 1)$ is constructed, where $m_l$ is the length of $x'_{AF(m)}$. Its first row and first column are initialized to 0, and the remaining elements are computed as

$$H(m)_{i',j'} = \max\Big\{ H(m)_{i'-1,j'-1} + s\big(x_{i'}, x'_{AF(m)}(j')\big),\ \max_{1 \le k' \le i'}\big[H(m)_{i'-k',j'} - W_{k'}\big],\ \max_{1 \le l' \le j'}\big[H(m)_{i',j'-l'} - W_{l'}\big],\ 0 \Big\}$$

where $1 \le i' \le j-1$ and $1 \le j' \le m_l$; $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$, and $W_1$ is the unit gap penalty; $s(x_{i'}, x'_{AF(m)}(j'))$ is the score for directly matching $x_{i'}$ with $x'_{AF(m)}(j')$, the $j'$-th alarm of $x'_{AF(m)}$; $H(m)_{i'-k',j'} - W_{k'}$ is the score with a gap of length $k'$ before $x_{i'}$; $H(m)_{i',j'-l'} - W_{l'}$ is the score with a gap of length $l'$ before $x'_{AF(m)}(j')$; and 0 indicates that the time series matched up to $x_{i'}$ and $x'_{AF(m)}(j')$ have no similarity. The largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF(m)}$. Sorting the alarm flood sequences in $X'_{AF}$ by similarity score from high to low gives the second data set $X''_{AF}$.

4. The N-gram model based industrial alarm flood prediction method according to claim 1, characterized in that, in step (3), the lengths of the time windows used to segment the second data set $X''_{AF}$ are $n$ and $n-1$, and the sliding step of the windows is 1.

5. The N-gram model based industrial alarm flood prediction method according to claim 4, characterized in that, in step (3), the alarm variables that may appear next, and their probabilities, are found with the second data set $X''_{AF}$ as the sample as follows:

Segment the data in the second data set $X''_{AF}$ with windows of length $n$ and $n-1$ respectively, count the occurrences of each data segment, and record them in the data set $X_{DS}$;

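To predict the alarm at the next time $t_j$ of the emerging alarm sequence $x_{AS} = (x_1, \dots, x_{j-1})$, first query $X_{DS}$ for the count $C(x_{j-n+1}, \dots, x_{j-1})$ and for all alarms $x_j$ that follow this segment at the next time, query the count $C(x_{j-n+1}, \dots, x_{j-1}, x_j)$, and approximate the conditional probability with the maximum likelihood rule:

$$P(x_j \mid x_{j-n+1}, \dots, x_{j-1}) \approx \frac{C(x_{j-n+1}, \dots, x_{j-1}, x_j)}{C(x_{j-n+1}, \dots, x_{j-1})}$$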

The element $x_j$ that maximizes this conditional probability is the output result, denoted $x_{j0} = \arg\max_{x_j} P(x_j \mid x_{j-n+1}, \dots, x_{j-1})$.

6. The N-gram model based industrial alarm flood prediction method according to claim 5, characterized in that the preferred value of $n$ is 3.

7. The N-gram model based industrial alarm flood prediction method according to claim 5, characterized in that step (4) is implemented as follows:

Let event $A$ be that the alarm variable entering the alarm state at the next time after the ongoing sequence $x_{AS}$ is $x_{j0}$, and let its probability of occurrence be $p$. Assume that $p$ is uniformly distributed on $[0, 1]$, and define this distribution as the prior probability $P(A)$;

Suppose that $k$ alarm flood sequences in the second data set $X''_{AF}$ contain the segment $(x_{j-n+1}, \dots, x_{j-1})$ and that $l$ of them contain $(x_{j-n+1}, \dots, x_{j-1}, x_{j0})$. The conditional probability $P(B \mid A)$, the probability that event $B$ occurs given event $A$, is then

$$P(B \mid A) = \binom{k}{l} p^{l} (1-p)^{k-l}$$

Then, given event $B$, the probability that the alarm variable entering the alarm state at the next time after the emerging alarm sequence $x_{AS}$ is $x_{j0}$ is

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

which is defined as the posterior probability;

The posterior is computed as a probability mass function (PMF), giving the PMF $f_X$ of the posterior probability $P(A \mid B)$.

8. The N-gram model based industrial alarm flood prediction method according to claim 1, characterized in that step (5) is implemented as follows:

Steps (3) and (4) are iterated N times. The K-th iteration $(0 \le K \le N)$ deletes the K sequences with the lowest similarity scores from the data set, and the remaining data set $X''_{AF(K)}$ is used for prediction. Finally, the lower limits $Z_1$ of the confidence intervals $[Z_1, Z_2]$ of the N results are compared; the result with the highest $Z_1$ is the optimal prediction result, and the corresponding data set $X''_{AF(K)}$ is the optimal prediction data set.