CN108922140A - Industrial alarm flood prediction method based on an N-gram model - Google Patents
- Publication number
- CN108922140A (application CN201810889499.4A)
- Authority
- CN
- China
- Prior art keywords
- alarm
- data set
- probability
- sequence
- alarm flood
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B29/00—Checking or monitoring of signalling or alarm systems; Prevention or correction of operating errors, e.g. preventing unauthorised operation
- G08B29/18—Prevention or correction of operating errors
- G08B29/185—Signal analysis techniques for reducing or preventing false alarms or for enhancing the reliability of the system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B31/00—Predictive alarm systems characterised by extrapolation or other computation using updated historic data
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A10/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
- Y02A10/40—Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Business, Economics & Management (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Operations Research (AREA)
- Computer Security & Cryptography (AREA)
- Algebra (AREA)
- Computing Systems (AREA)
- Emergency Management (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention belongs to the field of signal processing and in particular relates to an industrial alarm flood prediction method based on an N-gram model, comprising the following steps: (1) acquire a historical alarm flood data set, count the alarm variables in it, compute the discrimination of each alarm variable and remove the alarm variables whose discrimination is 0; (2) compare each sequence in the processed data set with the newly emerging sequence for similarity, and rank the sequences from high to low by similarity score; (3) set a time window, segment the re-processed data set, count the occurrences of each data segment, and use the sample data set to find the alarm variables that may occur next and their probabilities; (4) use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval; (5) iterate steps (3) and (4). The invention addresses the problem of inaccurate prediction in current alarm flood prediction.
Description
Technical field
The invention belongs to the field of signal processing and in particular relates to an industrial alarm flood prediction method based on an N-gram model.
Background
In industry today, alarm systems are widely used to monitor industrial processes and to raise alarms on abnormal conditions. However, current industrial alarm systems still have many problems, such as nuisance alarms, standing alarms and alarm floods. Nuisance alarms are large numbers of meaningless alarms that occur within a short time and require no operator response; their presence reduces the operator's ability to respond to genuine alarms. Standing alarms are alarms that persist for some time after they occur; they usually remain uncleared after the operator has taken action and impair the operator's judgment of the system's working state. An alarm flood is the occurrence of many alarms within a short time, usually triggered by a single event; these alarms usually exceed the operator's processing limit and are more complicated to resolve.
Current research on alarm floods focuses mainly on similarity analysis of alarm flood sequences, while timely anticipation and handling of alarm floods occurring in real time in an industrial process remains unexplored.
Alarm flood prediction means that, for a newly emerging alarm flood, the system predicts the alarms likely to occur next so that the operator can act in advance. Current alarm flood prediction has two main problems: 1) prediction is performed directly without first classifying the historical alarm floods, so historical alarm floods that were caused by different events but share some alarms mislead the result when used for prediction; 2) the influence of the number of historical alarm floods in the database on the prediction result is not considered. Practical experience suggests that the more historical data there is, the more accurate the prediction will be, but the traditional N-gram prediction method cannot reflect the influence of the amount of historical data on the result, which leads to erroneous predictions and affects the operator's judgment.
These two problems undermine the reliability of alarm prediction. If they are not solved, they may cause erroneous alarm predictions, affect the operator's judgment, and cause safety and economic losses in industrial production.
Summary of the invention
In view of the above deficiencies of the prior art, the present invention provides an industrial alarm flood prediction method based on an N-gram model. It addresses the inaccurate predictions that arise when alarm flood prediction is performed without considering the classification of historical alarm floods or the amount of data in the historical alarm flood database.
The technical solution adopted by the present invention comprises the following steps:
(1) Acquire the historical alarm flood data set, count the alarm variables it contains, compute the discrimination $D_i$ of each alarm variable, and pre-process the data set by removing the alarm variables whose discrimination is 0, forming the first data set;
(2) Compare the m-th historical alarm flood sequence in the first data set with the newly emerging alarm sequence for similarity, one by one, and arrange the matched sequences from high to low by similarity score, forming the second data set;
(3) Set the time window and its sliding step, segment the second data set, count the occurrences of each data segment, and find the alarm variables that may occur next, with their probabilities, when the second data set is used as the sample;
(4) Use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;
(5) Perform N iterations of steps (3) and (4), take the sequences in the data set corresponding to the result with the highest confidence-interval lower limit $Z_1$ as the best prediction data set, and output the corresponding prediction as the final prediction result.
Further, step (1) is implemented as follows:
The discrimination of alarm variable $x_i$ is $D_i = \log\left(N / |\{m : x_i \in x_{AF}(m)\}|\right)$, where $N$ is the number of historical alarm flood sequences contained in the data set and $|\{m : x_i \in x_{AF}(m)\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable. Removing the alarm variables whose discrimination is 0 yields the first data set. A discrimination of 0 means that the alarm occurs in every alarm flood in the data set, whatever fault caused the flood, so the alarm variable can be regarded as a meaningless nuisance alarm and is removed directly from the alarm flood data set.
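The discrimination filter of step (1) can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the representation of each alarm flood as a list of alarm-variable names are assumptions, not part of the patent text.

```python
import math

def discrimination(floods):
    """Return D_i = log(N / number of floods containing x_i) for every alarm variable.

    `floods` is a list of alarm flood sequences, each a list of alarm-variable
    names (a hypothetical representation chosen for this sketch).
    """
    n = len(floods)
    variables = set().union(*floods)
    return {x: math.log(n / sum(1 for f in floods if x in f)) for x in variables}

def drop_zero_discrimination(floods):
    """Remove every alarm variable whose discrimination is 0 (present in all floods)."""
    d = discrimination(floods)
    return [[x for x in f if d[x] != 0.0] for f in floods]

floods = [["x1", "x2", "x3"], ["x1", "x4"], ["x1", "x2"]]
first_data_set = drop_zero_discrimination(floods)
```

Here `x1` appears in all three floods, so its discrimination is log(3/3) = 0 and it is dropped from every sequence.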
Further, step (2) is implemented as follows:
Similarity scores are computed with the Smith-Waterman algorithm. When the newly emerging alarm sequence $x_{AS}$ is compared with the m-th sequence $x'_{AF}(m)$ in the first data set, a similarity score matrix $H(m)$ is constructed with its first row and first column initialized to 0; the matrix size is $j \times (m_l + 1)$, where $m_l$ is the length of $x'_{AF}(m)$. Each entry $H(m)_{i',j'}$, with $1 \le i' \le j-1$ and $1 \le j' \le m_l$, is the maximum of four terms: the score for directly matching $x_{i'}$ with the $j'$-th element of $x'_{AF}(m)$; $H(m)_{i'-k',j'} - W_{k'}$, the score with a gap of length $k'$ inserted before $x_{i'}$; $H(m)_{i',j'-l'} - W_{l'}$, the score with a gap of length $l'$ inserted before the $j'$-th element of $x'_{AF}(m)$; and 0, which indicates that the sequences matched up to $x_{i'}$ and that element have no similarity. Here $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$ and $W_1$ is the unit penalty. The largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF}(m)$. The alarm flood sequences in the first data set are then sorted from high to low by similarity score, giving the second data set.
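The four terms listed above match the standard Smith-Waterman update with a linear gap penalty. A sketch of the full recurrence under that reading follows. Two things in it are assumptions rather than text from the patent: the name $s(\cdot,\cdot)$ for the direct-match score, and the diagonal term $H(m)_{i'-1,j'-1}$, which is the usual form of the first term. $x'_{AF}(m)_{j'}$ denotes the $j'$-th element of $x'_{AF}(m)$.

```latex
H(m)_{i',j'} = \max\left\{
  H(m)_{i'-1,j'-1} + s\left(x_{i'},\, x'_{AF}(m)_{j'}\right),\;
  \max_{k' \ge 1}\left(H(m)_{i'-k',j'} - W_{k'}\right),\;
  \max_{l' \ge 1}\left(H(m)_{i',j'-l'} - W_{l'}\right),\;
  0
\right\},
\qquad W_{k'} = k' W_1
```

The final 0 is what makes the alignment local: a poorly matching prefix cannot drag the score of a later, well-matching segment below zero.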
Further, in step (3), the time windows used to segment the second data set have lengths n and n-1, and the sliding step of the time window is 1. The alarm variables that may occur next, and their probabilities, are found with the second data set as the sample as follows. The data in the second data set are segmented with the time windows of length n and n-1 respectively, and the number of occurrences of each data segment is counted and recorded in the data set $X_{DS}$. For the newly emerging alarm sequence, alarm flood sequences caused by the same fault are highly similar, so to predict the alarm occurring at the next moment $t_j$ one can refer to the alarm variables that enter the alarm state after the matching alarm data segment in the second data set: every alarm variable that enters the alarm state after this data segment can be regarded as a possible next alarm. This avoids the prediction being misled by historical alarm floods that were caused by different faults but share some alarms. Accordingly, $X_{DS}$ is queried for the count of the data segment formed by the most recent n-1 alarms and for all alarms that occur at the next moment after it; the count of each data segment extended by such an alarm is then queried, and the conditional probability is approximated by the maximum likelihood rule. The element for which the conditional probability is largest is taken as the output result.
Further, considering prediction accuracy and computation speed, the preferred value of n is 3.
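The counting and maximum-likelihood step can be illustrated with a short Python sketch. The function names and the use of a `Counter` as the segment-count store (playing the role of $X_{DS}$) are assumptions made for illustration. The ratio count(n-segment) / count((n-1)-segment) is the usual N-gram maximum-likelihood estimate, which is how the description above is read here.

```python
from collections import Counter

def segment_counts(floods, n):
    """Count every window of length n and n-1 (sliding step 1) over all sequences."""
    counts = Counter()
    for seq in floods:
        for size in (n, n - 1):
            for i in range(len(seq) - size + 1):
                counts[tuple(seq[i:i + size])] += 1
    return counts

def predict_next(counts, recent, n):
    """Estimate P(next alarm | last n-1 alarms) as count(n-segment) / count(context)."""
    context = tuple(recent[-(n - 1):])
    total = counts[context]
    if total == 0:
        return {}  # the context never occurred in the historical data
    return {seg[-1]: c / total
            for seg, c in counts.items()
            if len(seg) == n and seg[:-1] == context}

floods = [["a", "b", "c", "d"], ["a", "b", "c"], ["a", "b", "e"]]
counts = segment_counts(floods, n=3)
probs = predict_next(counts, ["a", "b"], n=3)
best = max(probs, key=probs.get)
```

With these three toy sequences the context `("a", "b")` occurs three times and is followed by `c` twice and `e` once, so `c` is the predicted next alarm.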
Further, step (4) takes into account that, in practice, the size of the historical alarm flood data set used for prediction affects the reliability of the prediction result. For example, suppose data sets consisting of 10 and of 100 alarm flood sequences both predict that the next alarm is alarm variable $x_{j0}$ with probability 80%. The prediction from the data set of 100 alarm flood sequences is clearly more reliable, yet step (3) does not reflect the influence of data set size on the result, so a Bayesian probability model is introduced to improve the algorithm.
Let the first event be that a given alarm variable is the one that enters the alarm state at the next moment of the ongoing sequence $x_{AS}$. Its probability of occurrence is assumed to follow a uniform distribution and is defined as the prior probability;
Let the second event be that the second data set, used as the sample, predicts that alarm variable as the next alarm;
Let k and l be the numbers of alarm flood sequences in the second data set that contain, respectively, the two data segments queried in step (3). The conditional probability of the second event given the first then expresses the probability that the second event occurs when the first event is known;
The probability, inferred once the second event has been determined, that the alarm variable entering the alarm state at the next moment of the newly emerging alarm sequence $x_{AS}$ is that alarm variable is defined as the posterior probability;
Because the posterior probability changes with the value of the prior probability, and the first event follows a uniform distribution, the calculation uses a probability mass function (PMF), which yields the probability mass function $f_X$ of the posterior probability;
Summing the probability mass function $f_X$ gives the final probability that the predicted alarm is that alarm variable;
Given a confidence level $1-\alpha$, the probability mass function $f_X$ is summed to find the interval $[Z_1, Z_2]$; this interval is the confidence interval corresponding to the confidence level $1-\alpha$.
The confidence interval is introduced to express how credible the predicted probability is. It reflects how closely the probability of the next alarm that actually occurs falls around the predicted probability. At a confidence level of 95%, for example, the lower limit of the confidence interval $[Z_1, Z_2]$ means that we are 95% sure that the probability of the predicted next alarm is at least $Z_1$.
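The effect of data set size on the interval can be illustrated with a small Python sketch. The likelihood formula is not preserved in this text, so the sketch makes loud assumptions: a binomial likelihood in which l of k relevant historical sequences agree with the prediction, a uniform prior over a discrete grid of candidate probabilities, and an equal-tailed interval. All function and parameter names are hypothetical.

```python
def posterior_pmf(k, l, grid=1000):
    """Discretized posterior over the unknown probability p.

    ASSUMPTION: uniform prior on p and binomial likelihood p^l * (1-p)^(k-l);
    the patent's own likelihood formula is not available in this text.
    """
    ps = [i / grid for i in range(grid + 1)]
    weights = [p ** l * (1 - p) ** (k - l) for p in ps]
    total = sum(weights)
    return ps, [w / total for w in weights]

def interval(ps, pmf, confidence=0.95):
    """Equal-tailed interval [Z1, Z2] obtained by summing the PMF (an assumed convention)."""
    alpha = 1 - confidence
    cumulative, z1, z2 = 0.0, None, None
    for p, w in zip(ps, pmf):
        cumulative += w
        if z1 is None and cumulative >= alpha / 2:
            z1 = p
        if cumulative >= 1 - alpha / 2:
            z2 = p
            break
    return z1, (z2 if z2 is not None else ps[-1])

# Same 80% point estimate, different amounts of historical data.
small = interval(*posterior_pmf(k=10, l=8))
large = interval(*posterior_pmf(k=100, l=80))
```

Under these assumptions the interval for 100 sequences is narrower and has a higher lower limit than the one for 10 sequences, which is the behavior the 10-versus-100 example in the text calls for.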
Further, step (5) is implemented as follows:
Steps (3) and (4) are iterated N times. The K-th iteration (0 ≤ K ≤ N) deletes the K sequences with the lowest similarity scores from the data set and predicts with the remaining data set. The lower limits $Z_1$ of the confidence intervals $[Z_1, Z_2]$ of the N results are then compared, and the result with the highest $Z_1$ is taken as the best prediction result, its data set being the best prediction data set. The lower limit of the confidence interval reflects the minimum certainty that the prediction is the target alarm, which is why the sequences in the data set corresponding to the result with the highest lower limit are chosen as the best prediction data set.
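The selection loop of step (5) can be sketched as follows. The `predict` callable stands in for steps (3) and (4) and is a hypothetical interface: it is assumed to return a tuple `(alarm, probability, (z1, z2))`, or `None` when no prediction is possible. The stub used below returns made-up numbers purely to exercise the loop.

```python
def best_prediction(ranked, predict):
    """Try the data set with the K lowest-ranked sequences removed, for K = 0..N-1.

    `ranked` must already be sorted by similarity score, high to low.
    Returns (result, subset) for the subset whose interval lower limit z1 is highest.
    """
    best = None
    for dropped in range(len(ranked)):
        subset = ranked[:len(ranked) - dropped]
        result = predict(subset)
        if result is None:
            continue
        if best is None or result[2][0] > best[0][2][0]:
            best = (result, subset)
    return best

def stub_predict(subset):
    # Hypothetical stand-in for steps (3) and (4); the numbers are illustrative only.
    z1 = {3: 0.4, 2: 0.7, 1: 0.5}[len(subset)]
    return ("x11", 0.9, (z1, 1.0))

result, subset = best_prediction(["s1", "s2", "s3"], stub_predict)
```

The loop stops at one remaining sequence rather than an empty data set, since nothing can be predicted from an empty sample; that stopping point is a choice made for this sketch.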
The beneficial effects of the invention are: (1) the invention compares the data set used for prediction with the alarm flood to be predicted for similarity, reorders the data set from high to low by similarity score, and iteratively removes the sequences with low similarity scores until the best prediction result is obtained. Favoring historical alarm floods caused by the same fault improves prediction accuracy and avoids the prediction being misled by historical alarm floods that were caused by different faults but share some alarms; (2) the proposed N-gram prediction method based on a Bayesian probability model takes into account the influence of the size of the historical alarm flood data set on the prediction result, expresses the credibility of the prediction result, and avoids the prediction being misled by an insufficient sample size.
Brief description of the drawings
Fig. 1 is a flow chart of the industrial alarm flood prediction method based on an N-gram model according to the embodiment of the present invention;
Fig. 2 is the probability mass function of the prediction result in the specific application scenario of the present invention;
Fig. 3 is a schematic diagram of how the prediction probability and the confidence interval vary with the number of historical alarm floods used for prediction in the specific application scenario of the present invention.
Specific embodiment
The invention will be further described below in conjunction with the accompanying drawings.
Embodiment one:
As shown in Fig. 1, the industrial alarm flood prediction method based on an N-gram model according to the present invention comprises the following steps:
S1, acquire the historical alarm flood data set, count the alarm variables it contains, compute the discrimination $D_i$ of each alarm variable, and pre-process the data set by removing the alarm variables whose discrimination is 0, forming the first data set;
S2, compare the m-th historical alarm flood sequence in the first data set with the newly emerging alarm sequence for similarity, one by one, and arrange the matched sequences from high to low by similarity score, forming the second data set;
S3, set the time window and its sliding step, segment the second data set, count the occurrences of each data segment, and find the alarm variables that may occur next, with their probabilities, when the second data set is used as the sample;
S4, use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;
S5, perform N iterations of steps S3 and S4, take the sequences in the data set corresponding to the result with the highest confidence-interval lower limit $Z_1$ as the best prediction data set, and output the corresponding prediction as the final prediction result.
In step S1, the historical alarm flood data set is acquired, the alarm variables it contains are counted, and the discrimination $D_i$ of each alarm variable is computed. The discrimination of alarm variable $x_i$ is $D_i = \log\left(N / |\{m : x_i \in x_{AF}(m)\}|\right)$, where $N$ is the number of historical alarm flood sequences contained in the data set and $|\{m : x_i \in x_{AF}(m)\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable. Removing the alarm variables whose discrimination is 0 yields the first data set. A discrimination of 0 means that the alarm occurs in every alarm flood in the data set, whatever fault caused the flood, so the alarm variable can be regarded as a meaningless nuisance alarm and is removed directly from the alarm flood data set.
In step S2, the Smith-Waterman algorithm is used. When the newly emerging alarm sequence $x_{AS}$ is compared with the m-th sequence $x'_{AF}(m)$ in the first data set, a similarity score matrix $H(m)$ is constructed with its first row and first column initialized to 0; the matrix size is $j \times (m_l + 1)$, where $m_l$ is the length of $x'_{AF}(m)$. Each entry $H(m)_{i',j'}$, with $1 \le i' \le j-1$ and $1 \le j' \le m_l$, is the maximum of the following terms, where $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$ and $W_1$ is the unit penalty:
the score for directly matching $x_{i'}$ with the $j'$-th element of $x'_{AF}(m)$;
$H(m)_{i'-k',j'} - W_{k'}$, the score with a gap of length $k'$ inserted before $x_{i'}$;
$H(m)_{i',j'-l'} - W_{l'}$, the score with a gap of length $l'$ inserted before the $j'$-th element of $x'_{AF}(m)$;
0, which means that the sequences matched up to $x_{i'}$ and the $j'$-th element of $x'_{AF}(m)$ have no similarity.
The matrix is updated according to these rules, and the largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF}(m)$. The sequences in the first data set are then sorted from high to low by similarity score, giving the second data set.
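A compact Python sketch of the scoring and ranking in step S2 is given below. The match and mismatch scores and the unit gap penalty are assumed values chosen for illustration, since the text does not state them, and the function names are hypothetical.

```python
def smith_waterman(a, b, match=1.0, mismatch=-1.0, gap=1.0):
    """Local-alignment similarity score of two alarm sequences.

    ASSUMPTION: match/mismatch scores and the unit gap penalty W_1 = gap
    are illustrative values; a gap of length k costs k * gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0.0] * cols for _ in range(rows)]  # first row and column stay 0
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = h[i - 1][j - 1] + s
            gap_in_b = max(h[i - k][j] - k * gap for k in range(1, i + 1))
            gap_in_a = max(h[i][j - l] - l * gap for l in range(1, j + 1))
            h[i][j] = max(diagonal, gap_in_b, gap_in_a, 0.0)
            best = max(best, h[i][j])
    return best

def rank_by_similarity(new_seq, floods):
    """Sort historical floods from most to least similar to the new sequence."""
    return sorted(floods, key=lambda f: smith_waterman(new_seq, f), reverse=True)

new_seq = ["a", "b", "c"]
ranked = rank_by_similarity(new_seq, [["x", "y"], ["a", "b", "c", "d"], ["a", "b"]])
```

The matrix has `len(a) + 1` rows, which corresponds to the $j$ rows in the text when the new sequence holds the $j-1$ alarms observed so far.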
In step S3, time windows of length n and n-1 are used, with a sliding step of T = 1 s. The data in the second data set are segmented with each time window, and the number of occurrences of each data segment is counted and recorded in the data set $X_{DS}$. For the newly emerging alarm sequence, alarm flood sequences caused by the same fault are highly similar, so to predict the alarm occurring at the next moment $t_j$ one can refer to the alarm variables that enter the alarm state after the matching alarm data segment in the second data set: every alarm variable that enters the alarm state after this data segment can be regarded as a possible next alarm. This avoids the prediction being misled by historical alarm floods that were caused by different faults but share some alarms. Specifically, $X_{DS}$ is queried for the count of the data segment formed by the most recent n-1 alarms and for all alarms that occur at the next moment after it; the count of each data segment extended by such an alarm is then queried, and the conditional probability is approximated by the maximum likelihood rule.
Clearly, the alarm variable for which the conditional probability is largest is the one most likely to enter the alarm state at the next moment, and it is taken as the prediction.
In addition, considering prediction accuracy and computation speed, the present invention sets n to 3.
Step S4 takes into account that, in practice, the size of the historical alarm flood data set used for prediction affects the reliability of the prediction result. For example, suppose data sets consisting of 10 and of 100 alarm flood sequences both predict that the next alarm is alarm variable $x_{j0}$ with probability 80%. The prediction from the data set of 100 alarm flood sequences is clearly more reliable, yet step S3 does not reflect the influence of data set size on the result, so a Bayesian probability model is introduced to improve the algorithm.
It is implemented as follows:
Let the first event be that a given alarm variable is the one that enters the alarm state at the next moment of the ongoing sequence $x_{AS}$. Its probability of occurrence is assumed to follow a uniform distribution and is defined as the prior probability;
Let the second event be that the second data set, used as the sample, predicts that alarm variable as the next alarm;
Let k and l be the numbers of alarm flood sequences in the second data set that contain, respectively, the two data segments queried in step S3. The conditional probability of the second event given the first then expresses the probability that the second event occurs when the first event is known;
The probability, inferred once the second event has been determined, that the alarm variable entering the alarm state at the next moment of the ongoing alarm sequence $x_{AS}$ is that alarm variable is defined as the posterior probability. Because the posterior probability changes with the value of the prior probability, and the first event follows a uniform distribution, the calculation uses a probability mass function (PMF), which yields the probability mass function $f_X$ of the posterior probability;
Summing the probability mass function $f_X$ gives the final probability that the predicted alarm is that alarm variable;
Given a confidence level $1-\alpha$, the probability mass function $f_X$ is summed to find the interval $[Z_1, Z_2]$; this interval is the confidence interval corresponding to the confidence level $1-\alpha$.
The confidence interval is introduced to express how credible the predicted probability is. It reflects how closely the probability of the next alarm that actually occurs falls around the predicted probability. At a confidence level of 95%, for example, the lower limit of the confidence interval $[Z_1, Z_2]$ means that we are 95% sure that the probability of the predicted next alarm is at least $Z_1$.
In step S5, steps S3 and S4 are iterated N times. The K-th iteration (0 ≤ K ≤ N) deletes the K historical alarm flood sequences with the lowest similarity scores from the data set, until the data set is empty. Each reduced data set is used for prediction, and the lower limits $Z_1$ of the confidence intervals of the N results are then compared. The result with the highest lower limit is taken as the best prediction result, and its data set is the best prediction data set. The lower limit of the confidence interval reflects the minimum certainty that the prediction is the target alarm, which is why the sequences in the data set corresponding to the result with the highest lower limit are chosen as the best prediction data set.
The invention compares the data set used for prediction with the alarm flood to be predicted for similarity, reorders the data set from high to low by similarity score, and iteratively removes the sequences with low similarity scores until the best prediction result is obtained. Favoring historical alarm floods caused by the same fault improves prediction accuracy and avoids the prediction being misled by historical alarm floods that were caused by different faults but share some alarms. The proposed N-gram prediction method based on a Bayesian probability model also takes into account the influence of the size of the historical alarm flood data set on the prediction result, expresses the credibility of the prediction result, and avoids the prediction being misled by an insufficient sample size.
The following is an application of the method of the invention in a specific application scenario.
A set of historical alarm flood data is generated by TE process simulation. The number of sequences is 100 and n is set to 3.
Step (1): compute the discrimination of each alarm variable in the data set and remove the alarm variables whose discrimination is 0.
Step (2): using the sequence similarity scoring method, compare every alarm flood sequence in the data set with the newly emerging alarm sequence one by one and reorder them by similarity score in descending order.
Step (3): segment the sequences in the data set and count the occurrences of each data segment; using the alarms that occurred at the two most recent moments, look up the historical data segments in the data set and approximate the corresponding conditional probabilities by the maximum likelihood rule.
Step (4): use the Bayesian probability model to find the probability mass function of the prediction result, sum the probability mass function to obtain the prediction probability, and find the confidence interval at a confidence level of 95%. As shown in Fig. 2, the confidence interval is (47.17%, 55.81%).
Step (5): iterate over the data set, repeating steps (3) and (4), and take the result with the highest confidence-interval lower limit among all results as the final output. As shown in Fig. 3, the best prediction result is $x_{11}$, with a prediction probability of 98.82% and a confidence interval of [97.33%, 100%] (confidence level 90%), meaning that the system is 90% sure that the probability of the next alarm being $x_{11}$ is at least 97.33%.
Claims (8)
1. An industrial alarm flood prediction method based on an N-gram model, characterized by comprising the following steps:
(1) acquiring the historical alarm flood data set, counting the alarm variables it contains, computing the discrimination $D_i$ of each alarm variable, and pre-processing the data set by removing the alarm variables whose discrimination is 0, forming the first data set;
(2) comparing the m-th historical alarm flood sequence in the first data set with the newly emerging alarm sequence for similarity, one by one, and arranging the matched sequences from high to low by similarity score, forming the second data set;
(3) setting the time window and its sliding step, segmenting the second data set, counting the occurrences of each data segment, and finding the alarm variables that may occur next, with their probabilities, when the second data set is used as the sample;
(4) using a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;
(5) performing N iterations of steps (3) and (4), taking the sequences in the data set corresponding to the result with the highest confidence-interval lower limit $Z_1$ as the best prediction data set, and outputting the corresponding prediction as the final prediction result.
2. The industrial alarm flood prediction method based on an N-gram model according to claim 1, characterized in that step (1) is implemented as follows:
the discrimination of alarm variable $x_i$ is $D_i = \log\left(N / |\{m : x_i \in x_{AF}(m)\}|\right)$, where $N$ is the number of historical alarm flood sequences contained in the data set and $|\{m : x_i \in x_{AF}(m)\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable.
3. The industrial alarm flood prediction method based on an N-gram model according to claim 1, characterized in that step (2) is implemented as follows:
similarity scores are computed with the Smith-Waterman algorithm; when the newly emerging alarm sequence $x_{AS}$ is compared with the m-th sequence $x'_{AF}(m)$ in the first data set, a similarity score matrix $H(m)$ is constructed with its first row and first column initialized to 0, the matrix size being $j \times (m_l + 1)$, where $m_l$ is the length of $x'_{AF}(m)$; each entry $H(m)_{i',j'}$, with $1 \le i' \le j-1$ and $1 \le j' \le m_l$, is the maximum of four terms: the score for directly matching $x_{i'}$ with the $j'$-th element of $x'_{AF}(m)$; $H(m)_{i'-k',j'} - W_{k'}$, the score with a gap of length $k'$ inserted before $x_{i'}$; $H(m)_{i',j'-l'} - W_{l'}$, the score with a gap of length $l'$ inserted before the $j'$-th element of $x'_{AF}(m)$; and 0, which indicates that the sequences matched up to $x_{i'}$ and that element have no similarity; here $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$ and $W_1$ is the unit penalty; the largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF}(m)$; the alarm flood sequences in the first data set are sorted from high to low by similarity score, giving the second data set.
4. The industrial alarm flood prediction method based on an N-gram model according to claim 1, characterized in that, in step (3), the time windows used to segment the second data set have lengths n and n-1, and the sliding step of the time window is 1.
5. The industrial alarm flood prediction method based on an N-gram model according to claim 4, characterized in that, in step (3), the next alarm variable that may occur and its corresponding probability, with the second data set taken as the sample, are obtained as follows:
The data in the second data set are segmented with time windows of length n and n-1 respectively, the number of occurrences of each data segment is counted, and the counts are recorded in the data set $X_{DS}$.
To predict the alarm that will occur at the next time instant $t_j$ of the newly occurring alarm sequence, first query $X_{DS}$ for the count of the corresponding preceding segment and for all alarms that occur at the next time instant after that segment; then query the counts of the corresponding segments extended by each such alarm, and approximate the conditional probability using the maximum-likelihood rule.
The element that maximizes the conditional probability is set as the output result.
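A minimal sketch of the counting and maximum-likelihood step in claim 5 follows. The function names, the tuple-keyed counter standing in for the count data set, and the toy history are all assumptions. The probability is taken as count(context + next alarm) / count(context), with the context being the last n-1 alarms of the new sequence.

```python
from collections import Counter

def count_segments(sequences, n):
    """Count every window of length n and n-1 (sliding step 1)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    counts = Counter()
    for seq in sequences:
        for width in (n, n - 1):
            for start in range(len(seq) - width + 1):
                counts[tuple(seq[start:start + width])] += 1
    return counts

def predict_next(counts, recent, n):
    """Return (most likely next alarm, its estimated conditional probability)."""
    context = tuple(recent[-(n - 1):])
    context_count = counts.get(context, 0)
    if context_count == 0:
        return None, 0.0  # context never seen in the sample
    # Count of each length-n segment that extends the context by one alarm.
    candidates = {seg[-1]: c for seg, c in counts.items()
                  if len(seg) == n and seg[:-1] == context}
    if not candidates:
        return None, 0.0
    best = max(candidates, key=candidates.get)
    return best, candidates[best] / context_count

# Hypothetical sample, with n = 3 (the preferred value stated in claim 6).
history = [["A", "B", "C", "D"], ["A", "B", "C", "E"], ["B", "C", "D"]]
counts = count_segments(history, 3)
alarm, prob = predict_next(counts, ["X", "B", "C"], 3)
```

In this toy sample the context ("B", "C") occurs three times and is followed by "D" twice, so the sketch predicts "D" with an estimated probability of 2/3.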
6. The industrial alarm flood prediction method based on an N-gram model according to claim 5, characterized in that the preferred value of n is 3.
7. The industrial alarm flood prediction method based on an N-gram model according to claim 5, characterized in that step (4) is implemented as follows:
A first event is defined as the ongoing sequence $x_{AS}$ having a given alarm variable enter the alarm state at the next time instant; its probability of occurrence is assumed to be uniformly distributed and is defined as the prior probability;
A second event is defined as the next alarm predicted with the second data set as the sample being that alarm variable;
Suppose that k alarm flood sequences in the second data set contain the first pattern and l alarm flood sequences contain the second pattern; the conditional probability then expresses the probability that the second event occurs given that the first event is known;
Then, given that the second event has been determined, the probability that the alarm variable entering the alarm state at the next time instant of the newly occurring alarm sequence $x_{AS}$ is that alarm variable is inferred, and is defined as the posterior probability;
The posterior probability is calculated using a probability mass function (PMF), yielding the probability mass function $f_X$ of the posterior probability;
Summing the probability mass function $f_X$ gives the probability that the finally predicted alarm is that alarm variable;
Given a confidence level $1-\alpha$, the probability mass function $f_X$ is summed to find the interval $[Z_1, Z_2]$; this interval is the confidence interval corresponding to the confidence level $1-\alpha$.
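The last step of claim 7, reading an interval off a probability mass function, might be sketched as follows. The form of $f_X$ and the exact condition on $[Z_1, Z_2]$ are not legible in this text, so the sketch assumes an equal-tailed interval over an arbitrary discrete PMF; the function name and the toy PMF are hypothetical.

```python
def confidence_interval(pmf, alpha):
    """Equal-tailed interval [z1, z2] with total mass of at least 1 - alpha.

    `pmf` maps each possible value to its probability. The equal-tailed
    rule is an assumption, not taken from the claim.
    """
    if abs(sum(pmf.values()) - 1.0) > 1e-9:
        raise ValueError("pmf must sum to 1")
    lower = upper = None
    cumulative = 0.0
    for value in sorted(pmf):
        cumulative += pmf[value]
        if lower is None and cumulative >= alpha / 2:
            lower = value  # smallest value whose CDF reaches alpha / 2
        if cumulative >= 1 - alpha / 2:
            upper = value  # smallest value whose CDF reaches 1 - alpha / 2
            break
    if upper is None:      # guard against floating-point shortfall
        upper = max(pmf)
    return lower, upper

# Hypothetical posterior PMF over candidate probability values.
toy_pmf = {0.0: 0.02, 0.25: 0.08, 0.5: 0.40, 0.75: 0.47, 1.0: 0.03}
z1, z2 = confidence_interval(toy_pmf, alpha=0.1)
```

For the toy PMF and a confidence level of 0.9, the interval is [0.25, 0.75], which carries 0.95 of the probability mass.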
8. The industrial alarm flood prediction method based on an N-gram model according to claim 1, characterized in that step (5) is implemented as follows:
Steps (3) and (4) are iterated N times. In the K-th iteration (0 ≤ K ≤ N), the K sequences with the lowest similarity scores are deleted from the data set, and the resulting data set is used for prediction. Finally, the lower limits $Z_1$ of the confidence intervals $[Z_1, Z_2]$ of the N results are compared; the result with the highest lower limit $Z_1$ is the optimal prediction result, and the corresponding data set is the optimal prediction data set.
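The selection loop of claim 8 can be sketched as below, under stated assumptions: `predict` is a hypothetical callable standing in for steps (3) and (4), the input list is already sorted from highest to lowest similarity, and K runs from 0 to N-1 (the claim's indexing of K is ambiguous in this text).

```python
def select_best_prediction(ranked_sequences, predict, iterations):
    """Pick the prediction whose confidence interval has the highest lower limit.

    ranked_sequences: sequences sorted from highest to lowest similarity.
    predict: hypothetical callable, sequences -> (alarm, (z1, z2)).
    Iteration K drops the K least similar sequences before predicting.
    """
    best = None
    for k in range(iterations):
        if k >= len(ranked_sequences):
            break  # nothing left to predict from
        subset = ranked_sequences[:len(ranked_sequences) - k]
        alarm, (z1, z2) = predict(subset)
        if best is None or z1 > best[1][0]:
            best = (alarm, (z1, z2), subset)
    return best

def toy_predict(subset):
    """Stand-in predictor: share of sequences ending in "D", +/- 0.1."""
    share = sum(1 for s in subset if s[-1] == "D") / len(subset)
    return "D", (share - 0.1, share + 0.1)

ranked = [["B", "C", "D"], ["A", "C", "D"], ["C", "E"], ["X", "Y"]]
best_alarm, best_interval, best_subset = select_best_prediction(
    ranked, toy_predict, iterations=3)
```

With the stand-in predictor, dropping the two least similar sequences gives the highest lower limit, so the first two sequences are returned as the optimal prediction data set.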
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810889499.4A CN108922140B (en) | 2018-08-07 | 2018-08-07 | Industrial alarm flood prediction method based on an N-gram model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810889499.4A CN108922140B (en) | 2018-08-07 | 2018-08-07 | Industrial alarm flood prediction method based on an N-gram model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108922140A (en) | 2018-11-30 |
CN108922140B (en) | 2019-08-16 |
Family
ID=64394689
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810889499.4A Active CN108922140B (en) | Industrial alarm flood prediction method based on an N-gram model | 2018-08-07 | 2018-08-07 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108922140B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112688946A (en) * | 2020-12-24 | 2021-04-20 | 工业信息安全(四川)创新中心有限公司 | Method, module, storage medium, device and system for constructing abnormality detection features |
US11212162B2 (en) * | 2019-07-18 | 2021-12-28 | International Business Machines Corporation | Bayesian-based event grouping |
CN115909697A (en) * | 2023-02-15 | 2023-04-04 | 山东科技大学 | Alarm state prediction method and system based on amplitude change trend probability inference |
WO2023124778A1 (en) * | 2021-12-29 | 2023-07-06 | 浙江中控技术股份有限公司 | Real-time alarm tracing apparatus and method in process industry production procedure |
CN117118811A (en) * | 2023-10-25 | 2023-11-24 | 南京邮电大学 | Alarm analysis method for industrial alarm flooding |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104456092A (en) * | 2014-12-02 | 2015-03-25 | 中国石油大学(华东) | Multidimensional assessment method of petroleum and natural gas pipeline warning priority |
US20150123784A1 (en) * | 2013-11-03 | 2015-05-07 | Teoco Corporation | System, Method, and Computer Program Product for Identification and Handling of a Flood of Alarms in a Telecommunications System |
CN105006119A (en) * | 2015-06-30 | 2015-10-28 | 中国寰球工程公司 | Alarm system optimization method based on bayesian network |
CN107748901A (en) * | 2017-11-24 | 2018-03-02 | 东北大学 | The industrial process method for diagnosing faults returned based on similitude local spline |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150123784A1 (en) * | 2013-11-03 | 2015-05-07 | Teoco Corporation | System, Method, and Computer Program Product for Identification and Handling of a Flood of Alarms in a Telecommunications System |
CN104456092A (en) * | 2014-12-02 | 2015-03-25 | 中国石油大学(华东) | Multidimensional assessment method of petroleum and natural gas pipeline warning priority |
CN105006119A (en) * | 2015-06-30 | 2015-10-28 | 中国寰球工程公司 | Alarm system optimization method based on bayesian network |
CN107748901A (en) * | 2017-11-24 | 2018-03-02 | 东北大学 | The industrial process method for diagnosing faults returned based on similitude local spline |
Non-Patent Citations (2)
Title |
---|
王佳 等: "工业报警序列的模糊关联规则挖掘方法", 《化工学报》 * |
陈忠圣 等: "基于离散傅里叶变换的过程工业报警泛滥序列聚类分析及应用", 《化工学报》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11212162B2 (en) * | 2019-07-18 | 2021-12-28 | International Business Machines Corporation | Bayesian-based event grouping |
CN112688946A (en) * | 2020-12-24 | 2021-04-20 | 工业信息安全(四川)创新中心有限公司 | Method, module, storage medium, device and system for constructing abnormality detection features |
CN112688946B (en) * | 2020-12-24 | 2022-06-24 | 工业信息安全(四川)创新中心有限公司 | Method, module, storage medium, device and system for constructing abnormality detection features |
WO2023124778A1 (en) * | 2021-12-29 | 2023-07-06 | 浙江中控技术股份有限公司 | Real-time alarm tracing apparatus and method in process industry production procedure |
CN115909697A (en) * | 2023-02-15 | 2023-04-04 | 山东科技大学 | Alarm state prediction method and system based on amplitude change trend probability inference |
CN117118811A (en) * | 2023-10-25 | 2023-11-24 | 南京邮电大学 | Alarm analysis method for industrial alarm flooding |
Also Published As
Publication number | Publication date |
---|---|
CN108922140B (en) | 2019-08-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||