CN108922140A - Industrial alarm flood prediction method based on an N-gram model - Google Patents

Info

Publication number
CN108922140A
Authority
CN
China
Prior art keywords
alarm
data set
probability
sequence
flood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810889499.4A
Other languages
Chinese (zh)
Other versions
CN108922140B (en
Inventor
王建东
徐洲
徐一洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Science and Technology
Original Assignee
Shandong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Science and Technology
Priority to CN201810889499.4A
Publication of CN108922140A
Application granted
Publication of CN108922140B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B29/00 Checking or monitoring of signalling or alarm systems; Prevention or correction of operating errors, e.g. preventing unauthorised operation
    • G08B29/18 Prevention or correction of operating errors
    • G08B29/185 Signal analysis techniques for reducing or preventing false alarms or for enhancing the reliability of the system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B31/00 Predictive alarm systems characterised by extrapolation or other computation using updated historic data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Abstract

The invention belongs to the field of signal processing and in particular relates to an industrial alarm flood prediction method based on an N-gram model, comprising the following steps: (1) obtain a data set of historical alarm floods, count the alarm variables in it, calculate the discrimination of each alarm variable, and remove the alarm variables whose discrimination is 0; (2) compare each sequence in the processed data set with the newly occurring sequence for similarity, and sort the sequences from high to low by similarity score; (3) set a time window to segment the re-processed data set, count the occurrences of each data segment, and use this sample data set to find the alarm variables that may occur next and their corresponding probabilities; (4) use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval; (5) iterate steps (3) and (4). The invention addresses the inaccuracy of current alarm flood prediction.

Description

Industrial alarm flood prediction method based on an N-gram model
Technical field
The invention belongs to the field of signal processing and in particular relates to an industrial alarm flood prediction method based on an N-gram model.
Background
In today's industry, alarm systems are widely used to monitor industrial processes and raise alarms on abnormal conditions. However, current industrial alarm systems still have many problems, such as nuisance alarms, standing alarms, and alarm floods. Nuisance alarms are the large numbers of meaningless alarms generated within a short time that require no operator response; their presence reduces the operator's ability to respond to genuine alarms. Standing alarms are alarms that persist for some time after they occur; they are usually not cleared even after the operator has taken action, and they impair the operator's judgment of the system's working state. An alarm flood is the occurrence of many alarms within a short time, usually triggered by a single event; these alarms usually exceed what the operator can handle and are harder to resolve.
Current research on alarm floods focuses mainly on similarity analysis of alarm flood sequences, while the timely prediction and handling of alarm floods as they occur in real time in industrial processes remains largely unexplored.
Alarm flood prediction means that, for a newly occurring alarm flood, the system predicts the alarm likely to occur next so that the operator can act in advance. Current alarm flood prediction has two main problems: 1) prediction is performed directly without first classifying the historical alarm floods, so historical alarm floods that were caused by different events but share some identical alarms mislead the result when used for prediction; 2) the influence of the number of historical alarm floods in the database on the prediction result is not considered. Practical experience shows that the more historical data there is, the more accurate the prediction. The traditional n-gram prediction method cannot reflect the effect of the amount of historical data on the result, which can lead to wrong predictions and impair the operator's judgment.
These two problems undermine the reliability of alarm prediction. If they are not solved, they may cause wrong alarm predictions, impair the operator's judgment, and lead to safety incidents and economic losses in industrial production.
Summary of the invention
In view of the above deficiencies of the prior art, the invention provides an industrial alarm flood prediction method based on an N-gram model. It addresses the inaccurate predictions that arise when alarm floods are predicted without considering the classification of historical alarm floods or the amount of data in the historical alarm flood database.
The technical solution adopted by the invention comprises the following steps:
(1) Obtain the historical alarm flood data set, count the alarm variables it contains, calculate the discrimination $D_i$ of each alarm variable, and remove the alarm variables whose discrimination is 0 as preprocessing, forming the first data set;
(2) Compare each historical alarm flood sequence in the first data set, one by one, with the newly occurring alarm sequence for similarity, and arrange the matched sequences from high to low by similarity score, forming the second data set;
(3) Set the time window and its sliding step, segment the second data set, count the occurrences of each data segment, and, taking the second data set as the sample, find the alarm variables that may occur next and their corresponding probabilities;
(4) Use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;
(5) Perform N iterations of steps (3) and (4); the sequences in the data set corresponding to the result with the highest lower limit $Z_1$ of the confidence interval $[Z_1, Z_2]$ form the optimal prediction data set, and the corresponding prediction is output as the final prediction result.
Further, step (1) is implemented as follows:
The discrimination of alarm variable $x_i$ is $D_i = \log \frac{N}{|\{m : x_i \in x_{AF}(m)\}|}$, where $N$ is the number of historical alarm flood sequences in the data set and $|\{m : x_i \in x_{AF}(m)\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable. Removing the alarm variables whose discrimination is 0 yields the first data set. A discrimination of 0 means that the alarm occurs in every alarm flood in the data set, whatever fault caused it. Such an alarm variable can therefore be regarded as a meaningless nuisance alarm and is removed directly from the alarm flood data set.
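The discrimination filter of step (1) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes each alarm flood is a list of alarm-variable names, and the function names `discrimination` and `drop_uninformative` as well as the sample data are hypothetical.

```python
import math

def discrimination(floods):
    """Return D_i = log(N / n_i) for each alarm variable, where N is the number
    of flood sequences and n_i the number of sequences containing the variable."""
    n_sequences = len(floods)
    containing = {}
    for seq in floods:
        for alarm in set(seq):  # count each variable once per sequence
            containing[alarm] = containing.get(alarm, 0) + 1
    return {a: math.log(n_sequences / c) for a, c in containing.items()}

def drop_uninformative(floods, tol=1e-12):
    """Remove alarm variables whose discrimination is 0 (present in every flood)."""
    scores = discrimination(floods)
    return [[a for a in seq if scores[a] > tol] for seq in floods]

# Hypothetical history: x1 appears in every flood, so it is removed.
history = [["x1", "x2", "x5"], ["x1", "x3"], ["x1", "x2", "x4"]]
first_dataset = drop_uninformative(history)
```

The score has the same form as an inverse document frequency: a variable present in all $N$ sequences gets $\log(N/N) = 0$.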
Further, step (2) is implemented as follows:
Similarity scores are calculated with the Smith-Waterman algorithm. To compare the newly occurring alarm sequence $x_{AS}$ with the $m$-th sequence $x'_{AF}(m)$ of the first data set, a similarity score matrix $H(m)$ of size $j \times (m_l + 1)$ is constructed, where $m_l$ is the length of $x'_{AF}(m)$, and its first row and first column are initialized to 0.
Each remaining element $H(m)_{i',j'}$ ($1 \le i' \le j-1$, $1 \le j' \le m_l$) is the maximum of four candidates: $H(m)_{i'-1,j'-1}$ plus the score of directly matching $x_{i'}$ with the $j'$-th alarm of $x'_{AF}(m)$; $H(m)_{i'-k',j'} - W_{k'}$, the score with a gap of length $k'$ before $x_{i'}$; $H(m)_{i',j'-l'} - W_{l'}$, the score with a gap of length $l'$ before the $j'$-th alarm of $x'_{AF}(m)$; and 0, which indicates that the sequences show no similarity up to this pair. Here $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$ and $W_1$ is the unit penalty. The largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF}(m)$. Sorting the alarm flood sequences of the first data set from high to low by similarity score yields the second data set.
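The scoring and ranking of step (2) can be sketched as below. This is an illustrative sketch under stated assumptions: the match and mismatch scores and the unit gap penalty are not given in the text, so the defaults here are placeholders, and the function names are hypothetical. Because the gap penalty is linear ($W_{k'} = k' W_1$), taking the maximum over all gap lengths is equivalent to the single-step recurrence used in the code.

```python
def similarity_score(new_seq, hist_seq, match=1.0, mismatch=-1.0, unit_gap=1.0):
    """Smith-Waterman local alignment score with a linear gap penalty.
    match, mismatch and unit_gap are assumed values, not taken from the patent."""
    rows, cols = len(new_seq) + 1, len(hist_seq) + 1
    H = [[0.0] * cols for _ in range(rows)]  # first row and column stay 0
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if new_seq[i - 1] == hist_seq[j - 1] else mismatch
            H[i][j] = max(
                0.0,                     # no similarity up to this pair
                H[i - 1][j - 1] + s,     # direct match or mismatch
                H[i - 1][j] - unit_gap,  # gap in the historical sequence
                H[i][j - 1] - unit_gap,  # gap in the new sequence
            )
            best = max(best, H[i][j])
    return best  # largest element of the score matrix

def rank_by_similarity(new_seq, dataset):
    """Sort historical sequences from high to low similarity to new_seq."""
    return sorted(dataset, key=lambda seq: similarity_score(new_seq, seq), reverse=True)

# Hypothetical data for illustration only.
new_flood = ["x2", "x5", "x7"]
first_dataset = [["x3", "x4"], ["x2", "x5", "x7", "x9"], ["x2", "x6"]]
second_dataset = rank_by_similarity(new_flood, first_dataset)
```

The score matrix has `len(new_seq) + 1` rows, which matches the $j \times (m_l + 1)$ size in the text when the new sequence holds $j-1$ alarms.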
Further, in step (3), the time windows used to segment the second data set have lengths $n$ and $n-1$, and the sliding step of the time window is 1. Taking the second data set as the sample, the alarm variables that may occur next and their probabilities are found as follows. The data in the second data set are segmented with windows of length $n$ and $n-1$ respectively, the occurrences of each data segment are counted, and the counts are recorded in the data set $X_{DS}$. Alarm floods caused by the same fault have highly similar sequences. Therefore, to predict the alarm that will occur at the next moment $t_j$ of the newly occurring alarm sequence, one can look up the alarm variables that entered the alarm state after the same alarm data segment in the second data set; every alarm variable that entered the alarm state after this segment is a candidate for the next alarm. This avoids the misleading effect of historical alarm floods that were caused by different faults but share some identical alarms. Accordingly, $X_{DS}$ is queried for the count of the segment formed by the most recent $n-1$ alarms and for all alarms that occurred immediately after it. The count of each length-$n$ segment formed by appending such an alarm is then queried, and the conditional probability of each candidate is approximated by the maximum likelihood rule as the ratio of the length-$n$ segment count to the length-$(n-1)$ segment count. The candidate with the largest conditional probability is the output result, denoted $x_{j0}$.
Further, considering both prediction accuracy and computation speed, the preferred value of $n$ is 3.
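The counting and maximum likelihood estimate of step (3) can be sketched as follows. This is a minimal sketch assuming alarm floods are lists of alarm names and a sliding step of 1; `count_segments` and `predict_next` are hypothetical names, and the sample data are invented for illustration.

```python
from collections import Counter

def count_segments(dataset, length):
    """Count every contiguous segment of the given length (window step 1)."""
    counts = Counter()
    for seq in dataset:
        for start in range(len(seq) - length + 1):
            counts[tuple(seq[start:start + length])] += 1
    return counts

def predict_next(dataset, recent, n=3):
    """Estimate P(next alarm | last n-1 alarms) as count(n-gram) / count(context)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    context = tuple(recent[-(n - 1):])
    context_count = count_segments(dataset, n - 1)[context]
    if context_count == 0:
        return None, {}  # the context never occurred in the history
    ngrams = count_segments(dataset, n)
    probs = {seg[-1]: c / context_count
             for seg, c in ngrams.items() if seg[:-1] == context}
    best = max(probs, key=probs.get) if probs else None
    return best, probs

# Hypothetical second data set and ongoing flood.
second_dataset = [["x1", "x2", "x3"], ["x1", "x2", "x3"],
                  ["x1", "x2", "x4"], ["x5", "x1", "x2", "x3"]]
best, probs = predict_next(second_dataset, ["x1", "x2"], n=3)
```

With $n = 3$ the context is the two most recent alarms. A context that ends a historical sequence is counted but has no successor, so the probabilities may sum to less than 1.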
Further, step (4) takes into account that, in practice, the size of the historical alarm flood data set used for prediction affects the reliability of the prediction result. For example, if data sets of 10 and of 100 alarm flood sequences both predict that the next alarm is alarm variable $x_{j0}$ with probability 80%, the prediction from the data set of 100 sequences is clearly more reliable. Step (3) does not reflect the influence of data set size on the result, so a Bayesian probability model is introduced to improve the algorithm.
Define as the first event that a given alarm variable enters the alarm state at the next moment of the ongoing sequence $x_{AS}$. Its probability of occurrence is assumed to follow a uniform distribution and is defined as the prior probability;
Define as the second event that the next alarm predicted from the second data set, used as the sample, is that alarm variable;
Suppose that, in the second data set, $k$ alarm flood sequences contain the queried data segment and $l$ alarm flood sequences contain that segment followed by the predicted alarm. The conditional probability then expresses the probability that the second event occurs given the first event;
Then, given that the second event has occurred, the probability that the alarm variable entering the alarm state at the next moment of the newly occurring alarm sequence $x_{AS}$ is that alarm variable is defined as the posterior probability;
Because the posterior probability changes with the value of the prior probability, and the prior follows a uniform distribution, the calculation uses a probability mass function (PMF), giving the probability mass function $f_X$ of the posterior probability;
Summing over the probability mass function $f_X$ gives the final probability that the predicted alarm is that alarm variable;
Given a confidence level $1-\alpha$, the probability mass function $f_X$ is summed to find the interval $[Z_1, Z_2]$ that corresponds to the confidence level $1-\alpha$; this interval is the confidence interval.
The confidence interval is introduced to express how credible the predicted probability value is. It reflects how closely the probability of the alarm that actually occurs next lies around the predicted probability. For example, at a confidence level of 95%, the lower limit of the confidence interval $[Z_1, Z_2]$ means that we are 95% sure the probability of the next alarm is at least that lower limit.
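The effect of data set size on the interval can be illustrated with the sketch below. It is not the patent's formula, which is not reproduced in this text. It assumes a binomial likelihood for $l$ successes in $k$ trials, a uniform prior over a discretized probability grid, the posterior mean as the predicted probability, and an equal-tailed interval; the function names and the grid size are hypothetical.

```python
from math import comb

def posterior_pmf(k, l, grid_size=1001):
    """Discretized posterior over the unknown probability p, given l hits in k
    trials, a binomial likelihood and a uniform prior (all assumed here)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [comb(k, l) * p ** l * (1 - p) ** (k - l) for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]

def summarize(k, l, confidence=0.95, grid_size=1001):
    """Return the posterior mean and an equal-tailed interval [Z1, Z2]."""
    grid, pmf = posterior_pmf(k, l, grid_size)
    mean = sum(p * f for p, f in zip(grid, pmf))
    tail = (1 - confidence) / 2
    cumulative, lower, upper = 0.0, grid[0], grid[-1]
    found_lower = False
    for p, f in zip(grid, pmf):
        cumulative += f
        if not found_lower and cumulative >= tail:
            lower, found_lower = p, True  # Z1: lower tail mass reached
        if cumulative >= 1 - tail:
            upper = p                     # Z2: upper tail mass reached
            break
    return mean, (lower, upper)

# Same 80% hit rate, different amounts of history.
mean10, (lo10, hi10) = summarize(10, 8)
mean100, (lo100, hi100) = summarize(100, 80)
```

Under these assumptions the interval for 100 sequences is narrower and has a higher lower limit than the one for 10 sequences, which is the behaviour the text describes.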
Further, step (5) is implemented as follows:
Steps (3) and (4) are iterated N times. The $K$-th iteration ($0 \le K \le N$) deletes the $K$ sequences with the lowest similarity scores from the data set and predicts with the remaining data set. Finally, the lower limits $Z_1$ of the confidence intervals $[Z_1, Z_2]$ of the N results are compared, and the result with the highest lower limit is the optimal prediction result; its data set is the optimal prediction data set. The lower limit of the confidence interval reflects the minimum assurance that the prediction is the target alarm, which is why the data set whose result has the highest lower limit is selected as the optimal prediction data set.
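The pruning loop of step (5) can be sketched as below. The sketch assumes the data set is already sorted from high to low similarity and takes the prediction step as a callable; `best_pruned_prediction` and `toy_evaluate` are hypothetical names, and `toy_evaluate` is only a placeholder for steps (3) and (4), with a made-up lower bound that has no basis in the patent.

```python
from collections import Counter

def best_pruned_prediction(ranked_dataset, evaluate):
    """Drop the K least similar sequences for K = 0, 1, ... and keep the result
    whose confidence-interval lower limit is highest."""
    best = None
    for removed in range(len(ranked_dataset)):
        subset = ranked_dataset[:len(ranked_dataset) - removed]
        prediction, lower = evaluate(subset)
        if prediction is None:
            continue
        if best is None or lower > best[1]:
            best = (prediction, lower, subset)
    return best  # (prediction, lower limit, data set used)

def toy_evaluate(subset):
    """Placeholder for steps (3) and (4): predicts the most common final alarm
    and returns an invented lower bound that grows with the sample size."""
    counts = Counter(seq[-1] for seq in subset)
    alarm, hits = counts.most_common(1)[0]
    return alarm, hits / len(subset) - 1.0 / (len(subset) + 1)

# Hypothetical ranked data set: the last two sequences are the least similar.
ranked = [["x1", "x2", "x3"], ["x1", "x2", "x3"], ["x1", "x2", "x3"],
          ["x7", "x8", "x4"], ["x9", "x4"]]
prediction, lower, used = best_pruned_prediction(ranked, toy_evaluate)
```

Passing the predictor as a parameter keeps the loop independent of how the probability and interval are computed.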
The beneficial effects of the invention are: (1) The invention compares the data set used for prediction with the alarm flood to be predicted for similarity, re-sorts it from high to low by similarity score, and iteratively removes the sequences with low similarity scores until the optimal prediction result is obtained. Because historical alarm floods caused by the same fault improve prediction accuracy, this avoids the misleading effect of historical alarm floods that were caused by different faults but share some identical alarms. (2) The proposed N-gram prediction method based on a Bayesian probability model takes into account the influence of the size of the historical alarm flood data set on the prediction result, expresses the credibility of the prediction result, and avoids predictions being misled by an insufficient sample size.
Brief description of the drawings
Fig. 1 is a flow chart of the industrial alarm flood prediction method based on an N-gram model according to an embodiment of the invention;
Fig. 2 shows the probability mass function of the prediction result in a concrete application scenario of the invention;
Fig. 3 is a schematic diagram of how the prediction probability and confidence interval change with the number of historical alarm floods used for prediction in a concrete application scenario of the invention.
Detailed description of embodiments
The invention is further described below with reference to the drawings.
Embodiment one:
As shown in Fig. 1, the industrial alarm flood prediction method based on an N-gram model according to the invention comprises the following steps:
S1, obtain the historical alarm flood data set, count the alarm variables it contains, calculate the discrimination $D_i$ of each alarm variable, and remove the alarm variables whose discrimination is 0 as preprocessing, forming the first data set;
S2, compare each historical alarm flood sequence in the first data set, one by one, with the newly occurring alarm sequence for similarity, and arrange the matched sequences from high to low by similarity score, forming the second data set;
S3, set the time window and its sliding step, segment the second data set, count the occurrences of each data segment, and, taking the second data set as the sample, find the alarm variables that may occur next and their corresponding probabilities;
S4, use a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;
S5, perform N iterations of steps S3 and S4; the sequences in the data set corresponding to the result with the highest lower limit $Z_1$ of the confidence interval $[Z_1, Z_2]$ form the optimal prediction data set, and the corresponding prediction is output as the final prediction result.
In step S1, the historical alarm flood data set is obtained, the alarm variables it contains are counted, and the discrimination $D_i$ of each alarm variable is calculated. The discrimination of alarm variable $x_i$ is $D_i = \log \frac{N}{|\{m : x_i \in x_{AF}(m)\}|}$, where $N$ is the number of historical alarm flood sequences in the data set and $|\{m : x_i \in x_{AF}(m)\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable. Removing the alarm variables whose discrimination is 0 yields the first data set. A discrimination of 0 means that the alarm occurs in every alarm flood in the data set, whatever fault caused it. Such an alarm variable can therefore be regarded as a meaningless nuisance alarm and is removed directly from the alarm flood data set.
In step S2, the Smith-Waterman algorithm is used. To compare the newly occurring alarm sequence $x_{AS}$ with the $m$-th sequence $x'_{AF}(m)$ of the first data set, a similarity score matrix $H(m)$ of size $j \times (m_l + 1)$ is constructed, where $m_l$ is the length of $x'_{AF}(m)$, and its first row and first column are initialized to 0.
Each remaining element $H(m)_{i',j'}$ ($1 \le i' \le j-1$, $1 \le j' \le m_l$) is the maximum of the following candidates, where $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$ and $W_1$ is the unit penalty:
$H(m)_{i'-1,j'-1}$ plus the score of directly matching $x_{i'}$ with the $j'$-th alarm of $x'_{AF}(m)$;
$H(m)_{i'-k',j'} - W_{k'}$, the score with a gap of length $k'$ before $x_{i'}$;
$H(m)_{i',j'-l'} - W_{l'}$, the score with a gap of length $l'$ before the $j'$-th alarm of $x'_{AF}(m)$;
0, which means that the sequences show no similarity up to this pair.
The matrix is updated according to these rules, and the largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF}(m)$. The sequences in the first data set are then sorted from high to low by similarity score, yielding the second data set.
In step S3, the time windows have lengths $n$ and $n-1$, and the sliding step of the time window is T = 1 s. The data in the second data set are segmented with each time window, the occurrences of each data segment are counted, and the counts are recorded in the data set $X_{DS}$. Alarm floods caused by the same fault have highly similar sequences. Therefore, to predict the alarm that will occur at the next moment $t_j$ of the newly occurring alarm sequence, one can look up the alarm variables that entered the alarm state after the same alarm data segment in the second data set; every alarm variable that entered the alarm state after this segment is a candidate for the next alarm. This avoids the misleading effect of historical alarm floods that were caused by different faults but share some identical alarms. Specifically, $X_{DS}$ is queried for the count of the segment formed by the most recent $n-1$ alarms and for all alarms that occurred immediately after it. The count of each length-$n$ segment formed by appending such an alarm is then queried, and the conditional probability of each candidate is approximated by the maximum likelihood rule as the ratio of the length-$n$ segment count to the length-$(n-1)$ segment count.
Clearly, the alarm variable with the largest conditional probability is the one most likely to enter the alarm state at the next moment; it is denoted $x_{j0}$.
In addition, considering both prediction accuracy and computation speed, the invention sets $n$ to 3.
In step S4, it is taken into account that, in practice, the size of the historical alarm flood data set used for prediction affects the reliability of the prediction result. For example, if data sets of 10 and of 100 alarm flood sequences both predict that the next alarm is alarm variable $x_{j0}$ with probability 80%, the prediction from the data set of 100 sequences is clearly more reliable. Step S3 does not reflect the influence of data set size on the result, so a Bayesian probability model is introduced to improve the algorithm.
It is implemented as follows:
Define as the first event that a given alarm variable enters the alarm state at the next moment of the ongoing sequence $x_{AS}$. Its probability of occurrence is assumed to follow a uniform distribution and is defined as the prior probability;
Define as the second event that the next alarm predicted from the second data set, used as the sample, is that alarm variable;
Suppose that, in the second data set, $k$ alarm flood sequences contain the queried data segment and $l$ alarm flood sequences contain that segment followed by the predicted alarm. The conditional probability then expresses the probability that the second event occurs given the first event;
Then, given that the second event has occurred, the probability that the alarm variable entering the alarm state at the next moment of the ongoing alarm sequence $x_{AS}$ is that alarm variable is defined as the posterior probability. Because the posterior probability changes with the value of the prior probability, and the prior follows a uniform distribution, the calculation uses a probability mass function (PMF), giving the probability mass function $f_X$ of the posterior probability;
Summing over the probability mass function $f_X$ gives the final probability that the predicted alarm is that alarm variable;
Given a confidence level $1-\alpha$, the probability mass function $f_X$ is summed to find the interval $[Z_1, Z_2]$ that corresponds to the confidence level $1-\alpha$; this interval is the confidence interval.
The confidence interval is introduced to express how credible the predicted probability value is. It reflects how closely the probability of the alarm that actually occurs next lies around the predicted probability. For example, at a confidence level of 95%, the lower limit of the confidence interval $[Z_1, Z_2]$ means that we are 95% sure the probability of the next alarm is at least that lower limit.
In step S5, steps S3 and S4 are iterated N times. The $K$-th iteration ($0 \le K \le N$) deletes the $K$ historical alarm flood sequences with the lowest similarity scores from the data set, continuing until the data set is empty, and predicts with the remaining data set. Finally, the lower limits $Z_1$ of the confidence intervals of the N results are compared, and the result with the highest lower limit is the optimal prediction result; its data set is the optimal prediction data set. The lower limit of the confidence interval reflects the minimum assurance that the prediction is the target alarm, which is why the data set whose result has the highest lower limit is selected as the optimal prediction data set.
The invention compares the data set used for prediction with the alarm flood to be predicted for similarity, re-sorts it from high to low by similarity score, and iteratively removes the sequences with low similarity scores until the optimal prediction result is obtained. Because historical alarm floods caused by the same fault improve prediction accuracy, this avoids the misleading effect of historical alarm floods that were caused by different faults but share some identical alarms. The N-gram prediction method based on a Bayesian probability model, proposed at the same time, takes into account the influence of the size of the historical alarm flood data set on the prediction result, expresses the credibility of the prediction result, and avoids predictions being misled by an insufficient sample size.
The following is an application of the method of the invention in a concrete scenario.
A set of historical alarm flood data is generated by simulating the TE (Tennessee Eastman) process. The number of sequences is 100, and $n$ is set to 3.
Step (1): calculate the discrimination of each alarm variable in the data set and remove the alarm variables whose discrimination is 0.
Step (2): using the sequence similarity scoring method, compare every alarm flood sequence in the data set with the newly occurring alarm sequence one by one, and re-sort the sequences in descending order of similarity score.
Step (3): segment the sequences in the data set and count the occurrences of each data segment. Using the alarms that occurred at the two most recent moments, look up the matching historical data segments in the data set and approximate the corresponding conditional probabilities by the maximum likelihood rule.
Step (4): use the Bayesian probability model to find the probability mass function of the prediction result, sum the probability mass function to obtain the prediction probability, and find the confidence interval at a confidence level of 95%. As shown in Fig. 2, the confidence interval is (47.17%, 55.81%).
Step (5): iterate over the data set, repeating steps (3) and (4), and take the result with the highest confidence interval lower limit among all results as the final output. As shown in Fig. 3, the optimal prediction result is $x_{11}$ with a prediction probability of 98.82% and a confidence interval of [97.33%, 100%] (confidence level 90%). This means the system is 90% sure that the probability of the next alarm being $x_{11}$ is at least 97.33%.

Claims (8)

1. An industrial alarm flood prediction method based on an N-gram model, characterized by comprising the following steps:
(1) obtaining the historical alarm flood data set, counting the alarm variables it contains, calculating the discrimination $D_i$ of each alarm variable, and removing the alarm variables whose discrimination is 0 as preprocessing, forming the first data set;
(2) comparing each historical alarm flood sequence in the first data set, one by one, with the newly occurring alarm sequence for similarity, and arranging the matched sequences from high to low by similarity score, forming the second data set;
(3) setting the time window and its sliding step, segmenting the second data set, counting the occurrences of each data segment, and, taking the second data set as the sample, finding the alarm variables that may occur next and their corresponding probabilities;
(4) using a Bayesian probability model to find the probability of the predicted next alarm and the corresponding confidence interval $[Z_1, Z_2]$;
(5) performing N iterations of steps (3) and (4), wherein the sequences in the data set corresponding to the result with the highest lower limit $Z_1$ of the confidence interval $[Z_1, Z_2]$ form the optimal prediction data set, and the corresponding prediction is output as the final prediction result.
2. The industrial alarm flood prediction method based on an N-gram model according to claim 1, characterized in that step (1) is implemented as follows:
The discrimination of alarm variable $x_i$ is $D_i = \log \frac{N}{|\{m : x_i \in x_{AF}(m)\}|}$, where $N$ is the number of historical alarm flood sequences in the data set and $|\{m : x_i \in x_{AF}(m)\}|$ is the number of historical alarm flood sequences in the data set that contain this alarm variable.
3. The industrial alarm flood prediction method based on an N-gram model according to claim 1, characterized in that step (2) is implemented as follows:
Similarity scores are calculated with the Smith-Waterman algorithm. When the newly occurring alarm sequence $x_{AS}$ is compared with the m-th sequence $x'_{AF}(m)$ of the first data set, a similarity score matrix $H(m)$ is constructed, its first row and first column are initialized to 0, and its size is $j \times (m_l + 1)$, where $m_l$ is the length of $x'_{AF}(m)$.
In the recursion for the entry $H(m)_{i',j'}$ ($1 \le i' \le j-1$, $1 \le j' \le m_l$): $W_{k'} = k' W_1$ is the penalty for a gap of length $k'$, and $W_1$ is the unit penalty; one candidate is the score of directly matching $x_{i'}$ with the $j'$-th element of $x'_{AF}(m)$; $H(m)_{i'-k',j'} - W_{k'}$ is the score with a gap of length $k'$ inserted before $x_{i'}$; $H(m)_{i',j'-l'} - W_{l'}$ is the score with a gap of length $l'$ inserted before the $j'$-th element of $x'_{AF}(m)$; and 0 indicates that the sequences matched up to $x_{i'}$ and that element have no similarity. The largest element of the final score matrix $H(m)$ is the similarity score of $x_{AS}$ and $x'_{AF}(m)$. The alarm flood sequences in the first data set are sorted in descending order of similarity score to obtain the second data set.
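The scoring and ranking of claim 3 can be sketched as follows. This is an illustrative implementation of Smith-Waterman local alignment with a linear gap penalty $W_{k'} = k' W_1$; the match and mismatch scores and the default unit penalty are assumptions chosen for the example, since the claim does not fix their values, and the function names are hypothetical.

```python
def smith_waterman_score(new_seq, hist_seq, match=2, mismatch=-1, w1=1):
    """Return the largest entry of the Smith-Waterman score matrix H.

    match, mismatch and w1 are assumed example values, not from the patent.
    """
    rows, cols = len(new_seq) + 1, len(hist_seq) + 1
    h = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if new_seq[i - 1] == hist_seq[j - 1] else mismatch
            diag = h[i - 1][j - 1] + s  # direct match of the two elements
            # gap of length k in one sequence, penalised by k * w1
            gap_new = max(h[i - k][j] - k * w1 for k in range(1, i + 1))
            # gap of length l in the other sequence, penalised by l * w1
            gap_hist = max(h[i][j - l] - l * w1 for l in range(1, j + 1))
            h[i][j] = max(diag, gap_new, gap_hist, 0)
            best = max(best, h[i][j])
    return best

def rank_by_similarity(new_seq, first_data_set):
    """Sort historical sequences by descending similarity to new_seq."""
    return sorted(first_data_set,
                  key=lambda seq: smith_waterman_score(new_seq, seq),
                  reverse=True)

# Hypothetical sequences of alarm-variable names.
new_seq = ["B", "C"]
history = [["X", "Y"], ["A", "B", "C", "D"], ["B", "D"]]
second_data_set = rank_by_similarity(new_seq, history)
```

Because the alignment is local, a historical flood scores highly as soon as it contains a run resembling the new sequence, regardless of what precedes or follows that run.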
4. The industrial alarm flood prediction method based on an N-gram model according to claim 1, characterized in that, in step (3), the lengths of the time windows used to segment the second data set are set to n and n-1, and the sliding step of the time window is 1.
5. The industrial alarm flood prediction method based on an N-gram model according to claim 4, characterized in that, in step (3), determining the alarm variables that may occur next and their corresponding probabilities with the second data set as the sample is implemented as follows:
The data in the second data set are segmented with time windows of length n and n-1 respectively, and the number of occurrences of each data segment is counted and recorded in a data set $X_{DS}$.
To predict the alarm that will occur at the next time instant $t_j$ of the newly occurring alarm sequence, $X_{DS}$ is first queried for the count of the segment formed by the most recent n-1 alarms of that sequence and for all alarms that occur immediately after that segment; the counts of the corresponding segments of length n are then queried, and the conditional probability of each candidate alarm is approximated by the maximum likelihood rule.
Among the candidate alarms, the element that maximizes the conditional probability is taken as the output result.
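The counting and lookup of claim 5 correspond to the usual maximum likelihood estimate of an N-gram model, in which the probability of a candidate alarm is the count of the length-n segment divided by the count of its length-(n-1) prefix. The sketch below assumes that reading; the function names, the dictionary layout standing in for $X_{DS}$, and the sample data are hypothetical.

```python
from collections import Counter

def count_segments(sequences, n):
    """Count every window of length n and n-1 (sliding step 1); needs n >= 2."""
    counts = Counter()
    for seq in sequences:
        for size in (n, n - 1):
            for start in range(len(seq) - size + 1):
                counts[tuple(seq[start:start + size])] += 1
    return counts

def predict_next(new_seq, counts, n):
    """Return (most likely next alarm, {candidate: probability})."""
    context = tuple(new_seq[-(n - 1):])  # the most recent n-1 alarms
    denom = counts.get(context, 0)
    if denom == 0:
        return None, {}  # context never seen in the sample
    probs = {seg[-1]: c / denom
             for seg, c in counts.items()
             if len(seg) == n and seg[:-1] == context}
    if not probs:
        return None, {}  # context seen, but never followed by an alarm
    return max(probs, key=probs.get), probs

# Hypothetical second data set and new sequence, with n = 3.
second_data_set = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C", "E"]]
counts = count_segments(second_data_set, 3)
best, probs = predict_next(["X", "A", "B"], counts, 3)
```

In this example the context ("A", "B") occurs three times and is followed by "C" twice and "D" once, so "C" is returned with an estimated probability of 2/3.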
6. The industrial alarm flood prediction method based on an N-gram model according to claim 5, characterized in that the preferred value of n is 3.
7. The industrial alarm flood prediction method based on an N-gram model according to claim 5, characterized in that step (4) is implemented as follows:
Let the actual-alarm event be the event that a given alarm variable is the one entering the alarm state at the next time instant of the ongoing sequence $x_{AS}$; its probability of occurrence is assumed to be uniformly distributed and is defined as the prior probability.
Let the prediction event be the event that the next alarm predicted with the second data set as the sample is that alarm variable.
Let k and l denote the numbers of alarm flood sequences in the second data set that contain the two alarm patterns concerned; the conditional probability then expresses the probability that the prediction event occurs given that the actual-alarm event is known.
Then, given that the prediction event has occurred, the probability that the alarm variable entering the alarm state at the next time instant of the newly occurring alarm sequence $x_{AS}$ is that alarm variable is defined as the posterior probability.
The posterior probability is evaluated with a probability mass function (PMF), giving the probability mass function $f_X$ of the posterior probability.
Summing the probability mass function $f_X$ gives the final probability that the predicted alarm is that alarm variable.
Given a confidence level $1-\alpha$, the probability mass function $f_X$ is summed to find the interval $[Z_1, Z_2]$; this interval is the confidence interval corresponding to the confidence level $1-\alpha$.
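The likelihood and the condition that defines $[Z_1, Z_2]$ are not spelled out in the text of claim 7, so the following is only one plausible reading, not the patent's formulation. It assumes that l of k relevant sequences support the predicted alarm, places a uniform prior on a discrete grid of candidate probabilities, uses a binomial likelihood, and takes an equal-tailed interval from the cumulative sum of the resulting PMF. The grid, the likelihood, the use of the posterior mean as the point estimate, and all names are assumptions.

```python
def posterior_pmf(k, l, grid_size=101):
    """Posterior PMF over candidate probabilities theta on a uniform grid.

    Assumed model: uniform prior, binomial likelihood theta^l (1-theta)^(k-l).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [t ** l * (1 - t) ** (k - l) for t in thetas]  # prior cancels
    total = sum(weights)
    return thetas, [w / total for w in weights]

def credible_interval(thetas, pmf, alpha=0.05):
    """Equal-tailed interval [z1, z2] holding about 1 - alpha of the mass."""
    cum, z1, z2 = 0.0, None, None
    for t, p in zip(thetas, pmf):
        cum += p
        if z1 is None and cum >= alpha / 2:
            z1 = t
        if cum >= 1 - alpha / 2:
            z2 = t
            break
    if z2 is None:  # guard against floating-point shortfall
        z2 = thetas[-1]
    return z1, z2

# Hypothetical counts: 7 of 10 sequences support the predicted alarm.
thetas, pmf = posterior_pmf(k=10, l=7)
mean = sum(t * p for t, p in zip(thetas, pmf))  # point estimate
z1, z2 = credible_interval(thetas, pmf, alpha=0.05)
```

Under these assumptions, a larger sample narrows the interval and raises $Z_1$, while a sample that supports the prediction less consistently lowers it, which is the property step (5) relies on when it compares lower limits.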
8. The industrial alarm flood prediction method based on an N-gram model according to claim 1, characterized in that step (5) is implemented as follows:
N iterations of steps (3) and (4) are performed. In the K-th iteration ($0 \le K \le N$), the K sequences with the lowest similarity scores are deleted from the data set and the remaining data set is used for prediction. Finally, the lower limits $Z_1$ of the confidence intervals $[Z_1, Z_2]$ of the N results are compared; the result with the highest lower limit $Z_1$ is the best prediction result, and the corresponding data set is the best prediction data set.
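The selection loop of claim 8 can be sketched independently of how the prediction itself is computed. The example below assumes the sequences are already sorted by descending similarity, lets K run from 0 to N-1, and takes the prediction step as a caller-supplied function returning the predicted alarm and its interval. `best_prediction` and `toy_predict` are hypothetical names, and `toy_predict` is only a stand-in for steps (3) and (4).

```python
def best_prediction(ranked_sequences, predict, n_iter):
    """Drop the K least similar sequences for K = 0..n_iter-1, keep best Z1.

    predict(subset) must return (alarm, z1, z2).
    """
    best = None
    for k in range(n_iter):
        subset = ranked_sequences[:len(ranked_sequences) - k]
        if not subset:
            break  # nothing left to predict from
        alarm, z1, z2 = predict(subset)
        if best is None or z1 > best["z1"]:
            best = {"alarm": alarm, "z1": z1, "z2": z2, "data_set": subset}
    return best

def toy_predict(subset):
    """Stand-in predictor: share of sequences ending in "C", +/- 0.1."""
    p = sum(seq[-1] == "C" for seq in subset) / len(subset)
    return "C", max(0.0, p - 0.1), min(1.0, p + 0.1)

# Hypothetical ranked data set; the last sequence is the least similar.
ranked = [["A", "C"], ["B", "C"], ["A", "D"]]
result = best_prediction(ranked, toy_predict, n_iter=3)
```

Here dropping the least similar sequence raises the lower limit from about 0.57 to 0.9, so the two most similar sequences are kept as the best prediction data set.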
CN201810889499.4A 2018-08-07 2018-08-07 Industrial alarm flood prediction method based on an N-gram model Active CN108922140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810889499.4A CN108922140B (en) 2018-08-07 2018-08-07 Industrial alarm flood prediction method based on an N-gram model

Publications (2)

Publication Number Publication Date
CN108922140A true CN108922140A (en) 2018-11-30
CN108922140B CN108922140B (en) 2019-08-16

Family

ID=64394689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810889499.4A Active CN108922140B (en) 2018-08-07 2018-08-07 Industrial alarm flood prediction method based on an N-gram model

Country Status (1)

Country Link
CN (1) CN108922140B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150123784A1 (en) * 2013-11-03 2015-05-07 Teoco Corporation System, Method, and Computer Program Product for Identification and Handling of a Flood of Alarms in a Telecommunications System
CN104456092A (en) * 2014-12-02 2015-03-25 中国石油大学(华东) Multidimensional assessment method of petroleum and natural gas pipeline warning priority
CN105006119A (en) * 2015-06-30 2015-10-28 中国寰球工程公司 Alarm system optimization method based on bayesian network
CN107748901A (en) * 2017-11-24 2018-03-02 东北大学 The industrial process method for diagnosing faults returned based on similitude local spline

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
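王佳 et al.: "Fuzzy association rule mining method for industrial alarm sequences", 《化工学报》 (CIESC Journal) *
陈忠圣 et al.: "Clustering analysis of process industry alarm flood sequences based on the discrete Fourier transform and its application", 《化工学报》 (CIESC Journal) *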

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11212162B2 (en) * 2019-07-18 2021-12-28 International Business Machines Corporation Bayesian-based event grouping
CN112688946A (en) * 2020-12-24 2021-04-20 工业信息安全(四川)创新中心有限公司 Method, module, storage medium, device and system for constructing abnormality detection features
CN112688946B (en) * 2020-12-24 2022-06-24 工业信息安全(四川)创新中心有限公司 Method, module, storage medium, device and system for constructing abnormality detection features
WO2023124778A1 (en) * 2021-12-29 2023-07-06 浙江中控技术股份有限公司 Real-time alarm tracing apparatus and method in process industry production procedure
CN115909697A (en) * 2023-02-15 2023-04-04 山东科技大学 Alarm state prediction method and system based on amplitude change trend probability inference
CN117118811A (en) * 2023-10-25 2023-11-24 南京邮电大学 Alarm analysis method for industrial alarm flooding

Also Published As

Publication number Publication date
CN108922140B (en) 2019-08-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant