CN111857097B - Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency

Info

Publication number
CN111857097B
Authority
CN
China
Prior art keywords
inverse document
word
frequency
word frequency
industrial control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010733364.6A
Other languages
Chinese (zh)
Other versions
CN111857097A (en)
Inventor
李少森
梁钰华
孙豪
黄剑湘
杨光
李�浩
张启浩
任君
杨铖
丁丙侯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming Bureau of Extra High Voltage Power Transmission Co
Original Assignee
Kunming Bureau of Extra High Voltage Power Transmission Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-07-27
Filing date
2020-07-27
Publication date
2023-10-31
Application filed by Kunming Bureau of Extra High Voltage Power Transmission Co
Priority to CN202010733364.6A
Publication of CN111857097A
Application granted
Publication of CN111857097B

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B23/00 Testing or monitoring of control systems or parts thereof
    • G05B23/02 Electric testing or monitoring
    • G05B23/0205 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
    • G05B23/0218 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
    • G05B23/0243 Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults model based detection method, e.g. first-principles knowledge model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/24 Pc safety
    • G05B2219/24065 Real time diagnostics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Debugging And Monitoring (AREA)
  • Input From Keyboards Or The Like (AREA)

Abstract

The application discloses a method for identifying abnormality diagnosis information of an industrial control system based on word frequency and inverse document frequency, comprising the following steps: establishing a response corpus of a diagnostic command; sending the diagnostic command to the system under test again to obtain the (N+1)th echo (return display) message; filtering stop words from all echo messages and segmenting them into words; calculating the inverse document frequency (IDF) of each word in each text list of the echo messages using the TF-IDF (word frequency and inverse document frequency) algorithm; setting a minimum inverse document frequency threshold IDFmin and deleting words whose IDF does not exceed IDFmin; building a phrase list V from the filtered text lists of the N+1 echo messages and calculating word frequency values; and setting a word frequency threshold and comparing the calculated word frequency values with it to judge whether the message is abnormal. The algorithm can define, in a self-learning manner, the health of the information echoed by each diagnostic command, which can greatly reduce the manual development cost of an industrial control monitoring system and improve the timeliness of event judgment.

Description

Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency
Technical Field
The application relates to the technical field of industrial control system abnormality diagnosis, in particular to an industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency.
Background
At present, some industrial control systems are operated and maintained through remote management and provide no local operation interface, such as a screen or keys, for on-site operation and maintenance personnel. To inspect and analyze system problems, personnel must connect a debugging computer and interact with the device through debugging software or a browser. When a channel or device abnormality occurs, on-site personnel learn of it only through channel-interruption alarms from other service systems or through feedback from personnel at a remote monitoring center (such as a dispatching master station at any level), and only then connect a debugging computer to the industrial control system to inspect, analyze, and handle the cause. If remote monitoring does not notice the abnormality, the fault is discovered only during periodic on-site maintenance or configuration backup, so fault handling is generally delayed. Because abnormalities in an industrial control system occur at random, periodic manual inspection rarely captures detailed information from the moment of the abnormality, and the quality of abnormality analysis degrades as time passes.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides an industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency, which solves the problem of low industrial control system abnormality analysis quality in the prior art.
The application discloses a word frequency and inverse document frequency-based industrial control system abnormality diagnosis information identification method, which comprises the following steps:
Step 1: establishing a response corpus of the diagnostic command: sending the diagnostic command to the system under test N times, and arranging the N echo messages obtained in time order as the response corpus of the diagnostic command;
Step 2: sending the diagnostic command to the system under test again to obtain the (N+1)th echo message, and appending it to the end of the response corpus established in step 1;
Step 3: filtering stop words from the N+1 echo messages and performing word segmentation on them;
Step 4: calculating the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF (word frequency and inverse document frequency) algorithm;
Step 5: setting a minimum inverse document frequency threshold IDFmin, and deleting from each text list every word whose inverse document frequency IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorizing the text lists of the N+1 echo messages filtered in step 5: extracting all phrases in the N+1 text lists and removing duplicates to obtain a phrase list V of length M, where M is the number of distinct phrases and V contains every phrase appearing in the filtered N+1 text lists; reordering the words of each filtered text list according to their order in V; converting each text list into a vector whose elements are the numbers of times the corresponding words appear in that echo message; and calculating the word frequency value TF;
Step 7: setting a word frequency threshold and comparing the word frequency value TF calculated in step 6 with it; if the word frequency value is greater than or equal to the threshold, identifying the message as an abnormal message and outputting alarm information.
According to one embodiment of the present application, the diagnostic command in step 1 is sent at a time interval T, whose value range is determined by the time scale over which the echo of the diagnostic command may change: when system resources do not change abruptly, T ranges from 1 to 30 days; when a network channel may be interrupted at any time, T ranges from 1 s to 24 h.
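The collection loop described in this embodiment can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: `build_corpus`, `send_command`, and `fake_top` are hypothetical names, and the stand-in command returns a fixed string instead of querying a real device. The sleep function is injectable so the sketch runs instantly.

```python
import time
from typing import Callable, List


def build_corpus(send_command: Callable[[], str], n: int, interval_s: float,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Send the diagnostic command n times, interval_s seconds apart,
    and return the echo messages in time order (step 1)."""
    corpus: List[str] = []
    for i in range(n):
        corpus.append(send_command())
        if i < n - 1:          # no need to wait after the last sample
            sleep(interval_s)
    return corpus


def fake_top() -> str:
    # Hypothetical stand-in for a real diagnostic command echo.
    return "top - 18:29:33 up 2:26, 1 user"


# Step 1: N = 3 samples at T = 5 s (sleep replaced by a no-op for the demo).
corpus = build_corpus(fake_top, n=3, interval_s=5, sleep=lambda s: None)
# Step 2: the (N+1)th echo is appended to the end of the corpus.
corpus.append(fake_top())
```

In a real deployment `interval_s` would be chosen from the ranges given above (seconds to hours for channel checks, days for slowly changing resources).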
According to one embodiment of the application, the stop words in step 3 include date and time.
According to one embodiment of the application, the date format is yyyy-mm-dd and the time formats are hh:mm:ss and h:mm.
According to an embodiment of the present application, the word segmentation in step 3 is specifically: using the space as a separator, the N+1 command echoes are split into words to form N+1 one-dimensional text lists.
According to an embodiment of the present application, the calculation formula of the IDF in step 4 is: [formula not reproduced in this text]
According to one embodiment of the present application, IDFmin is not less than 1 in step 5.
According to an embodiment of the present application, the word frequency value in step 6 is calculated as follows: all phrases in the N+1 text lists are extracted and duplicates are removed to obtain a phrase list V of length M, where M is the number of distinct phrases and V contains every phrase appearing in the filtered N+1 text lists; the words of each filtered text list are reordered according to their order in V, and each text list is converted into a vector whose elements are the numbers of times the corresponding words appear in that echo message, which yields an (N+1)×M matrix A. Let $a_{ij}$ be the element in row i and column j of A; then for each element $a_{(N+1)j}$ of the (N+1)th text list, its word frequency is defined as: [formula not reproduced in this text]
According to an embodiment of the present application, the word frequency threshold in step 7 ranges from 0.2 to 0.5.
The application has the beneficial effects that:
1. The method applies the word frequency and inverse document frequency algorithm to the diagnostic information of an industrial control system and automatically mines the key information in the echo of each diagnostic command, such as an abnormal change in a numerical value or the sudden appearance of alarm content. The key content and the abnormality criteria do not have to be defined manually for each diagnostic command. Given enough collected samples, the inverse document frequency of every word and number is calculated across the samples, unimportant information (such as descriptive text) is screened out automatically, and the variable information from which a system abnormality can be judged (such as CPU load, or alarm information that does not normally occur) is retained. Word frequency calculation then determines how often each piece of variable information occurs in the samples, and an alarm is raised for variables that occur very rarely (such as a suddenly and abnormally high CPU load, or alarm information that does not normally occur), prompting operation and maintenance personnel to respond in time.
2. The algorithm can define, in a self-learning manner, the health of the information echoed by each diagnostic command, which can greatly reduce the manual development cost of automatically monitoring an industrial control system. Because the analysis does not depend on the characteristics of the monitored system, it can be readily ported to operation-state monitoring of other service systems. It is therefore highly adaptable, frees up manpower, improves the timeliness of event judgment, and improves operation and maintenance efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is an algorithm flow chart of the industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency;
FIG. 2 is a schematic diagram of the N echo messages in an embodiment of the method;
FIG. 3 is a schematic diagram of the N+1 echo messages in an embodiment of the method.
Detailed Description
Various embodiments of the application are disclosed below with reference to the drawings, and for clarity many practical details are set forth in the following description. It should be understood, however, that these practical details do not limit the application; in some embodiments they are unnecessary. In addition, to simplify the drawings, some conventional structures and components are shown schematically.
In addition, terms such as "first" and "second" are used herein for descriptive purposes only. They do not denote order or sequence and do not limit the application; they merely distinguish components or operations described by the same technical term, and are not to be construed as indicating or implying any relative importance or order of such features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. The technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination is contradictory or cannot be realized, it should be considered not to exist and falls outside the scope of protection claimed in the present application.
The application discloses a word frequency and inverse document frequency-based industrial control system abnormality diagnosis information identification method, wherein the algorithm flow is shown in figure 1 and comprises the following steps:
Step 1: establishing a response corpus of the diagnostic command: sending the diagnostic command to the system under test N times at a time interval T, and arranging the N echo messages obtained in time order into the corpus shown in FIG. 2, which serves as the response corpus of the diagnostic command;
Step 2: sending the diagnostic command to the system under test again to obtain the (N+1)th echo message, and appending it to the end of the response corpus established in step 1, the resulting arrangement being shown in FIG. 3;
Step 3: filtering stop words from the N+1 echo messages and performing word segmentation, wherein the stop words include the date format yyyy-mm-dd and the time formats hh:mm:ss and h:mm, and the word segmentation is: using the space as a separator, splitting the N+1 command echoes into words to form N+1 one-dimensional text lists;
Step 4: calculating the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF (word frequency and inverse document frequency) algorithm, wherein the calculation formula of the IDF is: [formula not reproduced in this text]
Step 5: setting a minimum inverse document frequency threshold IDFmin, and deleting from each text list every word whose inverse document frequency IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorizing the text lists of the N+1 echo messages filtered in step 5: extracting all phrases in the N+1 text lists and removing duplicates to obtain a phrase list V of length M, where M is the number of distinct phrases and V contains every phrase appearing in the filtered N+1 text lists; reordering the words of each filtered text list according to their order in V; and converting each text list into a vector whose elements are the numbers of times the corresponding words appear in that echo message, which yields an (N+1)×M matrix A. Let $a_{ij}$ be the element in row i and column j of A; then for each element $a_{(N+1)j}$ of the (N+1)th text list, its word frequency is defined as: [formula not reproduced in this text]
Step 7: setting a word frequency threshold and comparing the word frequency value TF calculated in step 6 with it; if the word frequency value is greater than or equal to the threshold, identifying the message as an abnormal message and outputting alarm information.
Example 1
Taking a longitudinal encryption device of the ±800 kV Puer converter station as an example:
Suppose that, after a top diagnostic command is sent to the longitudinal encryption device at a given moment, the echo message of the system under test is as follows:
top - 18:29:33 up 2:26, 1 user, load average: 0.00, 0.03, 0.06
Tasks: 0 total, 0 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 20.0 us, 0.0sy, 0.0 ni, 80.0 id, 0.0wa, 0.0hi, 0.0si, 0.0st
MiB Mem: 987.4 total,91.3 free, 642.2 used, 253.8 buff/cache
MiB Swap: 1022.0 total,776.4 free,245.6 used.185.3 avail Mem
Step 1: The diagnostic command is sent to the system under test N times at a time interval T = 5 seconds, and the N echo messages obtained are arranged in time order as the corpus of the command. The echo messages contain both meaningless and meaningful information: meaningless information includes date and time, annotations, and the like, while meaningful information reflects the state of the system under test, such as CPU usage, memory usage, and alarm prompts.
Step 2: After the corpus of the diagnostic command is obtained, the diagnostic command is sent to the system under test again, the (N+1)th echo message is obtained, and it is appended to the corpus.
Step 3: Stop words are filtered from the N+1 texts: the stop words include the date format yyyy-mm-dd and the time formats hh:mm:ss and h:mm, so that, for example, 18:29:33 and 2:26 are filtered out. The N+1 texts are then segmented: using the space as a separator, the N+1 command echoes are split into words to form N+1 one-dimensional text lists, for example [top, up, 1, user, load, average, 0.00, 0.03, 0.06, Tasks, …].
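Step 3 can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the patented code: the regular expressions `DATE_RE` and `TIME_RE` are one possible encoding of the date and time formats named above, and stripping commas, colons, and the lone "-" is an extra assumption made so that the output matches the example list in the text.

```python
import re
from typing import List

# Assumed patterns for the stop-word formats named in the text:
# dates such as 2020-07-27, times such as 18:29:33 or 2:26.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")


def tokenize(echo: str) -> List[str]:
    """Remove date/time stop words, then split the echo on whitespace."""
    text = DATE_RE.sub(" ", echo)
    text = TIME_RE.sub(" ", text)
    tokens: List[str] = []
    for raw in text.split():
        tok = raw.strip(",:")      # assumption: drop trailing punctuation
        if tok and tok != "-":     # assumption: drop the lone separator dash
            tokens.append(tok)
    return tokens


line = "top - 18:29:33 up 2:26, 1 user, load average: 0.00, 0.03, 0.06"
tokens = tokenize(line)
print(tokens)
```

Applied to each of the N+1 echo messages, this produces the N+1 one-dimensional text lists used by the later steps.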
Step 4: The inverse document frequency IDF of each word of each text list in the N+1 echo messages is calculated using the word frequency and inverse document frequency algorithm: [formula not reproduced in this text]. The word top, for example, occurs in all N+1 texts, so it receives the lowest possible inverse document frequency.
Step 5: The minimum inverse document frequency threshold is set to IDFmin = 1.0, and every word in each text list whose inverse document frequency is less than or equal to the threshold is deleted, which filters the meaningless information out of the command echo. Annotation-like words with no diagnostic meaning, such as "Tasks", "top", "user", "load", and "average", occur in all N+1 texts; their inverse document frequency is less than 1.0, so they are filtered out.
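Steps 4 and 5 can be sketched as follows. The IDF formula of the patent is not reproduced in this text, so the sketch assumes the common textbook form IDF(w) = ln(number of documents / number of documents containing w); the function names, the toy text lists, and the threshold 0.1 are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List


def inverse_document_frequency(docs: List[List[str]]) -> Dict[str, float]:
    """Assumed textbook IDF: ln(n_docs / document frequency of the word)."""
    n_docs = len(docs)
    doc_freq: Dict[str, int] = {}
    for doc in docs:
        for word in set(doc):              # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


def drop_common_words(docs: List[List[str]], idf_min: float) -> List[List[str]]:
    """Delete every word whose IDF is less than or equal to idf_min (step 5)."""
    idf = inverse_document_frequency(docs)
    return [[w for w in doc if idf[w] > idf_min] for doc in docs]


# Toy text lists: "top" and "up" occur in every echo, the numbers vary.
docs = [
    ["top", "up", "0.00", "0.03"],
    ["top", "up", "0.00", "0.05"],
    ["top", "up", "0.00", "0.03"],
    ["top", "up", "0.90", "0.03"],
]
idf = inverse_document_frequency(docs)
filtered = drop_common_words(docs, idf_min=0.1)
```

Under this assumed formula, words present in every echo get IDF 0 and are removed, while values that change between samples survive the filter.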
Step 6: The filtered N+1 text lists are vectorized. All phrases in the N+1 text lists are extracted and duplicates are removed to obtain a phrase list V of length M, for example ["0.00", "0.03", "0.06", …], where M is the number of distinct phrases and V contains every phrase occurring in the filtered N+1 text lists. The words of each filtered text list are then reordered according to their order in V, and each text list is converted into a vector. For example, if a text list contains "0.00" once, "0.03" zero times, and "0.06" three times, its vector is [1, 0, 3, …]; the position of each element in the vector matches the position of the phrase it represents in the phrase list V.
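The vectorization in step 6 amounts to building a count matrix over a shared vocabulary. A minimal Python sketch follows; the function name `vectorize` and the first-seen ordering of V are assumptions, since the text does not say how V is ordered.

```python
from typing import List, Tuple


def vectorize(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build the phrase list V (distinct words, first-seen order) and the
    count matrix A, where A[i][j] is how often V[j] occurs in docs[i]."""
    vocab: List[str] = []
    seen = set()
    for doc in docs:
        for word in doc:
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    matrix = [[doc.count(word) for word in vocab] for doc in docs]
    return vocab, matrix


docs = [
    ["0.00", "0.03", "0.06"],
    ["0.00", "0.06", "0.06", "0.06"],   # "0.00" once, "0.03" never, "0.06" three times
]
vocab, A = vectorize(docs)
print(vocab)   # phrase list V
print(A[1])    # vector of the second text list
```

The second row reproduces the example in the text: a list containing "0.00" once, "0.03" zero times, and "0.06" three times becomes [1, 0, 3].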
This yields an (N+1)×M matrix A. Let $a_{ij}$ be the element in row i and column j of A; then for each element $a_{(N+1)j}$ of the (N+1)th text list, its word frequency is defined as: [formula not reproduced in this text]
The matrix obtained by vectorizing the N+1 text lists is shown in Table 1:
Step 7: A word frequency threshold is set [value not reproduced in this text]. When the word frequency of any vector element in the (N+1)th text list is greater than or equal to the threshold, the algorithm outputs alarm information to prompt operation and maintenance personnel to pay attention.
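The word frequency formula is not reproduced in this text, so the following Python sketch rests on an explicit assumption: the word frequency of column j in the newest row is taken to be that row's count divided by the column total over all N+1 rows, so a value seen only in the newest echo scores 1.0. The comparison direction (alarm when the value is greater than or equal to the threshold) follows the wording of Example 2; the function names and the toy matrix are hypothetical.

```python
from typing import List


def newest_row_tf(matrix: List[List[int]]) -> List[float]:
    """Assumed definition: TF_j = a[N+1][j] / sum over all rows of a[i][j]."""
    last = matrix[-1]
    tf: List[float] = []
    for j, count in enumerate(last):
        column_total = sum(row[j] for row in matrix)
        tf.append(count / column_total if column_total else 0.0)
    return tf


def abnormal_columns(matrix: List[List[int]], tf_min: float) -> List[int]:
    """Indices of words in the newest echo whose TF reaches the threshold."""
    return [j for j, v in enumerate(newest_row_tf(matrix)) if v >= tf_min]


# Columns stand for three words; the last row is the (N+1)th echo.
A = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],   # the third word appears for the first time
]
tf = newest_row_tf(A)
alarms = abnormal_columns(A, tf_min=0.3)
```

With this definition, a word that occurs once in each of the five echoes scores 0.2, so a threshold of 0.3 flags only the newly appearing word; this also hints at why the threshold must sit above 1/(N+1).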
Example 2
Taking a longitudinal encryption device of the ±800 kV KunBei converter station as an example:
Step 1: A top diagnostic command is sent to the longitudinal encryption authentication device with a period of T = 10 seconds, and 4 echo messages are obtained, as shown in Table 2:
Step 2: The diagnostic command is sent to the longitudinal encryption authentication device again and the 5th echo message is obtained, as shown in Table 3:
Step 3: Stop words are filtered, in a unified format, from the 5 echo messages in the corpus, deleting time-related useless information; the processed corpus is shown in Table 4:
All echo messages in the corpus are then segmented in a unified format: using the space as a separator, the N+1 command echoes are split into N+1 one-dimensional text lists; the processed corpus is shown in Table 5:
Step 4: The inverse document frequency IDF of each word of each text list in the N+1 echo messages is calculated using the word frequency and inverse document frequency algorithm.
The IDF calculation is performed on the corpus after stop-word filtering and word segmentation; taking the 1st, 2nd, and 7th words of the 1st echo message as examples, the results are shown in Table 6:
Step 5: The minimum inverse document frequency threshold is set to IDFmin = 0.1; a word whose IDF value is lower than 0.1 is judged to be overly frequent echo information. From Table 6, top and up are unimportant echo information and are filtered out, while 0.00 is retained as important echo information. After the IDF calculation is completed for all echo information, the updated corpus is shown in Table 7:
The corpus is de-duplicated to generate a list of important echo information, that is, the set of distinct important items in the corpus; the result is shown in Table 8:
The text lists of the N+1 echo messages filtered in step 5 are vectorized: all phrases in the N+1 text lists are extracted and duplicates are removed to obtain a phrase list V of length M, where M is the number of distinct phrases and V contains every phrase appearing in the filtered N+1 text lists; the words of each filtered text list are reordered according to their order in V, and each text list is converted into a vector whose elements are the numbers of times the corresponding words appear in that echo message. The conversion result is shown in Table 9:
The word frequency TF of the 5 echo messages is calculated using the formula [formula not reproduced in this text]; the results are shown in Table 10:
Step 7: A word frequency threshold is set [value not reproduced in this text], and the TF table of the echo information is compared with this algorithm setting. If the TF value of any echo information in a message is greater than or equal to the setting, the message is judged to contain abnormal echo information; if it is smaller, the message is judged to be normal. The final result is shown in Table 10:
It follows that the 7th, 8th, and 9th items of echo information in the 5th message are abnormal; the message is an abnormal message and an alarm is raised.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of the present application, should be included in the scope of the claims of the present application.

Claims (9)

1. The industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency is characterized by comprising the following steps:
Step 1: establishing a response corpus of the diagnostic command: sending the diagnostic command to the system under test N times, and arranging the N echo messages obtained in time order as the response corpus of the diagnostic command;
Step 2: sending the diagnostic command to the system under test again to obtain the (N+1)th echo message, and appending it to the end of the response corpus established in step 1;
Step 3: filtering stop words from the N+1 echo messages and performing word segmentation on them;
Step 4: calculating the inverse document frequency IDF of each word in each text list of the N+1 echo messages using the TF-IDF (word frequency and inverse document frequency) algorithm;
Step 5: setting a minimum inverse document frequency threshold IDFmin, and deleting from each text list every word whose inverse document frequency IDF calculated in step 4 is less than or equal to IDFmin;
Step 6: vectorizing the text lists of the N+1 echo messages filtered in step 5: extracting all phrases in the N+1 text lists and removing duplicates to obtain a phrase list V of length M, where M is the number of distinct phrases and V contains every phrase appearing in the filtered N+1 text lists; reordering the words of each filtered text list according to their order in V; converting each text list into a vector whose elements are the numbers of times the corresponding words appear in that echo message; and calculating the word frequency value TF;
Step 7: setting a word frequency threshold and comparing the word frequency value TF calculated in step 6 with it; if the word frequency value is greater than or equal to the threshold, identifying the message as an abnormal message and outputting alarm information.
2. The method for identifying abnormal diagnostic information of an industrial control system based on word frequency and inverse document frequency according to claim 1, wherein the diagnostic command sending time interval in step 1 is T, the value range of T is determined according to the time range in which the diagnostic command return result may change, and the value range of T is 1-30 days under the condition that system resources are not mutated; under the condition that a network channel is possibly interrupted at any time, the value range of T is 1 s-24 h.
3. The method for identifying abnormal diagnostic information of industrial control system based on word frequency and inverse document frequency according to claim 1, wherein the stop words in the step 3 include date and time.
4. The method for identifying the abnormal diagnosis information of the industrial control system based on word frequency and inverse document frequency according to claim 3, wherein the date format is yyy-mm-dd, and the time format is hh: mm: ss, h: mm.
5. The method for identifying abnormal diagnosis information of industrial control system based on word frequency and inverse document frequency according to claim 1, wherein the word segmentation process in step 3 is specifically: and (3) taking the space as a separator, dividing the N+1 group command echo display into a plurality of word groups to form an N+1 group one-dimensional text list.
6. The method for identifying abnormal diagnostic information of an industrial control system based on word frequency and inverse document frequency according to claim 1, wherein the calculation formula of IDF in step 4 is: [formula not reproduced in this text]
7. The method for identifying abnormal diagnostic information of an industrial control system based on word frequency and inverse document frequency according to claim 1, wherein IDFmin in step 5 is not less than 1.
8. The method for identifying abnormal diagnostic information of an industrial control system based on word frequency and inverse document frequency according to claim 1, wherein in step 6 the word frequency value is calculated as follows: the vectorization yields an (N+1) x M matrix A; let a_ij be the element in row i and column j of matrix A; then, for each element a_(N+1)j of the (N+1)-th group text list, its word frequency is defined as: [formula not reproduced in this text]
9. The method for identifying abnormal diagnostic information of an industrial control system based on word frequency and inverse document frequency according to claim 1, wherein the word frequency threshold in step 7 has a value range of 0.2 to 0.5.
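For illustration only, the processing chain described in the claims (space-separated word segmentation, date and time stop words, IDF filtering, the (N+1) x M count matrix, and the word frequency threshold) can be sketched in Python as follows. This is a minimal sketch, not the patented implementation. All function names and the sample messages are hypothetical. Three details are assumptions because the corresponding formulas are not reproduced in this text: IDF is taken as ln(number of messages / number of messages containing the word), the word frequency is taken as a count divided by its row total, and a message is flagged when a word frequency exceeds the threshold.

```python
import math
import re
from collections import Counter

# Stop words per claims 3-4: dates (yyyy-mm-dd) and times (hh:mm:ss or hh:mm).
STOP_WORD = re.compile(r"^(\d{4}-\d{2}-\d{2}|\d{1,2}:\d{2}(:\d{2})?)$")


def tokenize(message):
    """Split an echo message on whitespace and drop date/time stop words."""
    return [w for w in message.split() if not STOP_WORD.match(w)]


def idf_filter(token_lists, idf_min=1.0):
    """Keep only words whose IDF is at least idf_min.

    ASSUMPTION: IDF = ln(n_messages / n_messages_containing_word).
    The formula used in the claims is not available in this text.
    """
    n_docs = len(token_lists)
    doc_freq = Counter(w for tokens in token_lists for w in set(tokens))
    keep = {w for w, df in doc_freq.items() if math.log(n_docs / df) >= idf_min}
    return [[w for w in tokens if w in keep] for tokens in token_lists]


def count_matrix(token_lists):
    """Build the phrase list V and the (N+1) x M count matrix A."""
    vocab = sorted({w for tokens in token_lists for w in tokens})
    matrix = [[tokens.count(w) for w in vocab] for tokens in token_lists]
    return vocab, matrix


def word_frequencies(row):
    """ASSUMPTION: word frequency = count divided by the row total."""
    total = sum(row)
    return [c / total if total else 0.0 for c in row]


def is_abnormal(messages, tf_threshold=0.3, idf_min=1.0):
    """Flag the newest message (last in the list).

    ASSUMPTION: the message is abnormal if any word frequency in its row
    exceeds tf_threshold; the direction of the comparison is not
    recoverable from this text.
    """
    filtered = idf_filter([tokenize(m) for m in messages], idf_min)
    _vocab, matrix = count_matrix(filtered)
    tf = word_frequencies(matrix[-1]) if matrix else []
    return any(v > tf_threshold for v in tf)


# Hypothetical echo messages: three historical results plus the newest one.
history = [
    "2020-07-27 10:00:00 link up cpu ok",
    "2020-07-28 10:00:00 link up cpu ok",
    "2020-07-29 10:00:00 link up cpu ok",
]
print(is_abnormal(history + ["2020-07-30 10:00:00 link up cpu ok"]))    # False
print(is_abnormal(history + ["2020-07-30 10:00:00 link down cpu ok"]))  # True
```

In the second call only the word "down" survives the IDF filter, because it appears in one of the four messages, so it accounts for the entire last row of the matrix and its word frequency exceeds the threshold. The thresholds 0.3 and 1.0 were picked from the ranges stated in claims 7 and 9.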

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010733364.6A CN111857097B (en) 2020-07-27 2020-07-27 Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency

Publications (2)

Publication Number Publication Date
CN111857097A CN111857097A (en) 2020-10-30
CN111857097B true CN111857097B (en) 2023-10-31

Family

ID=72947886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010733364.6A Active CN111857097B (en) 2020-07-27 2020-07-27 Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency

Country Status (1)

Country Link
CN (1) CN111857097B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015170145A (en) * 2014-03-07 2015-09-28 Kddi株式会社 Program, device, and server for estimating simple sentence symbolizing target sentence
CN107992597A (en) * 2017-12-13 2018-05-04 国网山东省电力公司电力科学研究院 A kind of text structure method towards electric network fault case
CN108846142A (en) * 2018-07-12 2018-11-20 南方电网调峰调频发电有限公司 A kind of Text Clustering Method, device, equipment and readable storage medium storing program for executing
CN109495479A (en) * 2018-11-20 2019-03-19 华青融天(北京)软件股份有限公司 A kind of user's abnormal behaviour recognition methods and device
KR101964412B1 (en) * 2018-12-12 2019-04-01 주식회사 모비젠 Method for diagnosing anomaly log of mobile commmunication data processing system and system thereof
CN110008311A (en) * 2019-04-04 2019-07-12 北京邮电大学 A kind of product information security risk monitoring method based on semantic analysis
CN110321411A (en) * 2019-06-26 2019-10-11 国网江苏省电力有限公司 A kind of power system monitor warning information classification method, system and readable storage medium storing program for executing
WO2020124037A1 (en) * 2018-12-13 2020-06-18 DataRobot, Inc. Methods for detecting and interpreting data anomalies, and related systems and devices

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170154107A1 (en) * 2014-12-11 2017-06-01 Hewlett Packard Enterprise Development Lp Determining term scores based on a modified inverse domain frequency
CN106570513B (en) * 2015-10-13 2019-09-13 华为技术有限公司 The method for diagnosing faults and device of big data network system
CN108259482B (en) * 2018-01-04 2019-05-28 平安科技(深圳)有限公司 Network Abnormal data detection method, device, computer equipment and storage medium

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Automated IT system failure prediction: A deep learning approach;Ke Zhang等;2016 IEEE International Conference on Big Data (Big Data);第1291-1300页 *
BINet: Multivariate Business Process Anomaly Detection Using Deep Learning;Nolle, T;International Conference on Business Process Management;第11080卷;第271-287页 *
一种基于日志信息和CNN-text的软件系统异常检测方法;梅御东;陈旭;孙毓忠;牛逸翔;肖立;王海荣;冯百明;;计算机学报(第02期);第366-380页 *
基于Bi-LSTM和TFIDF的工单事件提取;范华;翁利国;周艳;姜川;孙涛;;电脑知识与技术(第04期);第291-293页 *
基于TF-IDF改进计算模型的实时大数据处理系统设计与实现;王海明;中国优秀硕士学位论文全文数据库信息科技辑(第04(2018)期);第I140-886页 *
基于TF-IDF算法的AAA服务异常检测机制研究;黄晓丹等;移动通信;第83-87页 *
基于关联规则的高铁列控车载设备故障诊断方法研究;刘浩;中国优秀硕士学位论文全文数据库工程科技Ⅱ辑(第01(2019)期);第C033-582页 *
电力企业文本数据挖掘技术研究;吕旭明;雷振江;赵永彬;由广浩;;电力信息与通信技术(第01期);第7-10页 *

Also Published As

Publication number Publication date
CN111857097A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN100589418C (en) The generation method and the generation system of alarm correlation rule
CN107357730B (en) System fault diagnosis and repair method and device
CN103533084B (en) Real-time DMS (device management system) of B/S (browser/server) framework and method thereof
CN111290913A (en) Fault location visualization system and method based on operation and maintenance data prediction
EP2927819B1 (en) Method for automatically processing a number of protocol files of an automation system
CN112346931A (en) Raspberry pie-based private network service cluster monitoring alarm system, method and medium
CN105740992A (en) Hospital medical risk assessment system and method
CN111626498B (en) Equipment running state prediction method, device, equipment and storage medium
CN110311802A (en) Network operation method, device, electronic equipment and storage medium
CN111857097B (en) Industrial control system abnormality diagnosis information identification method based on word frequency and inverse document frequency
CN115237724A (en) Data monitoring method, device, equipment and storage medium based on artificial intelligence
CN114676791A (en) Electric power system alarm information processing method based on fuzzy evidence reasoning
CN106445789A (en) Monitoring visualizing method and system
CN112732791A (en) Wireless AP data analysis platform and method
CN113064399A (en) Industrial monitoring software prediction maintenance system based on big data distributed programming framework
CN116594840A (en) Log fault acquisition and analysis method, system, equipment and medium based on ELK
Takano et al. Psychological biases affecting human cognitive performance in dynamic operational environments
Kang et al. A methodology for evaluating alarm-processing systems using informational entropy-based measure and the analytic hierarchy process
CN116522213A (en) Service state level classification and classification model training method and electronic equipment
JP2007048025A (en) Plant operation support device and plant operation support method
CN110677271A (en) Big data alarm method, device, equipment and storage medium based on ELK
CN111597068A (en) IT operation and maintenance management method and IT operation and maintenance management device
CN110083611B (en) Random hybrid system security analysis method based on statistical model detection
Groth et al. A model-based approach to HRA: example application and quantitative analysis
CN112559238B (en) Troubleshooting strategy generation method and device for Oracle database, processor and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant