CN114168373A - NLP-based disaster recovery system abnormal point detection method - Google Patents


Info

Publication number
CN114168373A
Authority
CN
China
Prior art keywords
text data
log text
abnormal
risk
log
Prior art date
2021-11-17
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111363265.4A
Other languages
Chinese (zh)
Inventor
董惠良
姜学峰
汪炎平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tobacco Zhejiang Industrial Co Ltd
Original Assignee
China Tobacco Zhejiang Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-17
Filing date
2021-11-17
Publication date
2022-03-11
Application filed by China Tobacco Zhejiang Industrial Co Ltd
Priority to CN202111363265.4A
Publication of CN114168373A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
                • G06F11/0751 Error or fault detection not based on redundancy
                • G06F11/0766 Error or fault reporting or storing
                  • G06F11/0775 Content or structure details of the error report, e.g. specific table structure, specific error fields
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/10 File systems; File servers
              • G06F16/18 File system types
                • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
                  • G06F16/1815 Journaling file systems
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/24 Querying
                • G06F16/245 Query processing
                  • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
                    • G06F16/2462 Approximate or statistical queries
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/216 Parsing using statistical methods
              • G06F40/237 Lexical tools
                • G06F40/242 Dictionaries
              • G06F40/279 Recognition of textual entities
                • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
            • G06F40/30 Semantic analysis
            • G06F40/40 Processing or translation of natural language
              • G06F40/42 Data-driven translation
                • G06F40/44 Statistical methods, e.g. probability models
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/044 Recurrent networks, e.g. Hopfield networks
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
            • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
          • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
            • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses an NLP-based method for detecting abnormal points in a disaster recovery system, comprising the following steps: obtaining a log text data set, extracting a key-value pair structure from each log through NLP semantic analysis, establishing a multi-dimensional word vector matrix for each word in a dictionary, and constructing a word vector matrix for each log text data; inputting the log text data at the current moment together with the abnormal risk confidence of the previous moment into a long short-term memory (LSTM) network to obtain the abnormal risk confidence at the current moment, and comparing it with an abnormal risk threshold to obtain a risk log text data sequence; and inputting the risk log text data sequence into a Markov chain to obtain transition probabilities, taking the current log text data as an abnormal point of the risk log text data sequence when the transition probability reaches an abnormal transition probability threshold, and determining the abnormal station in the production process based on the abnormal point. The method can accurately detect abnormal points in log text.

Description

NLP-based disaster recovery system abnormal point detection method
Technical Field
The invention belongs to the field of disaster recovery backup of computer application systems, and particularly relates to a method for detecting an abnormal point of a disaster recovery system based on NLP.
Background
As the tobacco industry's business grows more dependent on information systems, the requirements for stability, high availability and rapid fault recovery keep rising. This places higher demands on information center administrators: quickly locating and handling problems across hardware, operating systems, databases, middleware and business systems greatly increases both their workload and its difficulty. At present, the State Tobacco Monopoly Administration and each provincial company have built local or remote (off-site) disaster recovery systems for their core business systems, but these generally support only manual disaster recovery switching according to disaster grade. Switching efficiency is therefore low, and when a real disaster occurs, switching consumes enormous manpower and material resources. Moreover, the decision to issue a disaster recovery switching instruction rests largely on system administrators' judgment of the fault, so the accuracy of the instruction needs to be improved. In particular, once semi-automatic and automatic switching modes are introduced, determining precisely when to issue an automatic switching instruction becomes a pressing problem.
In big-data analysis for disaster recovery switching, anomaly analysis based on service metrics and machine metrics is already mature, but system-log analysis remains at the stage of simple comparison. Existing log-based anomaly detection methods fall into three main types: PCA-based anomaly detection, anomaly detection by analyzing correlations among different log categories, and workflow-model-based anomaly detection. Although these methods work well in specific domains, none of them is a general online anomaly detection method. An approach that detects anomalies automatically from system logs is therefore needed. However, anomaly pattern detection on system logs is complex, and the main challenges are as follows:
First, logs are unstructured, and their format and semantics vary greatly between systems. Some existing approaches address this with rules, but rule design, for example the regular expressions commonly used in industry, relies on domain knowledge. More importantly, rule-based methods are unsuitable for general anomaly detection, because it is almost impossible to know in advance what is of interest in different types of logs.
Second, timeliness. For users to discover system anomalies promptly, detection must itself be timely; because log data arrives as a stream, methods that analyze the entire log data set at once are not applicable.
Third, there are many types of anomalies. Systems and applications can produce many kinds of exceptions, so an anomaly detection system should not only target specific anomaly types but also be able to detect unknown ones. Meanwhile, log messages carry rich information, such as log keys, parameter values and timestamps, yet most existing anomaly detection methods analyze only a specific part of the message (such as the log key), which limits the types of anomalies they can detect.
Anomaly detection is therefore an important task in building secure and reliable computer systems. As tobacco business systems grow more complex, they are exposed to more and more vulnerabilities and defects, making anomaly detection increasingly challenging, and many conventional methods based on standard mining algorithms are no longer effective. System logs record system states and significant events at different points, helping to diagnose performance problems and faults. In the disaster recovery system of a tobacco core business system, the logs are especially useful for deciding whether to issue a disaster recovery switching instruction and for supplying the data that supports that instruction. Because disaster recovery system logs record key events that occur while the system runs, they are an excellent source of information for understanding system state, online monitoring and anomaly detection. Anomaly detection, in turn, is a key step in building an efficient disaster recovery system. For these reasons, improving the accuracy of anomaly detection is a necessary condition for improving the accuracy of automatic disaster recovery switching instructions.
Disclosure of Invention
The invention provides an NLP-based method for detecting abnormal points in a disaster recovery system, which can accurately detect abnormal points in log text.
An NLP-based disaster recovery system abnormal point detection method comprises the following steps:
(1) obtaining a log text data set; extracting a key-value pair structure from each log text data based on NLP semantic analysis; performing word-frequency screening on the key-value pair structures to construct a dictionary of the log text data set; establishing a multi-dimensional word vector matrix for each word in the dictionary using the word2vec algorithm; and constructing the word vector matrix of each log text data from these multi-dimensional word vector matrices;
(2) in time order, inputting the log text data at the current moment together with the abnormal risk confidence of the previous moment into a Long Short-Term Memory (LSTM) network to obtain the abnormal risk confidence at the current moment; when the current abnormal risk confidence reaches an abnormal risk threshold, taking the current log text data as risk log text data, and constructing a risk log text data sequence from the log text data preceding the current moment and the risk log text data;
(3) inputting each log text data in the risk log text data sequence into a Markov chain in time order to obtain the transition probability from each log text data to the next; when the transition probability reaches an abnormal transition probability threshold, taking the current log text data as an abnormal point of the risk log text data sequence, and determining the abnormal station in the production process based on the abnormal point.
Extracting a key-value pair structure from each log text data based on NLP semantic analysis comprises:
based on NLP semantic analysis, extracting from each log text data the event type and the content text data corresponding to that event type to construct a key-value pair structure, in which the event type is the key and the corresponding content text data is the value.
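The key-value extraction can be sketched as follows. The patent does not say which NLP technique performs the split, so this minimal sketch assumes the `type:content` layout of the example logs in the Examples section and uses a plain delimiter split as a stand-in; `to_key_value` is a hypothetical helper name.

```python
def to_key_value(line: str) -> tuple:
    """Split one raw log record into an (event type, content text) pair.

    Stand-in for the unspecified NLP semantic analysis: assumes records look
    like 'instance:Terminating instance' (event type before the first colon).
    """
    event_type, sep, content = line.partition(":")
    if not sep:
        # No delimiter: keep the whole line as content of an unknown type.
        return "unknown", line.strip()
    return event_type.strip(), content.strip()


if __name__ == "__main__":
    for record in ["instance:Terminating instance", "instance:Permission Denied"]:
        print(to_key_value(record))
```

Splitting only on the first colon keeps any later colons inside the content text, which matters for messages that embed paths or timestamps.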
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language, and it is an interdisciplinary field combining linguistics, computer science and mathematics. Its aim is to extract information from text data so that computers can process or "understand" natural language to perform tasks such as automatic translation, text classification and sentiment analysis.
The dictionary is constructed by screening the frequency of each word in the event type of each log text data and in the content text data corresponding to that event type.
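Word-frequency screening can be sketched as a simple count-and-cut. This is a minimal sketch under stated assumptions: words are lower-cased and split on whitespace, and `min_count` is a hypothetical frequency cut-off, since the patent gives neither a tokenizer nor a threshold.

```python
from collections import Counter


def build_dictionary(pairs, min_count=2):
    """Map each sufficiently frequent word to an integer index.

    pairs: iterable of (event_type, content) tuples.
    min_count: hypothetical frequency cut-off; the patent gives no value.
    Words are lower-cased and split on whitespace (an assumption).
    """
    counts = Counter()
    for event_type, content in pairs:
        counts.update(event_type.lower().split())
        counts.update(content.lower().split())
    # Sort the surviving words so the index assignment is deterministic.
    kept = sorted(word for word, c in counts.items() if c >= min_count)
    return {word: index for index, word in enumerate(kept)}


pairs = [
    ("instance", "Terminating instance"),
    ("instance", "Instance destroyed successfully"),
    ("instance", "Deleting instance files"),
]
vocab = build_dictionary(pairs)
```

With these three records only "instance" occurs at least twice, so `vocab` keeps just that word; lowering `min_count` to 1 keeps all six distinct words.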
Sequentially inputting, in time order, the current-moment log text data and the previous-moment abnormal risk confidence into the LSTM network to obtain the current-moment abnormal risk confidence comprises:
sorting the log text data by time, splicing the word vector matrix of each log text data with the abnormal risk confidence of the previous moment, and inputting the spliced result into the LSTM network to obtain the abnormal risk confidence at the current moment.
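The patent does not specify the LSTM's size, how the spliced matrix is fed in, or the output layer. The sketch below assumes one plausible arrangement: each row of the spliced (k+2) × n matrix is one LSTM time step, and the final hidden state passes through a logistic output so the confidence lies in (0, 1). `TinyLSTMScorer` is a hypothetical name, and its weights are random and untrained; a real system would train them on labeled logs.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMScorer:
    """Single-layer LSTM that reads a spliced (k+2) x n matrix row by row and
    maps its final hidden state to a risk confidence in (0, 1).

    Weights are random and untrained; hidden size and output layer are
    assumptions, not taken from the patent.
    """

    def __init__(self, n, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden, n + hidden))
        self.b = np.zeros(4 * hidden)
        self.w_out = rng.normal(0.0, 0.1, size=hidden)

    def score(self, matrix):
        H = self.hidden
        h = np.zeros(H)
        c = np.zeros(H)
        for x in matrix:  # each row is one n-dimensional vector
            z = self.W @ np.concatenate([x, h]) + self.b
            i = sigmoid(z[:H])
            f = sigmoid(z[H:2 * H])
            g = np.tanh(z[2 * H:3 * H])
            o = sigmoid(z[3 * H:])
            c = f * c + i * g
            h = o * np.tanh(c)
        # Logistic output keeps the confidence strictly between 0 and 1.
        return float(sigmoid(self.w_out @ h))
```

An all-zero input with zero biases leaves the hidden state at zero, so the untrained scorer returns exactly 0.5; any other input gives some value strictly between 0 and 1.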
The risk log text data is determined from the abnormal risk confidence output by the LSTM network. The abnormal risk confidence is a floating-point number between 0 and 1; the larger the value, the higher the abnormal risk. An abnormal risk threshold is set, and when the abnormal risk confidence reaches this threshold, the corresponding log text data is located as risk log text data.
Splicing the word vector matrix of each log text data with the abnormal risk confidence of the previous moment comprises:
converting the abnormal risk confidence into a multi-dimensional one-hot code, splicing this code with the multi-dimensional word vector matrix of each log text data, and using the spliced result as the input of the LSTM network.
The abnormal risk confidence is converted into the multi-dimensional one-hot code t as follows:
$$t = \mathrm{round}(p \times n)$$
where p is the abnormal risk confidence (a floating-point number) and n is the word vector dimension of each log text data.
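The encoding and splicing step can be sketched as below. Since t = round(p × n) takes n + 1 possible values (0 to n) but the code has n dimensions, this sketch assumes t is a 1-based position and that t = 0 yields the all-zero vector, which matches the zero vector used for the first log in the Examples. The function names are hypothetical.

```python
import numpy as np


def risk_to_one_hot(p, n):
    """Encode risk confidence p in [0, 1] as an n-dim vector via t = round(p * n).

    Assumption: t is a 1-based position, so t = 0 gives the all-zero vector
    and t = n marks the last slot. Python's round() sends exact halves to the
    nearest even integer; the patent does not state its rounding rule.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    t = round(p * n)
    vec = np.zeros(n)
    if t > 0:
        vec[t - 1] = 1.0
    return vec


def splice(word_matrix, p_prev):
    """Append the encoded previous risk as one extra row: (k+1) x n -> (k+2) x n."""
    n = word_matrix.shape[1]
    return np.vstack([word_matrix, risk_to_one_hot(p_prev, n)])
```

Appending the code as an extra row is what turns each (k+1) × n log matrix into the (k+2) × n LSTM input described in the Examples.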
The transition probability $\lambda_{ij}$ is:
[Formula for $\lambda_{ij}$: image BDA0003360071990000041, not reproduced]
where $N_{ij}$ is the number of transitions from the log text data at the $i$-th moment to the log text data at the $j$-th moment within the unit time interval $\Delta S$; [symbol image BDA0003360071990000042, not reproduced] is the number of log text data at the $i$-th moment in the log text data set within time $T$; and $\Delta S$ is the unit time interval between two adjacent log text data.
A Markov chain is used to determine the transition probability from each log text data in the risk log text data sequence to the log text data at the next moment. The Markov chain is represented as a multi-dimensional asymmetric sparse matrix whose diagonal elements are:
$$\lambda_{ii} = -\sum_{j \neq i} \lambda_{ij}$$
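The diagonal rule above is the defining property of a continuous-time Markov chain generator (rate) matrix: every row sums to zero. The sketch assumes the off-diagonal $\lambda_{ij}$ have already been estimated (the patent's estimator survives only as an image) and uses a dense array for brevity instead of a sparse one. It also shows the standard jump-chain conversion $P_{ij} = \lambda_{ij} / (-\lambda_{ii})$, which is textbook CTMC theory rather than something the patent states.

```python
import numpy as np


def with_diagonal(rates):
    """Fill the diagonal so that lambda_ii = -sum_{j != i} lambda_ij.

    rates: square array of off-diagonal transition rates; any values already
    on the diagonal are ignored. A dense array is used here for brevity,
    although the patent describes a sparse matrix.
    """
    Q = np.array(rates, dtype=float)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))  # every row now sums to zero
    return Q


def jump_probabilities(Q):
    """Standard CTMC embedded-chain probabilities P_ij = lambda_ij / -lambda_ii.

    Rows of states with no outgoing transitions are left as zeros.
    """
    P = np.zeros_like(Q)
    for i, exit_rate in enumerate(-np.diag(Q)):
        if exit_rate > 0:
            P[i] = Q[i] / exit_rate
            P[i, i] = 0.0
    return P
```

For example, rates of 2 and 1 out of state 0 give $\lambda_{00} = -3$ and jump probabilities 2/3 and 1/3.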
Compared with the prior art, the invention has the following beneficial effects:
(1) The log text of the current tobacco-industry disaster recovery system is analyzed word by word with an NLP natural semantic analysis algorithm, which improves anomaly detection precision, provides strong data support for issuing automatic switching instructions, and offers an approach toward future fully automatic disaster recovery switching.
(2) An LSTM network first locates the risk log text, so abnormal text is found quickly; the log texts related to the risk log text are then extracted in time order to obtain the risk log text sequence of the workflow, and a Markov chain is used to obtain the abnormal points in that sequence accurately and quickly.
Drawings
Fig. 1 is a block flow diagram of an NLP-based method for detecting abnormal points of a disaster recovery system according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of an NLP-based method for detecting abnormal points of a disaster recovery system according to an embodiment of the present invention.
Detailed Description
As shown in Fig. 1, the present invention provides an NLP-based method for detecting abnormal points of a disaster recovery system, comprising the following steps:
(1) Text log preprocessing, with the following specific steps:
Obtain the raw log data, i.e., the log text data set, and use NLP semantic analysis to divide each log text data into a key-value pair structure of event type and content text data, so that the log text data set contains multiple key-value pair structures. Perform word-frequency screening on each word in these key-value pair structures to construct the dictionary of the log text data set, establish a multi-dimensional word vector matrix for each word in the dictionary using the word2vec algorithm, and construct the word vector matrix of each log text data from these multi-dimensional word vector matrices. Converting the semantic content of the log text into multi-dimensional word vector data in this way allows abnormal data, and thus abnormal log text data, to be detected more accurately;
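Building each log's word vector matrix and stacking the sequence can be sketched as follows. The layout is an assumption consistent with the (k+1) × n shape in the Examples: row 0 holds the key (event type), rows 1 to k hold content words, k is the longest content length, and unknown words and padding stay zero. The `embeddings` dictionary is a hypothetical stand-in for vectors that a trained word2vec model would supply.

```python
import numpy as np


def log_matrix(event_type, content, embeddings, k, n):
    """Build one log's (k+1) x n word vector matrix.

    Row 0 holds the key (event type); rows 1..k hold up to k content words.
    Words missing from `embeddings` and unused padding rows stay zero.
    """
    mat = np.zeros((k + 1, n))
    tokens = [event_type.lower()] + content.lower().split()[:k]
    for row, token in enumerate(tokens):
        if token in embeddings:
            mat[row] = embeddings[token]
    return mat


def sequence_tensor(pairs, embeddings, n):
    """Stack all logs into a len(pairs) x (k+1) x n array, k = longest content."""
    k = max(len(content.split()) for _, content in pairs)
    return np.stack([log_matrix(t, c, embeddings, k, n) for t, c in pairs])


# Hypothetical 4-dimensional vectors standing in for a trained word2vec model.
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=4) for w in ["instance", "terminating", "destroyed"]}
pairs = [("instance", "Terminating instance"),
         ("instance", "Instance destroyed successfully")]
X = sequence_tensor(pairs, embeddings, n=4)
```

Here the longest content has three words, so `X` has shape (2, 4, 4), mirroring the 6 × (k+1) × n array of the Examples for a two-log sequence.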
(2) LSTM-based path anomaly detection, with the following specific steps:
All log text data are sorted in time order. The current log text data is spliced with the abnormal risk confidence obtained at the previous moment from the LSTM network, and the spliced result is fed back into the LSTM network to obtain the abnormal risk confidence at the current moment. As log text data continue to be input in time order, the abnormal risk of the input log text data accumulates. When the abnormal risk at the current moment reaches the abnormal risk threshold, the current log text data is taken as risk log text data, and the preceding log text data related to it are used, together with it, to construct the risk log text data sequence;
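The sequential loop can be sketched as below. `score` is a hypothetical callable standing in for "splice with the previous confidence and run the LSTM", and `window` is a hypothetical parameter, because the patent says only that "related" preceding logs are included without defining relatedness. The toy numbers are invented and chosen merely so the result has the same shape as the $\{l_2, l_3, l_4\}$ subsequence of the Examples.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def locate_risk_sequence(
    logs: Sequence,
    score: Callable[[object, float], float],
    sigma: float,
    window: int,
) -> Optional[Tuple[int, List[int]]]:
    """Score logs in time order, feeding each the previous risk confidence.

    Returns the index of the first log whose confidence reaches sigma, plus
    the indices of the risk sequence (up to `window` earlier logs and the
    risk log itself), or None if no log reaches sigma.
    """
    p_prev = 0.0  # initial risk confidence, as in the Examples
    for idx, log in enumerate(logs):
        p = score(log, p_prev)
        if p >= sigma:
            start = max(0, idx - window)
            return idx, list(range(start, idx + 1))
        p_prev = p
    return None


# Toy scorer: each "log" is a risk increment that accumulates over time.
increments = [0.1, 0.1, 0.2, 0.5, 0.0, 0.0]
result = locate_risk_sequence(increments, lambda x, p: min(1.0, p + x), sigma=0.8, window=2)
```

The toy confidence climbs 0.1, 0.2, 0.4, 0.9, so the fourth log (index 3) is the first to reach σ = 0.8 and the risk sequence is indices 1 to 3.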
The abnormal risk confidence is a floating-point number between 0 and 1; the larger the value, the higher the abnormal risk. An abnormal risk threshold is set, and when the abnormal risk confidence reaches it, the risk log text data is located. The abnormal risk confidence is converted into a multi-dimensional one-hot code, which is spliced with the multi-dimensional word vector matrix of each log text data, and the spliced result is used as the input of the LSTM network. The multi-dimensional one-hot code t is:
$$t = \mathrm{round}(p \times n)$$
where p is the abnormal risk confidence (a floating-point number) and n is the word vector dimension of each log text data;
(3) Workflow anomaly detection, with the following specific steps:
The abnormal probability that the LSTM-based algorithm outputs for the current log text data can be regarded as the accumulated abnormal confidence of the preceding log text data. In actual production, by the time the abnormal risk exceeds the threshold, the error may already have occurred earlier. To investigate and locate it more precisely, a workflow model, namely a Markov chain model, computes a transition probability matrix that quantifies the probability of the next log record;
A workflow model is constructed from the transitions between log text data, and its transition probability matrix is computed. The workflow model is a multi-dimensional asymmetric sparse matrix whose diagonal elements are:
$$\lambda_{ii} = -\sum_{j \neq i} \lambda_{ij}$$
[Formula for $\lambda_{ij}$: image BDA0003360071990000061, not reproduced]
where $\lambda_{ij}$ is the transition probability; $N_{ij}$ is the number of transitions from the log text data at the $i$-th moment to the log text data at the $j$-th moment within the unit time interval $\Delta S$; [symbol image BDA0003360071990000062, not reproduced] is the number of log text data at the $i$-th moment in the log text data set within time $T$; and $\Delta S$ is the unit time interval between two adjacent log text data;
The logs with a high abnormal risk probability at the current time t, together with the logs before time t, form a subsequence, namely the risk log text data sequence. This sequence is input into the workflow model to obtain the transition probability from each log text data to the log text data at the next moment; when a transition probability reaches the abnormal transition probability threshold, the current log text data is taken as an abnormal point of the risk log text data sequence, and the abnormal station in the production process is determined based on that abnormal point.
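The final check can be sketched as a walk over consecutive pairs. This sketch interprets "reaches the abnormal transition probability threshold" as "falls to or below it", which matches the Examples, where the unexpected $l_3 \to l_4$ transition marks the anomaly. It uses discrete transition probabilities keyed by log template for readability; the function name, threshold and probabilities are all hypothetical, since the patent's actual values appear only as images.

```python
def find_abnormal_point(sequence, P, threshold):
    """Return the first log whose incoming transition probability is at or
    below `threshold`, or None if every transition looks normal.

    sequence: log templates of the risk sequence, in time order.
    P: dict mapping (from_template, to_template) -> probability estimated
       from normal runs; transitions never seen are treated as probability 0.
    """
    for prev, cur in zip(sequence, sequence[1:]):
        if P.get((prev, cur), 0.0) <= threshold:
            return cur
    return None


# Hypothetical probabilities; the patent's actual values appear only as images.
P = {
    ("Instance destroyed successfully", "Deleting instance files"): 0.9,
    ("Deleting instance files", "Deletion of files complete"): 0.85,
}
risk_sequence = ["Instance destroyed successfully", "Deleting instance files", "Permission Denied"]
abnormal = find_abnormal_point(risk_sequence, P, threshold=0.1)
```

Here the first transition is a high-probability one, while "Deleting instance files" to "Permission Denied" was never observed, so "Permission Denied" is returned as the abnormal point.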
Examples
As shown in Fig. 2, part of a log sequence is selected and its abnormal points are detected with the NLP-based disaster recovery system abnormal point detection method, through the following steps:
Select the raw log data:
l1:instance:Terminating instance
l2:instance:Instance destroyed successfully
l3:instance:Deleting instance files*
l4:instance:Permission Denied
l5:instance:Took 5s to destroy the instance
l6:instance:Termination of instance complete
S1: The log data $l_i$, $i = 1, \ldots, 6$, are processed into key-value pairs in which the key is the log type and the value is the log text content. Each pair is then processed by the word2vec model into a $(k+1) \times n$ matrix, where $k$ is the length of log $l_i$; that is, the log sequence is converted into a $6 \times (k+1) \times n$ three-dimensional array.
S2: The matrices of the log sequence are input into the LSTM in turn to perform path anomaly detection. Specifically, the initial log risk probability is 0; it is converted into an n-dimensional zero vector and spliced with the matrix of log $l_1$ to form a $(k+2) \times n$ matrix, which is input into the LSTM, and the LSTM outputs the abnormal risk probability $p_1$. Since $p_1$ is below the risk probability threshold $\sigma$, analysis continues: $p_1 \times n$ is rounded and encoded as an n-dimensional one-hot code, which is spliced with the matrix of log $l_2$ into a $(k+2) \times n$ input for the LSTM; the LSTM outputs the abnormal risk probability $p_2$ of log $l_2$, and $p_2$ is also found to be below $\sigma$. This continues until the risk probability $p_4$ of log $l_4$ is found to exceed $\sigma$. The preceding log subsequence $\{l_2, l_3, l_4\}$ is then selected, and the process proceeds to S3.
S3: The workflow model analyzes the subsequence $\{l_2, l_3, l_4\}$ to locate precisely where the anomaly occurs. Specifically, the starting state is set to $l_2$. According to the transition probability matrix of the workflow model, the log content with the highest transition probability from $l_2$ should be "Deleting instance files", which matches $l_3$, and
[transition probability expression: image BDA0003360071990000071, not reproduced]
so this is a normal transition. Then, with $l_3$ as the starting state, the log with the highest transition probability should be "Deletion of files complete", whereas the transition probability from $l_3$ to $l_4$ is
[image BDA0003360071990000072, not reproduced]
The anomaly therefore occurs at log $l_4$, and it is located precisely in the corresponding system flow.
The above embodiments are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereby, and any insubstantial changes and substitutions made by those skilled in the art based on the present invention are within the protection scope of the present invention.

Claims (9)

1. An NLP-based disaster recovery system abnormal point detection method is characterized by comprising the following steps:
(1) obtaining a log text data set; extracting a key-value pair structure from each log text data based on NLP semantic analysis; performing word-frequency screening on the key-value pair structures to construct a dictionary of the log text data set; establishing a multi-dimensional word vector matrix for each word in the dictionary using the word2vec algorithm; and constructing the word vector matrix of each log text data from these multi-dimensional word vector matrices;
(2) in time order, sequentially inputting the log text data at the current moment and the abnormal risk confidence at the previous moment into a long short-term memory network to obtain the abnormal risk confidence at the current moment; when the abnormal risk confidence at the current moment reaches an abnormal risk threshold, taking the log text data at the current moment as risk log text data, and constructing a risk log text data sequence from the log text data before the current moment and the risk log text data;
(3) inputting each log text data in the risk log text data sequence into a Markov chain in time order to obtain the transition probability from each log text data to the next; when the transition probability reaches an abnormal transition probability threshold, taking the current log text data as an abnormal point of the risk log text data sequence, and determining the abnormal station in the production process based on the abnormal point.
2. The NLP-based disaster recovery system abnormal point detection method according to claim 1, wherein extracting a key-value pair structure from each log text data based on NLP semantic analysis comprises:
based on NLP semantic analysis, extracting from each log text data the event type and the content text data corresponding to that event type to construct a key-value pair structure, in which the event type is the key and the corresponding content text data is the value.
3. The NLP-based disaster recovery system abnormal point detection method according to claim 2, wherein the dictionary is constructed by screening the frequency of each word in the event type of each log text data and in the content text data corresponding to that event type.
4. The NLP-based disaster recovery system abnormal point detection method according to claim 1, wherein sequentially inputting, in time order, the current-moment log text data and the previous-moment abnormal risk confidence into the long short-term memory network to obtain the current-moment abnormal risk confidence comprises:
sorting the log text data by time, splicing the word vector matrix of each log text data with the abnormal risk confidence of the previous moment, and inputting the spliced result into the long short-term memory network to obtain the abnormal risk confidence at the current moment.
5. The NLP-based disaster recovery system abnormal point detection method according to claim 1 or 4, wherein the risk log text data is determined from the abnormal risk confidence output by the long short-term memory network; the abnormal risk confidence is a floating-point number between 0 and 1, a larger value representing a higher abnormal risk; an abnormal risk threshold is set, and when the abnormal risk confidence reaches the abnormal risk threshold, the risk log text data is located.
6. The NLP-based disaster recovery system abnormal point detection method according to claim 5, wherein splicing the word vector matrix of each log text data with the abnormal risk confidence of the previous moment comprises:
converting the abnormal risk confidence into a multi-dimensional one-hot code, splicing the multi-dimensional one-hot code with the multi-dimensional word vector matrix of each log text data, and using the spliced result as the input of the long short-term memory network.
7. The NLP-based disaster recovery system abnormal point detection method according to claim 6, wherein the abnormal risk confidence is converted into the multi-dimensional one-hot code t as:
$$t = \mathrm{round}(p \times n)$$
where p is the abnormal risk confidence (a floating-point number) and n is the word vector dimension of each log text data.
8. The NLP-based disaster recovery system abnormal point detection method according to claim 1, wherein the transition probability $\lambda_{ij}$ is:
[formula image FDA0003360071980000021]
where $N_{ij}$ is the number of log text data transferring from the log text data at the $i$-th time to the log text data at the $j$-th time within the unit time interval $\Delta s$; [symbol image FDA0003360071980000022] is the number of log text data at the $i$-th time in the log text data set within the time $T$; and $\Delta s$ is the unit time interval between two adjacent pieces of log text data.
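The patent's formula for λ_ij is an image that did not survive extraction, so the sketch below does not reproduce it. As a plainly labelled stand-in, it uses λ_ij ≈ N_ij / (N_i · Δs), a common approximation of the maximum-likelihood rate of a continuous-time Markov chain observed every Δs. Here N_i · Δs approximates the total time spent at i. This choice fits the quantities defined above and the negative diagonal in claim 9, but it is an assumption. Each log text data is also treated as a discrete, hashable state label (for example, a log-template ID), which is a further assumption. The function name `estimate_rates` is hypothetical.

```python
from collections import Counter


def estimate_rates(states, delta_s=1.0):
    """Estimate off-diagonal rates lambda_ij from one time-ordered sequence of states.

    Assumed stand-in formula (the patent's own is an image): lambda_ij = N_ij / (N_i * delta_s).
    """
    if delta_s <= 0:
        raise ValueError("delta_s must be positive")
    # Consecutive observations, delta_s apart.
    pairs = list(zip(states, states[1:]))
    # N_ij: observed moves from state i to a different state j.
    n_ij = Counter((i, j) for i, j in pairs if i != j)
    # N_i: observations of i that have a successor, so N_i * delta_s approximates time spent in i.
    n_i = Counter(i for i, _ in pairs)
    return {(i, j): count / (n_i[i] * delta_s) for (i, j), count in n_ij.items()}


print(estimate_rates(["A", "A", "B", "A", "C"]))
```

Self-transitions are left out on purpose, because claim 9 derives the diagonal from the off-diagonal entries.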
9. The NLP-based disaster recovery system abnormal point detection method according to claim 8, wherein a Markov chain is used to determine the transition probability of each log text data in the risk log text data sequence to the log text data at the next time; the Markov chain is a multidimensional asymmetric sparse matrix whose diagonal elements are:
$\lambda_{ii} = -\sum_{j \neq i} \lambda_{ij}$
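The diagonal rule in claim 9 makes every row of the matrix sum to zero, a defining property of a continuous-time Markov chain generator (Q) matrix. Storing only the observed entries in nested dictionaries keeps the matrix sparse. Nothing forces λ_ij = λ_ji, so the matrix can be asymmetric. This is a minimal sketch, assuming the off-diagonal rates are already available as a dictionary keyed by (i, j). The helper name `generator_matrix` is hypothetical.

```python
def generator_matrix(off_diagonal):
    """Build a sparse, possibly asymmetric matrix {i: {j: rate}} with lambda_ii = -sum_{j != i} lambda_ij."""
    q = {}
    for (i, j), rate in off_diagonal.items():
        if i == j:
            raise ValueError("off_diagonal must not contain self-transitions")
        if rate < 0:
            raise ValueError("off-diagonal rates must be non-negative")
        q.setdefault(i, {})[j] = rate
        q.setdefault(j, {})  # states with no outgoing moves still get a row
    # Only observed entries are stored; each diagonal entry makes its row sum to zero.
    for i, row in q.items():
        row[i] = -sum(row.values())
    return q


rates = {("A", "B"): 0.5, ("A", "C"): 0.25, ("B", "A"): 1.0}
print(generator_matrix(rates))
# {'A': {'B': 0.5, 'C': 0.25, 'A': -0.75}, 'B': {'A': 1.0, 'B': -1.0}, 'C': {'C': 0}}
```

A state such as C, with no outgoing moves, ends up with a zero diagonal. It behaves as an absorbing state of the chain.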

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111363265.4A CN114168373A (en) 2021-11-17 2021-11-17 NLP-based disaster recovery system abnormal point detection method

Publications (1)

Publication Number Publication Date
CN114168373A (en) 2022-03-11

Family

ID=80479895

Country Status (1)

Country Link
CN (1) CN114168373A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116430817A (en) * 2023-04-26 2023-07-14 同心县启胜新能源科技有限公司 Data acquisition processing method and system applied to photovoltaic module production system
CN116430817B (en) * 2023-04-26 2023-09-29 同心县启胜新能源科技有限公司 Data acquisition processing method and system applied to photovoltaic module production system

Similar Documents

Publication Publication Date Title
Lai et al. A method for pattern mining in multiple alarm flood sequences
CN113434357B (en) Log anomaly detection method and device based on sequence prediction
CN113326244B (en) Abnormality detection method based on log event graph and association relation mining
CN112016602B (en) Method, equipment and storage medium for analyzing correlation between power grid fault cause and state quantity
CN113254255B (en) Cloud platform log analysis method, system, device and medium
CN114915478B (en) Network attack scene identification method, system and storage medium of intelligent park industrial control system based on multi-agent distributed correlation analysis
CN111949480B (en) Log anomaly detection method based on component perception
CN114968727B (en) Database through infrastructure fault positioning method based on artificial intelligence operation and maintenance
Chen et al. Log analytics for dependable enterprise telephony
CN111814870B (en) CPS fuzzy test method based on convolutional neural network
CN114281864A (en) Correlation analysis method for power network alarm information
Lin et al. Facgraph: Frequent anomaly correlation graph mining for root cause diagnose in micro-service architecture
CN114168373A (en) NLP-based disaster recovery system abnormal point detection method
CN107579944B (en) Artificial intelligence and MapReduce-based security attack prediction method
CN113779590B (en) Source code vulnerability detection method based on multidimensional characterization
CN113064873B (en) Log anomaly detection method with high recall rate
Zhu et al. An approach to cloud platform log anomaly detection based on natural language processing and LSTM
Zhao et al. A survey of deep anomaly detection for system logs
An et al. Real-time Statistical Log Anomaly Detection with Continuous AIOps Learning.
Li et al. Improving performance of log anomaly detection with semantic and time features based on bilstm-attention
CN112039907A (en) Automatic testing method and system based on Internet of things terminal evaluation platform
CN115757062A (en) Log anomaly detection method based on sentence embedding and Transformer-XL
CN116167370A (en) Log space-time characteristic analysis-based distributed system anomaly detection method
Chen et al. Unsupervised Anomaly Detection Based on System Logs.
CN113535458B (en) Abnormal false alarm processing method and device, storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination