CN114610576A - Log generation monitoring method and device - Google Patents

Log generation monitoring method and device

Info

Publication number
CN114610576A
Authority
CN
China
Prior art keywords
corpus
word
lda
topic
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210252421.8A
Other languages
Chinese (zh)
Inventor
张馨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202210252421.8A priority Critical patent/CN114610576A/en
Publication of CN114610576A publication Critical patent/CN114610576A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3055Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a log generation monitoring method and device in the field of big data. The method comprises: applying an aspect (in the aspect-oriented programming sense) to the transaction interfaces of business domain systems to determine initial transaction log data; performing word segmentation on the initial transaction log data to generate a corpus; sampling the corpus and training a corpus-based LDA model; and, according to the corpus-based LDA model, extracting keywords from the transaction logs of a target business domain, establishing a keyword library, and generating a journal log file for the target business domain system in combination with the initial transaction log data. Because the journal log standard of each business domain no longer has to be analyzed and determined manually, the journal log file of the target business domain system is generated automatically and labor cost is effectively reduced. Monitoring the journal log file of the target business domain system makes it possible to obtain the running state of the system in real time, improves the accuracy of journal log monitoring, and helps ensure that the system runs safely.

Description

Log generation monitoring method and device
Technical Field
The invention relates to the technical field of computer data processing, in particular to a log generation monitoring method and device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
The journal logs of a commercial bank's systems are an important means of evaluating how those systems are operating. Analyzing and monitoring journal logs makes it possible to track changes in transaction parameters in a specific business domain, discover operational risks in time, prevent problems before they occur, and effectively ensure stable operation of the system.
However, because financial transactions differ across business domains, traditional journal log generation requires analyzing the characteristics of each product system and each transaction in order to determine the fields the journal log needs and how to acquire them. As a result, journal logs cannot be generated quickly, and the approach is difficult to roll out rapidly.
Therefore, providing a new solution to the above technical problem is an issue to be solved in the art.
Disclosure of Invention
The embodiment of the invention provides a log generation monitoring method. The method does not require log generation code to be written for the specific conditions of each business domain system, which improves the extensibility of journal log generation; it establishes the journal log standard of each business domain without manual analysis; it automatically generates the journal log file of a target business domain system, which effectively reduces labor cost; and, by monitoring that journal log file, it obtains the running state of the system in real time, improves the accuracy of journal log monitoring, and helps ensure that the system runs safely. The method comprises the following steps:
applying an aspect to the transaction interfaces of business domain systems, and determining initial transaction log data;
performing word segmentation on the initial transaction log data to generate a corpus;
sampling the corpus, and training to obtain a corpus-based LDA (Latent Dirichlet Allocation) model;
extracting keywords from the transaction logs of the target business domain according to the corpus-based LDA model, establishing a keyword library, and generating a journal log file of the target business domain system in combination with the initial transaction log data;
and monitoring the journal log file of the target business domain system.
An embodiment of the present invention further provides a log generation monitoring apparatus, including:
an initial transaction log data determining module, used for applying an aspect to the transaction interfaces of business domain systems and determining initial transaction log data;
a corpus generation module, used for performing word segmentation on the initial transaction log data to generate a corpus;
a corpus-based LDA model training module, used for sampling the corpus and training to obtain a corpus-based LDA model;
a journal log file generation module, used for extracting keywords from the transaction logs of the target business domain according to the corpus-based LDA (Latent Dirichlet Allocation) model, establishing a keyword library, and generating a journal log file of the target business domain system in combination with the initial transaction log data;
and a journal log file monitoring module, used for monitoring the journal log file of the target business domain system.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the log generation monitoring method is implemented.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the log generation monitoring method is implemented.
An embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program, and when the computer program is executed by a processor, the log generation monitoring method is implemented.
The embodiment of the invention provides a log generation monitoring method and device, comprising: applying an aspect to the transaction interfaces of business domain systems, and determining initial transaction log data; performing word segmentation on the initial transaction log data to generate a corpus; sampling the corpus, and training to obtain a corpus-based LDA model; extracting keywords from the transaction logs of the target business domain according to the corpus-based LDA model, establishing a keyword library, and generating a journal log file of the target business domain system in combination with the initial transaction log data; and monitoring the journal log file of the target business domain system. Because the initial transaction log data of each business domain system is obtained with aspect technology, log generation code does not need to be written for the specific conditions of each system, which improves the extensibility of journal log generation. The initial transaction log data collected from systems in different business domains is used to train a corpus-based LDA model, which establishes the journal log standard of each business domain without manual analysis. Meanwhile, based on the trained corpus-based LDA model, a keyword extraction algorithm extracts keywords from the transaction logs of a new target business domain system and establishes a keyword library, so that the journal log file of that system is generated automatically and labor cost is effectively reduced. Monitoring the journal log file of the target business domain system makes it possible to obtain the running state of the system in real time, improves the accuracy of journal log monitoring, and helps ensure that the system runs safely.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 is a schematic diagram of a log generation monitoring method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a process of determining initial transaction log data of a log generation monitoring method according to an embodiment of the present invention.
Fig. 3 is a corpus generation process diagram of a log generation monitoring method according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the process of generating a journal log file of a target business domain system in a log generation monitoring method according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a computer device for executing the log generation monitoring method of the present invention.
Fig. 6 is a schematic diagram of a log generation monitoring apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
The invention belongs to the field of big data. Fig. 1 is a schematic diagram of a log generation monitoring method according to an embodiment of the present invention. As shown in Fig. 1, the embodiment provides a log generation monitoring method that does not require log generation code to be written for the specific conditions of each business domain system, which improves the extensibility of journal log generation; establishes the journal log standard of each business domain without manual analysis; automatically generates the journal log file of the target business domain system, which effectively reduces labor cost; and monitors that journal log file, so that the running state of the system can be obtained in real time, the accuracy of journal log monitoring is improved, and the system runs safely. The method includes:
step 101: applying an aspect to the transaction interfaces of business domain systems, and determining initial transaction log data;
step 102: performing word segmentation on the initial transaction log data to generate a corpus;
step 103: sampling the corpus, and training to obtain a corpus-based LDA model;
step 104: extracting keywords from the transaction logs of the target business domain according to the corpus-based LDA model, establishing a keyword library, and generating a journal log file of the target business domain system in combination with the initial transaction log data;
step 105: monitoring the journal log file of the target business domain system.
Existing approaches to generating and analyzing system logs are as follows: 1. each business domain system is analyzed individually according to its own characteristics, a separate journal log standard is determined for it, and journal logs are generated in a system-specific form, on which analysis and monitoring are then performed; 2. each product system generates a system journal log in a uniform format by selecting common parameter information such as the interface transaction code and call time, and monitoring is then performed against uniform indicators.
The technical defects of the prior art are as follows. In scheme 1, whenever a product system changes or a new product system is connected, the journal log format has to be analyzed and determined again, so extensibility is poor, rollout cost is high, and implementation is slow. Scheme 2 does not take the transaction characteristics of a specific financial scenario into account, so journal log monitoring is coarse-grained and accurate monitoring is difficult to achieve.
The embodiment of the invention trains on the system logs of different business domains to generate a topic model, which serves as the journal log standard of each business domain, so the standard no longer has to be analyzed and determined manually. Meanwhile, based on the trained topic model, an LDA-based Subject-Latent Dirichlet Allocation (S-LDA) keyword extraction algorithm is used to extract keywords from new system logs and establish a keyword library, so that the journal log file of the target business domain system is generated automatically and labor cost is effectively reduced.
In the embodiment of the present invention, the aforementioned Latent Dirichlet Allocation (LDA) is a bag-of-words model: it treats a document as a set of words with no ordering among them. A document may contain multiple topics, and each word in the document is generated by one of those topics. LDA gives the probability distribution of a document over topics and the probability distribution of words within each topic. LDA is an unsupervised learning method and is applied in text topic recognition, text classification, text similarity calculation, similar-article recommendation, and the like. Subject-Latent Dirichlet Allocation (S-LDA for short) is an algorithm model, built on Latent Dirichlet Allocation, that extracts keywords from probability distributions with topics as the main classification items.
In a specific implementation of the log generation monitoring method provided by the embodiment of the present invention, in an embodiment, the method includes:
applying an aspect to the transaction interfaces of business domain systems, and determining initial transaction log data;
performing word segmentation on the initial transaction log data to generate a corpus;
sampling the corpus, and training to obtain a corpus-based LDA model;
extracting keywords from the transaction logs of the target business domain according to the corpus-based LDA model, establishing a keyword library, and generating a journal log file of the target business domain system in combination with the initial transaction log data;
and monitoring the journal log file of the target business domain system.
Because the initial transaction log data of each business domain system is obtained with aspect technology, log generation code does not need to be written for the specific conditions of each system, which improves the extensibility of journal log generation. The initial transaction log data collected from systems in different business domains is used to train a corpus-based LDA model, which establishes the journal log standard of each business domain without manual analysis. Meanwhile, based on the trained corpus-based LDA model, a keyword extraction algorithm extracts keywords from the transaction logs of a new target business domain system and establishes a keyword library, so that the journal log file of that system is generated automatically and labor cost is effectively reduced. Monitoring the journal log file of the target business domain system makes it possible to obtain the running state of the system in real time, improves the accuracy of journal log monitoring, and helps ensure that the system runs safely.
Fig. 2 is a schematic diagram of the process of determining initial transaction log data in a log generation monitoring method according to an embodiment of the present invention. As shown in Fig. 2, in a specific implementation of the method, in an embodiment, applying an aspect to the transaction interfaces of business domain systems and determining initial transaction log data includes:
step 201: creating an aspect class using an aspect-oriented technique, applying it to the transaction interfaces of all business domain systems, and acquiring transaction parameter details;
step 202: generating initial transaction log data according to the transaction parameter details.
In an embodiment, the main process of obtaining initial transaction log data is as follows: while each business domain system is operating normally, an aspect class is first created using an aspect-oriented technique and applied to the transaction interfaces of all business domain systems to obtain transaction parameter details; initial transaction log data is then generated from those details.
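Steps 201 and 202 can be sketched as follows. This is a minimal illustration only: the patent does not name a framework, so a Python decorator stands in for the aspect class, and the names `transaction_aspect`, `INITIAL_TRANSACTION_LOG`, and the `transfer` interface with its parameters are all hypothetical.

```python
import functools
import inspect
import json
import time

# Hypothetical in-memory sink; a real system would write to a file or queue.
INITIAL_TRANSACTION_LOG = []

def transaction_aspect(func):
    """Decorator standing in for an aspect: records each call's parameter details."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Bind positional and keyword arguments to their parameter names.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        record = {
            "interface": func.__name__,
            "timestamp": time.time(),
            "params": dict(bound.arguments),
        }
        try:
            result = func(*args, **kwargs)
            record["status"] = "ok"
            return result
        except Exception as exc:
            record["status"] = "error: " + type(exc).__name__
            raise
        finally:
            # Step 202: one initial transaction log line per call.
            INITIAL_TRANSACTION_LOG.append(json.dumps(record, default=str))
    return wrapper

@transaction_aspect
def transfer(payerAccountNo, payeeAccountNo, transAmount, currencyCode="CNY"):
    """Hypothetical transaction interface, used only for illustration."""
    return {"accepted": transAmount > 0}

transfer("6222-01", "6222-02", 100.0)
```

The interface itself contains no logging code; the wrapper is attached from outside, which is the property the embodiment relies on to avoid per-system log generation code.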
Fig. 3 is a schematic diagram of the corpus generation process in a log generation monitoring method according to an embodiment of the present invention. As shown in Fig. 3, in a specific implementation of the method, in an embodiment, performing word segmentation on the initial transaction log data to generate a corpus includes:
step 301: acquiring the initial transaction log data within a set duration;
step 302: extracting transaction parameter field names from the initial transaction log data within the set duration;
step 303: performing English word segmentation on the extracted transaction parameter field names with a word segmentation tool to generate a corpus.
In an embodiment, the corpus is generated as follows. First, the initial transaction log data within a set duration is acquired; in one example, this is the initial transaction log data of all business domain systems over a half-year period. Then, transaction parameter field names are extracted from that data. Finally, English word segmentation is performed on the extracted field names with a word segmentation tool to generate the corpus.
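Steps 302 and 303 can be illustrated with a short sketch. It assumes, beyond what the text states, that each initial log line is a JSON object with a `params` mapping, that field names are written in camelCase or snake_case, and that one log line forms one document; a regular expression stands in for the unnamed word segmentation tool.

```python
import json
import re

def split_field_name(name):
    """Split a camelCase / snake_case field name into lowercase English words."""
    words = []
    for part in re.split(r"[^A-Za-z]+", name):
        # An acronym run (e.g. "HTTP") or a capitalised/lowercase word.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [w.lower() for w in words]

def build_corpus(log_lines):
    """Assumption: one document per log line, made of its field-name words."""
    corpus = []
    for line in log_lines:
        params = json.loads(line).get("params", {})
        doc = [w for field in params for w in split_field_name(field)]
        if doc:
            corpus.append(doc)
    return corpus

corpus = build_corpus(['{"params": {"transAmount": 1, "payerAccountNo": "x"}}'])
```

Here `corpus` holds one tokenised document per log line, which is the input shape a topic model expects.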
In a specific implementation of the log generation monitoring method provided by the embodiment of the present invention, in an embodiment, sampling the corpus and training to obtain a corpus-based LDA model includes:
sampling the corpus with a Gibbs sampling algorithm, and determining the topic of each word after the sampling converges;
and training to obtain the corpus-based LDA model according to the topic of each word.
In an embodiment, the corpus-based LDA model is trained as follows: first, the corpus is sampled with a Gibbs sampling algorithm, and the topic of each word is determined after the sampling converges; then the corpus-based LDA model is trained according to the topic of each word.
That is, starting from the LDA topic model, Gibbs sampling is performed; once the sampling has converged, the topic of each word is known, and the topic distribution of the corpus and the word distribution of each topic are then obtained through statistical calculation, which yields the corpus-based LDA model.
In a specific implementation of the log generation monitoring method provided by the embodiment of the present invention, in an embodiment, sampling the corpus with a Gibbs sampling algorithm and determining the topic of each word after the sampling converges includes:
determining the number of topics and the hyperparameter vector;
randomly assigning a topic number to each word of each document in the corpus;
rescanning the corpus and, for each word, resampling and updating its topic number using the Gibbs sampling formula, and updating the topic number recorded for that word in the corpus;
and repeating the sampling update until the sampling converges, and determining the topic of each word in the corpus.
In this embodiment, Gibbs sampling is a special case of Markov chain Monte Carlo (MCMC) algorithms and is often used for problems such as matrix factorization and tensor factorization. It is also called alternating conditional sampling: "alternating" indicates that Gibbs sampling is an iterative algorithm in which the variables are updated in turn during iteration, and "conditional" reflects that the core of Gibbs sampling is Bayesian theory, in which the posterior distribution is inferred from prior knowledge and observed data, with the observed values as the conditions. In the embodiment of the invention, the Gibbs sampling formula is used to resample and update the topic number of each word and to update the topic number recorded for that word in the corpus.
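The Gibbs sampling formula itself is not reproduced in this text. For orientation, the conditional commonly used in collapsed Gibbs sampling for LDA is shown below; the notation and the symmetric priors $\alpha$ and $\beta$ are assumptions, not taken from the patent.

```latex
% z_i: topic of the i-th word token, which is word w_i in document d
% n^{\neg i}: counts computed with token i excluded
% K: number of topics, V: vocabulary size
p\left(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w}\right)
  \;\propto\;
  \left(n_{d,k}^{\neg i} + \alpha\right)
  \frac{n_{k,w_i}^{\neg i} + \beta}{n_{k}^{\neg i} + V\beta}
```

Here $n_{d,k}$ counts the words in document $d$ assigned to topic $k$, $n_{k,w}$ counts the assignments of word $w$ to topic $k$, and $n_k = \sum_w n_{k,w}$. The first factor favors topics already common in the document; the second favors topics in which the word is already common.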
In a specific implementation of the log generation monitoring method provided by the embodiment of the present invention, in an embodiment, training to obtain the corpus-based LDA model according to the topic of each word includes:
counting the topics of the words in each document of the corpus to obtain the document-topic distribution parameters;
counting the words assigned to each topic in the corpus to obtain the topic-word distribution parameters of the LDA model;
determining the topic distribution of the documents in the corpus according to the document-topic distribution parameters;
determining the word distribution of each topic in the corpus according to the topic-word distribution parameters of the LDA model;
and training the LDA topic model according to the topic distribution of the documents in the corpus and the word distribution of each topic in the corpus to obtain the corpus-based LDA model.
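As a worked form of the counting steps above, the usual point estimates derived from the converged topic assignments are given below. This is the standard smoothed-count estimator under assumed symmetric priors $\alpha$ and $\beta$; the patent's own expressions are not available in this text.

```latex
% n_{d,k}: words in document d assigned to topic k;  n_d = \sum_k n_{d,k}
% n_{k,w}: assignments of word w to topic k;         n_k = \sum_w n_{k,w}
% K: number of topics, V: vocabulary size
\theta_{d,k} = \frac{n_{d,k} + \alpha}{n_d + K\alpha},
\qquad
\varphi_{k,w} = \frac{n_{k,w} + \beta}{n_k + V\beta}
```

Each row $\theta_d$ is the topic distribution of document $d$ and each row $\varphi_k$ is the word distribution of topic $k$; both sum to 1 by construction.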
In an embodiment, the corpus is sampled with a Gibbs sampling algorithm, the topic of each word is determined after the sampling converges, and the corpus-based LDA model is trained according to the topic of each word. The specific process may include:
determining the number of topics and the hyperparameter vector;
randomly assigning a topic number to each word of each document in the corpus;
rescanning the corpus and, for each word, resampling and updating its topic number using the Gibbs sampling formula, and updating the topic number recorded for that word in the corpus;
repeating the sampling update until the sampling converges, and determining the topic of each word in the corpus;
counting the topics of the words in each document of the corpus to obtain the document-topic distribution parameters;
counting the words assigned to each topic in the corpus to obtain the topic-word distribution parameters of the LDA model;
determining the topic distribution of the documents in the corpus according to the document-topic distribution parameters;
determining the word distribution of each topic in the corpus according to the topic-word distribution parameters of the LDA model;
and training the LDA topic model according to the topic distribution of the documents in the corpus and the word distribution of each topic in the corpus to obtain the corpus-based LDA model.
Specifically, based on the LDA topic model, the obtained corpus is sampled with the Gibbs sampling algorithm. The LDA model training process based on Gibbs sampling is as follows:
(1) determine a suitable number of topics K and select a suitable hyperparameter vector (shown in the original as formula image BDA0003547327300000081);
(2) randomly assign a topic number k to each word w of each document in the corpus;
(3) rescan the corpus and, for each word w, resample and update its topic number using the Gibbs sampling formula, and update the topic number recorded for that word in the corpus;
(4) repeat the Gibbs sampling of step (3) until the sampling converges, then go to step (5);
(5) count the topics of the words in each document of the corpus to obtain the document-topic distribution (formula image BDA0003547327300000082), then count the words assigned to each topic in the corpus to obtain the topic-word distribution of the LDA model (formula image BDA0003547327300000083);
(6) obtain the topic distribution of the documents and the word distribution of each topic through statistical calculation, and finally obtain the LDA model trained on the corpus.
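Steps (1) to (6) can be sketched as a small collapsed Gibbs sampler. This is an illustrative implementation under assumptions not stated in the patent: symmetric scalar priors `alpha` and `beta` replace the hyperparameter vector, a fixed iteration count stands in for a convergence check, and the function name `train_lda_gibbs` is hypothetical.

```python
import random

def train_lda_gibbs(corpus, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in corpus for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in corpus]     # words in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # assignments of word v to topic k
    topic_total = [0] * K                     # total words assigned to topic k
    assignments = []

    # Step (2): random initial topic number for every word.
    for d, doc in enumerate(corpus):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Steps (3)-(4): rescan the corpus and resample each word's topic.
    for _ in range(iterations):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Steps (5)-(6): convert counts into the two distributions.
    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
             for d, doc in enumerate(corpus)]
    phi = [[(topic_word[t][v] + beta) / (topic_total[t] + V * beta) for v in range(V)]
           for t in range(K)]
    return vocab, theta, phi
```

The function returns the vocabulary, the document-topic distribution `theta`, and the topic-word distribution `phi`. In practice a library implementation would normally be preferred; the sketch only makes the counting explicit.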
Fig. 4 is a schematic diagram of a process of generating a journal file of a target service field system by using a journal generation monitoring method according to an embodiment of the present invention, and as shown in fig. 4, when the journal generation monitoring method provided by the embodiment of the present invention is implemented specifically, in an embodiment, according to an LDA model based on a corpus, keyword extraction is performed on a transaction journal in a target service field, a keyword library is established, and a journal file of the target service field system is generated by combining initial transaction journal data, where the process includes:
step 401: establishing an LDA-based S-LDA keyword extraction algorithm according to an LDA model based on a corpus;
step 402: extracting keywords from the transaction logs in the target service field according to an LDA-based S-LDA keyword extraction algorithm to establish a keyword library;
step 403: and generating a flow log file of the target business field system according to the keyword library and the initial transaction log data.
In this embodiment, generating the flow log file of the target business field system is the core inventive point of the embodiment of the invention, and the main process is as follows:
firstly, establishing an LDA-based S-LDA keyword extraction algorithm according to an LDA model based on a corpus; then, extracting keywords from the transaction logs in the target service field according to an LDA-based S-LDA keyword extraction algorithm, and establishing a keyword library; and finally, generating a flow log file of the target business field system according to the keyword library and the initial transaction log data.
The embodiment of the invention provides an LDA-based S-LDA keyword extraction algorithm, and the S-LDA keyword extraction algorithm is adopted to extract keywords from the segmented transaction logs in the target service field and establish a keyword library. And generating a flow log file of the target business field system according to the initial transaction log and the keyword library generated in the step 101.
When the log generation monitoring method provided by the embodiment of the present invention is implemented specifically, in an embodiment, an LDA-based S-LDA keyword extraction algorithm is established according to an LDA model based on a corpus, and the method includes:
constructing the training corpus required for model training from a given set of text files, and training on the training corpus with the corpus-based LDA (Latent Dirichlet Allocation) model to obtain an LDA model;
constructing a target corpus from the target text file, and predicting the topic distribution of the target corpus by using the Gibbs sampling algorithm;
filtering the topics of the target text file to obtain a filtered topic set;
constructing a word selection weight for each topic in the topic set from the proportion of that topic in the topic distribution of the target text file, selecting a set number of words from each topic in turn in descending order of topic probability, and keeping the order of the selected words to construct a keyword candidate word set;
and filtering the keyword candidate word set to determine the keywords of the target text file.
In a specific implementation of the log generation monitoring method provided in an embodiment of the present invention, in an embodiment, the filtering is performed on the topic of the target text file to obtain a filtered topic set, where the filtering includes:
setting a first auxiliary vector;
calculating the similarity between the word distribution of each topic in the target text file and the first auxiliary vector, and determining a first JS divergence value;
and when the first JS divergence value is smaller than a first set divergence threshold, deleting the current topic from the topic distribution of the target text file to obtain the filtered topic set.
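The topic filtering step can be sketched as follows. The definition of the first auxiliary vector is given only as a formula image in the source, so this sketch assumes it is the uniform distribution over the vocabulary; the natural logarithm and the function names are likewise assumptions. The default threshold 0.2 is the experimental value stated in the description.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence, computed via the mixture M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def filter_topics(topic_word_dists, threshold=0.2):
    """Return the indices of the topics that survive the filter.

    A topic whose word distribution is close to the auxiliary vector
    (JS divergence below the threshold) is deleted.
    """
    kept = []
    for i, dist in enumerate(topic_word_dists):
        aux = [1.0 / len(dist)] * len(dist)  # assumed: uniform auxiliary vector
        if js_divergence(dist, aux) >= threshold:
            kept.append(i)
    return kept
```

Under this assumption, a topic that spreads its probability almost evenly over all words carries little information and is removed, while a topic concentrated on a few words is kept.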
In a specific implementation of the log generation monitoring method provided in an embodiment of the present invention, in an embodiment, the filtering is performed on the keyword candidate word set to determine the keywords of the target text file, including:
setting a second auxiliary vector;
calculating the similarity between the topic distribution of each candidate word in the keyword candidate word set and the second auxiliary vector, and determining a second JS divergence value;
when the second JS divergence value is smaller than a second set divergence threshold, deleting the current candidate word from the keyword candidate word set, and selecting, from the remaining keyword candidate word set, the candidate words whose part of speech is noun or verb and which rank in the top S as the keywords of the target text file; the top-S candidate words are obtained by sorting the candidate words in descending order of occurrence frequency and taking the first S.
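The candidate filtering and top-S selection can be sketched as below. The record layout (a dict with `word`, `pos`, `freq`, and a precomputed second JS divergence value `js`) and the part-of-speech labels are hypothetical; the source does not specify how these values are stored or which tagger produces them.

```python
def select_keywords(candidates, s, threshold=0.2, allowed_pos=("noun", "verb")):
    """Pick the top-s keywords from a list of candidate-word records.

    candidates: list of dicts with keys "word", "pos", "freq", "js"
                (hypothetical layout; "js" is the second JS divergence value).
    """
    remaining = [
        c for c in candidates
        if c["js"] >= threshold and c["pos"] in allowed_pos
    ]
    # Sort by occurrence frequency, largest first; ties keep their input order.
    remaining.sort(key=lambda c: c["freq"], reverse=True)
    return [c["word"] for c in remaining[:s]]
```

For example, a frequent word whose divergence value is below the threshold is dropped before ranking, so it cannot crowd out less frequent but more topic-specific nouns and verbs.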
The embodiment of the invention establishes an LDA-based S-LDA keyword extraction algorithm according to an LDA model based on a corpus, and the process of the LDA-based S-LDA keyword extraction algorithm established by the embodiment of the invention is shown in Table 1:
TABLE 1
Figure BDA0003547327300000101
Figure BDA0003547327300000111
The specific flow of the S-LDA keyword extraction algorithm based on the LDA is as follows:
(1) Constructing the corpus required for model training from a given document set, and training on the corpus based on the LDA topic model to obtain the LDA model.
(2) Constructing a target corpus from the target document, and predicting the topic distribution of the target corpus by using the Gibbs sampling algorithm.
(3) Filtering the topics of the target text file: according to the JS divergence calculation formula, calculating the similarity between the word distribution of each topic in the target text file and the first auxiliary vector
Figure BDA0003547327300000112
to determine a first JS divergence value. The calculation formula of
Figure BDA0003547327300000113
is shown in formula (1-1), and the calculation formula of the first JS divergence value is shown in formula (1-2). A first set divergence threshold is set; its value in the experiments is 0.2. When the JS divergence value is smaller than the first set divergence threshold, the current topic is deleted from the topic distribution of the target text file, yielding the filtered topic set.
Figure BDA0003547327300000121
Figure BDA0003547327300000122
Wherein,
Figure BDA0003547327300000123
represents the first auxiliary vector; JS(V_z_i || Aux1) represents the first JS divergence value; v represents the number of feature dimensions of the auxiliary vector
Figure BDA0003547327300000124
KL(P || Q) represents the KL divergence (Kullback-Leibler divergence) of the probability distributions P and Q; z_i represents the i-th topic obtained by prediction on the target text file; V_z_i represents the word distribution corresponding to topic z_i.
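Formulas (1-1) and (1-2) survive here only as figure placeholders. For reference, the standard textbook definitions of the two divergences named above are given below; whether the patent's formula (1-2) takes exactly this form is an assumption.

```latex
\mathrm{KL}(P \,\|\, Q) = \sum_{j} P(j)\,\log\frac{P(j)}{Q(j)},
\qquad
\mathrm{JS}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\quad M = \tfrac{1}{2}(P + Q)
```

The JS divergence is symmetric and is zero exactly when the two distributions coincide, which is why a small value indicates high similarity to the auxiliary vector.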
(4) Constructing the word selection weight weight_i of topic i from the proportion pro_i of topic i in the topic distribution of the target text file; the specific expression is shown in formula (1-3). In descending order of topic probability, a certain number of words are selected from each topic in turn according to formula (1-4), and the order of the words is kept to construct the keyword candidate word set.
Figure BDA0003547327300000125
$$select\_num_i = weight_i \cdot 5n = 5 \cdot weight_i \cdot n \tag{1-4}$$
Wherein, i represents a topic; weight_i represents the word selection weight of topic i; pro_i represents the proportion of topic i in the topic distribution of the target text file; n represents the number; select_num_i represents the number of keyword candidate words selected from the topic words of topic z_i; · represents multiplication.
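The candidate construction of step (4) can be sketched as follows. Formula (1-3) is available only as an image, so the weight is assumed here to be the proportion of the topic normalised over the filtered topic set; rounding select_num_i to the nearest integer and skipping words already selected are also assumptions of this sketch.

```python
def build_candidates(topic_props, topic_top_words, n):
    """Build the keyword candidate word list.

    topic_props:     {topic_id: pro_i} for the filtered topic set.
    topic_top_words: {topic_id: words of that topic, most probable first}.
    n:               the base number used in formula (1-4).
    """
    total = sum(topic_props.values())
    candidates = []
    # Visit topics in descending order of their proportion.
    for topic in sorted(topic_props, key=topic_props.get, reverse=True):
        weight = topic_props[topic] / total      # assumed form of formula (1-3)
        select_num = round(5 * weight * n)       # formula (1-4); rounding is assumed
        for word in topic_top_words[topic][:select_num]:
            if word not in candidates:           # keep first occurrence only (assumed)
                candidates.append(word)
    return candidates
```

With two topics of proportion 0.6 and 0.2 and n = 1, the normalised weights are 0.75 and 0.25, so about four words are taken from the first topic and one from the second.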
(5) Setting a second auxiliary vector
Figure BDA0003547327300000126
Its expression is shown in formula (1-5). The similarity between the topic distribution of each word in the keyword candidate word set and the auxiliary vector is calculated according to the JS divergence calculation formula to determine a second JS divergence value; the specific calculation formula is shown in formula (1-6). The divergence threshold is set to 0.2 in the experiments. Words with high similarity to the auxiliary vector (that is, with a JS divergence value below the threshold) are then deleted from the candidate word set. Finally, the words that are nouns or verbs, appear in the target text, and rank in the top s are selected from the keyword candidate word set as the keywords of the target text, where s is a positive integer in [3, 10] and may be chosen arbitrarily within that range.
Figure BDA0003547327300000127
Figure BDA0003547327300000128
Wherein,
Figure BDA0003547327300000131
represents the second auxiliary vector; JS(T_w_i || Aux2) represents the second JS divergence value; k represents the number of feature dimensions of the auxiliary vector
Figure BDA0003547327300000132
KL(P || Q) represents the KL divergence (Kullback-Leibler divergence) of the probability distributions P and Q; w_i represents the i-th word in the keyword candidate word set; T_w_i represents the topic distribution corresponding to word w_i.
The expressions of the above-mentioned formulas (1-1) to (1-6) are only examples, and those skilled in the art will understand that the above formulas may be modified in certain forms and other parameters or data may be added or other specific formulas may be provided according to the needs, and such modifications are within the scope of the present invention.
When the log generation monitoring method provided by the embodiment of the present invention is implemented specifically, in an embodiment, the keyword extraction is performed on the transaction log in the target service field according to an LDA-based S-LDA keyword extraction algorithm, and a keyword library is established, where the method includes:
performing word segmentation processing on the transaction log of the target service field;
and adopting an LDA-based S-LDA keyword extraction algorithm to extract keywords from the transaction logs of the target service field after the word segmentation processing, and establishing a keyword library.
In the embodiment, according to an LDA-based S-LDA keyword extraction algorithm, keyword extraction is carried out on transaction logs in a target service field, and a keyword library is established, wherein the main process comprises the following steps: firstly, performing word segmentation processing on a transaction log in a target service field; and then, performing keyword extraction on the transaction log of the target service field after the word segmentation by adopting an LDA-based S-LDA keyword extraction algorithm to establish a keyword library.
When the log generation monitoring method provided by the embodiment of the present invention is implemented specifically, in an embodiment, generating a journal file of a target business field system according to a keyword library and initial transaction log data includes:
and sequentially taking each keyword in the keyword library, inquiring a value corresponding to the keyword from the initial transaction log data, and generating a flow log file of the target business field system.
In this embodiment, the S-LDA keyword extraction algorithm is used to extract keywords from the segmented transaction log of the target service field, and a keyword library is established. Each keyword in the keyword library is then taken in turn, the value corresponding to the keyword (transaction parameter) is queried from the initial transaction log generated in step 101, and the flow log file of the target business field system is generated.
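The lookup described above can be sketched as follows. The source does not specify the format of the initial transaction log or of the flow log, so this sketch assumes each initial log record is a mapping from transaction parameter field names to values and writes one JSON object per flow log line; `generate_flow_log` is a hypothetical name.

```python
import json


def generate_flow_log(keyword_library, initial_log_records):
    """Produce flow log lines by looking up each keyword in each record.

    keyword_library:     ordered list of keywords (transaction parameter names).
    initial_log_records: list of dicts, field name -> value (assumed format).
    """
    lines = []
    for record in initial_log_records:
        entry = {}
        for keyword in keyword_library:   # take each keyword in turn
            if keyword in record:         # query its value from the initial log
                entry[keyword] = record[keyword]
        if entry:
            lines.append(json.dumps(entry, ensure_ascii=False))
    return lines
```

Fields that are not in the keyword library are left out of the flow log, so only the transaction parameters judged relevant for the target field are recorded.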
When the log generation monitoring method provided by the embodiment of the present invention is implemented specifically, in an embodiment, monitoring the flow log file of the target business field system includes:
monitoring the generated flow log file of the target service field system according to the keywords, and acquiring the running state information of the target service field system in real time.
During system operation, the generated flow log is monitored according to the keywords, the running state of the system is obtained in real time, and the operational safety of the system is safeguarded. When a financial transaction enters the product system, a flow log containing the keywords (transaction parameter field names) is obtained through the flow log generation module. The generated flow log is then monitored to track the running state of the system.
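The source does not say what keyword-based monitoring checks in detail. One possible reading, sketched below purely as an assumption, is to scan each flow log entry and report the entries in which an expected keyword is missing; the JSON-lines format and the name `monitor_flow_log` are hypothetical.

```python
import json


def monitor_flow_log(lines, keywords):
    """Summarise a flow log: entry count and entries lacking expected keywords."""
    status = {"entries": 0, "incomplete": []}
    for number, line in enumerate(lines, start=1):
        entry = json.loads(line)
        status["entries"] += 1
        missing = [k for k in keywords if k not in entry]
        if missing:
            # Record the 1-based line number and the keywords not found.
            status["incomplete"].append((number, missing))
    return status
```

A real deployment would presumably read the log incrementally and raise alerts, but the summary above is enough to show how the keyword library drives the check.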
In the embodiment of the invention, the initial transaction log data of each business field system is obtained by using aspect technology (aspect-oriented programming), so log generation code does not need to be written for the specific situation of each business field system, which improves the extensibility of flow log generation. An LDA model based on a corpus is trained from the initial transaction log data acquired from systems in different business fields, so the flow log standard of each business field does not have to be analyzed and determined manually. Meanwhile, based on the trained corpus-based LDA model, keywords are extracted from new transaction logs of the target business field system by the LDA-based S-LDA keyword extraction algorithm, a keyword library is established, and the flow log file of the target business field system is generated automatically, which effectively reduces labor cost. Monitoring the flow log file of the target business field system makes it possible to obtain the running state of the system in real time, improves the accuracy of flow log monitoring, and safeguards the operational safety of the system.
The following briefly describes a log generation monitoring method provided by an embodiment of the present invention with reference to specific scenarios:
The log generation monitoring method provided by the embodiment of the invention mainly includes the following steps:
While the systems of the business fields are operating normally, an aspect is applied to each transaction interface to obtain the transaction parameter details and generate an initial transaction log.
Half a year of the initial transaction logs generated in step 101 is collected for the different business fields of the bank; the transaction logs are processed, the field names of the transaction parameters are extracted, English word segmentation is performed, and a corpus is generated. Based on the LDA topic model, the Gibbs sampling algorithm is used for sampling; once sampling converges, the topic of each word is known, and the topic distribution of the corpus and the word distribution of each topic are then obtained through statistical calculation, finally yielding the corpus-based LDA model.
An LDA-based S-LDA keyword extraction algorithm is provided, and it is used to extract keywords from the segmented transaction logs of the target service field and establish a keyword library. The flow log file of the target business field system is generated from the initial transaction log generated in step 101 and the keyword library.
In the system operation process, the generated flow log is monitored according to the keywords, the operation state of the system is obtained in real time, and the system operation safety is guaranteed.
From the development point of view, technicians need to complete the following operations: creating an aspect class, acquiring the initial transaction logs of the systems in the various business fields, establishing the log-keyword model, generating the flow log, and monitoring the flow log. Among these, the flow log model building operation and the flow log generation operation are the most important to the invention.
Specifically, the process of implementing a log generation monitoring method according to an embodiment of the present invention mainly includes:
Step 1: Using aspect technology, create aspect classes and apply them to each transaction interface of each business field system to acquire the transaction parameter details and output an initial transaction log.
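The source does not name an aspect framework. As a rough analogue only, a Python decorator can play the role of an aspect class that intercepts a transaction interface and records its parameters; the names `transaction_aspect`, `INITIAL_TRANSACTION_LOG`, and `transfer`, and the record layout, are all hypothetical.

```python
import functools
import json
import time

# In-memory stand-in for the initial transaction log (hypothetical).
INITIAL_TRANSACTION_LOG = []


def transaction_aspect(func):
    """Record the parameter details of every call to a transaction interface."""
    @functools.wraps(func)
    def wrapper(**params):
        INITIAL_TRANSACTION_LOG.append(json.dumps({
            "interface": func.__name__,
            "timestamp": time.time(),
            "params": params,
        }))
        return func(**params)
    return wrapper


@transaction_aspect
def transfer(fromAccount, toAccount, transferAmount):
    """Example transaction interface; the business logic is omitted."""
    return "OK"
```

The logging code lives entirely in the decorator, so the transaction interface itself contains no log generation code, which is the extensibility benefit the description attributes to the aspect approach.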
Step 2: Collect half a year of the initial transaction logs generated in step 1 for the different business fields of the bank, process the transaction logs, extract the field names of the transaction parameters, perform English word segmentation, and generate a corpus.
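Corpus generation from field names can be sketched as below. The source mentions a word segmentation tool without naming it, so a simple regular-expression split on camelCase, underscores, hyphens, and digits is used here as a stand-in; treating each log record as one document is also an assumption.

```python
import re


def split_field_name(name):
    """Split a transaction parameter field name into lowercase English words."""
    words = []
    for part in re.split(r"[_\-\s]+", name):
        # Acronym runs, capitalised or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def build_corpus(records):
    """Turn each log record (field name -> value) into one document of words."""
    return [
        [token for name in record for token in split_field_name(name)]
        for record in records
    ]
```

Only the field names are segmented; the parameter values do not enter the corpus, which matches the description's statement that the field names of the transaction parameters are extracted.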
Step 3: Based on the LDA topic model, sample the corpus obtained in step 2 by using the Gibbs sampling algorithm. The LDA model training process based on Gibbs sampling is as follows:
(1) determining the appropriate number of topics K and selecting the appropriate hyper-parameter vector
Figure BDA0003547327300000151
(2) Randomly assigning a topic number k to each word w of each document in the corpus.
(3) Rescanning the corpus and, for each word w, resampling its topic number by using the Gibbs sampling formula and updating the topic number recorded for the word in the corpus.
(4) Repeating the Gibbs sampling process of step (3) until the sampling converges, and then proceeding to step (5).
(5) Counting the topic numbers of the words in each document in the corpus to obtain the document-topic distribution
Figure BDA0003547327300000152
Then, the distribution of each topic-word in the corpus is counted to obtain the topic-word distribution of the LDA model
Figure BDA0003547327300000153
And further obtaining the topic distribution of the document and the word distribution of each topic through statistical calculation, and finally obtaining the LDA model trained based on the corpus.
Step 4: An LDA-based S-LDA keyword extraction algorithm is provided. The specific flow of the algorithm is as follows:
(1) Constructing the corpus required for model training from a given document set, and training on the corpus based on the LDA topic model to obtain the LDA model.
(2) Constructing a target corpus from the target document, and predicting the topic distribution of the target corpus by using the Gibbs sampling algorithm.
(3) Filtering the topics of the target text file: according to the JS divergence calculation formula, calculating the similarity between the word distribution of each topic in the target text file and the auxiliary vector
Figure BDA0003547327300000154
The calculation formula of
Figure BDA0003547327300000155
is shown in formula (1-1), and the formula for calculating the similarity is shown in formula (1-2). A divergence threshold is set; its value in the experiments is 0.2. When the JS divergence value is smaller than the threshold, the current topic is deleted from the topic distribution of the target text file, yielding the filtered topic set.
(4) Constructing the word selection weight weight_i of topic i from the proportion pro_i of topic i in the topic distribution of the target text file; the specific expression is shown in formula (1-3). In descending order of topic probability, a certain number of words are selected from each topic in turn according to formula (1-4), and the order in which the words appear is kept to construct the keyword candidate word set.
(5) Setting auxiliary vectors
Figure BDA0003547327300000161
Its expression is shown in formula (1-5). The similarity between the topic distribution of each word in the keyword candidate word set and the auxiliary vector is calculated according to the JS divergence; the specific calculation formula is shown in formula (1-6). The divergence threshold is set to 0.2 in the experiments. Words with high similarity to the auxiliary vector (that is, with a JS divergence value below the threshold) are then deleted from the candidate word set. Finally, the words that are nouns or verbs, appear in the target text, and rank in the top s are selected from the keyword candidate word set as the keywords of the target text, where s is a positive integer in [3, 10] and may be chosen arbitrarily within that range.
Step 5: Extract keywords from the segmented transaction logs of the target service field by using the S-LDA keyword extraction algorithm, and establish a keyword library. Then take each keyword in the keyword library in turn, query the value corresponding to the keyword (transaction parameter) from the initial transaction log generated in step 1, and generate the flow log file of the target business field system.
Step 6: in the system operation process, the generated flow log is monitored according to the keywords, the operation state of the system is obtained in real time, and the system operation safety is guaranteed.
The embodiment of the invention also provides a modular example of the log generation monitoring method, which includes: an aspect-based system initial transaction log acquisition module, a log-keyword model establishment module, a flow log generation module, and a flow log monitoring module.
The aspect-based system initial transaction log acquisition module: while the systems of the business fields are operating normally, an aspect is applied to each transaction interface to obtain the transaction parameter details and generate an initial transaction log.
The log-keyword model establishment module: the field names of the transaction parameters are extracted from the initial transaction log data of each business field system, and a word segmentation tool is used to segment them and generate a corpus. Based on the LDA topic model, the Gibbs sampling algorithm is used for sampling; once sampling converges, the topic of each word is known, and the topic distribution of the corpus and the word distribution of each topic are then obtained through statistical calculation, finally yielding the corpus-based LDA model.
The flow log generation module: keywords are extracted from the segmented transaction log of the target service field by the proposed LDA-based S-LDA keyword extraction algorithm, and a keyword library is established. The flow log file of the target business field system is generated from the initial transaction log of each business field system and the keyword library.
The flow log monitoring module: when a financial transaction enters the product system, a flow log containing the keywords (transaction parameter field names) is obtained through the flow log generation module. The generated flow log is then monitored to track the running state of the system.
The embodiment of the invention obtains the aspect transaction logs of each service field system by using aspect technology, without writing log generation code for the specific conditions of each service field system, which improves the extensibility of flow log generation. The proposed LDA-based S-LDA keyword extraction algorithm is used to analyze the aspect transaction logs of the system, which reduces labor cost and improves the accuracy of flow log monitoring.
Fig. 5 is a schematic diagram of a computer device for executing a log generation monitoring method implemented by the present invention, and as shown in fig. 5, an embodiment of the present invention further provides a computer device 500, which includes a memory 510, a processor 520, and a computer program 530 stored in the memory and executable on the processor, and when the processor executes the computer program, the log generation monitoring method is implemented.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the log generation monitoring method is implemented.
An embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program, and when the computer program is executed by a processor, the log generation monitoring method is implemented.
The embodiment of the invention also provides a log generation monitoring device, which is described in the following embodiment. Because the principle of the device for solving the problems is similar to a log generation monitoring method, the implementation of the device can refer to the implementation of the log generation monitoring method, and repeated parts are not repeated.
Fig. 6 is a schematic diagram of a log generation monitoring apparatus according to an embodiment of the present invention, and as shown in fig. 6, an embodiment of the present invention further provides a log generation monitoring apparatus, which may include:
an initial transaction log data determining module 601, configured to determine initial transaction log data by applying an aspect to a transaction interface of a business field system;
a corpus generation module 602, configured to perform word segmentation processing on the initial transaction log data to generate a corpus;
a corpus-based LDA model training module 603, configured to sample the corpus and train to obtain a corpus-based LDA model;
a flow log file generation module 604, configured to extract keywords from the transaction logs of the target service field according to the corpus-based LDA model, establish a keyword library, and generate a flow log file of the target service field system in combination with the initial transaction log data;
and a flow log file monitoring module 605, configured to monitor the flow log file of the target service field system.
In an embodiment of the invention, when the log generation monitoring apparatus provided in the embodiment of the present invention is implemented specifically, the initial transaction log data determining module is specifically configured to:
creating an aspect class by using aspect technology, applying it to the transaction interfaces of all service field systems, and acquiring the transaction parameter details;
and generating initial transaction log data according to the transaction parameter details.
In an embodiment of the present invention, when the log generation monitoring apparatus provided in the embodiment of the present invention is implemented specifically, the corpus generating module is specifically configured to:
acquiring initial transaction log data within a set time length;
extracting the name of a transaction parameter field from initial transaction log data within a set time length;
and performing English word segmentation operation on the extracted transaction parameter field names by using a word segmentation tool to generate a corpus.
In an embodiment of the present invention, when the log generation monitoring apparatus provided in the embodiment of the present invention is implemented specifically, the LDA model training module based on the corpus is configured to:
sampling the corpus by adopting a Gibbs sampling algorithm, and determining the theme of each word after sampling convergence;
and training to obtain an LDA model based on a corpus according to the theme of each word.
In an embodiment of the present invention, when the log generation monitoring apparatus provided in the embodiment of the present invention is implemented specifically, the LDA model training module based on the corpus is further configured to:
determining the number of topics and the hyper-parameter vector;
randomly assigning a topic number to each word of each document in the corpus;
rescanning the corpus and, for each word, resampling its topic number by using the Gibbs sampling formula and updating the topic number recorded for the word in the corpus;
and repeating the sampling update until the sampling converges, and determining the topic of each word in the corpus.
In an embodiment of the present invention, when the log generation monitoring apparatus provided in the embodiment of the present invention is implemented specifically, the LDA model training module based on the corpus is further configured to:
counting the topic numbers of the words in each document in the corpus to obtain document-topic distribution parameters;
counting the topic-word distribution in the corpus to obtain topic-word distribution parameters of the LDA model;
determining the topic distribution of the documents in the corpus according to the document-topic distribution parameters;
determining the word distribution of each topic in the corpus according to the topic-word distribution parameters of the LDA model;
and training the LDA topic model according to the topic distribution of the documents in the corpus and the word distribution of each topic in the corpus to obtain the corpus-based LDA model.
In an embodiment of the present invention, when the log generation monitoring apparatus provided in the embodiment of the present invention is implemented specifically, the flow log file generation module is specifically configured to:
establishing an LDA-based S-LDA keyword extraction algorithm according to an LDA model based on a corpus;
extracting keywords from the transaction logs in the target service field according to an LDA-based S-LDA keyword extraction algorithm to establish a keyword library;
and generating a flow log file of the target business field system according to the keyword library and the initial transaction log data.
In a specific implementation of the log generation monitoring apparatus provided in the embodiment of the present invention, in an embodiment, the flow log file generation module is further configured to:
constructing the training corpus required for model training from a given set of text files, and training on the training corpus with the corpus-based LDA (Latent Dirichlet Allocation) model to obtain an LDA model;
constructing a target corpus from the target text file, and predicting the topic distribution of the target corpus by using the Gibbs sampling algorithm;
filtering the topics of the target text file to obtain a filtered topic set;
constructing a word selection weight for each topic in the topic set from the proportion of that topic in the topic distribution of the target text file, selecting a set number of words from each topic in turn in descending order of topic probability, and keeping the order in which the selected words appear to construct a keyword candidate word set;
and filtering the keyword candidate word set to determine the keywords of the target text file.
In a specific implementation of the log generation monitoring apparatus provided in the embodiment of the present invention, in an embodiment, the flow log file generation module is further configured to:
setting a first auxiliary vector;
calculating the similarity between the word distribution of each topic in the target text file and the first auxiliary vector, and determining a first JS divergence value;
and when the first JS divergence value is smaller than a first set divergence threshold, deleting the current topic from the topic distribution of the target text file to obtain the filtered topic set.
In a specific implementation of the log generation monitoring apparatus provided in the embodiment of the present invention, in an embodiment, the flow log file generation module is further configured to:
setting a second auxiliary vector;
calculating the similarity between the topic distribution of each candidate word in the keyword candidate word set and the second auxiliary vector, and determining a second JS divergence value;
when the second JS divergence value is smaller than a second set divergence threshold, deleting the current candidate word from the keyword candidate word set, and selecting, from the remaining keyword candidate word set, the candidate words whose part of speech is noun or verb and which rank in the top S as the keywords of the target text file; the top-S candidate words are obtained by sorting the candidate words in descending order of occurrence frequency and taking the first S.
In one embodiment of the log generation monitoring apparatus provided by the embodiments of the present invention, the flow log file generation module is further configured to:
performing word segmentation processing on the transaction logs of the target business field;
and extracting keywords from the segmented transaction logs of the target business field by using the LDA-based S-LDA keyword extraction algorithm, and establishing a keyword library.
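A minimal sketch of the word segmentation step is given below in Python. The text does not name the word segmentation tool, so a regular expression that splits camelCase and snake_case identifiers stands in for it here. The assumption that the segmented items are English field names, and the function name segment_field_name, are illustrative only.

```python
import re

# Matches, in order of preference: an acronym run (capitals not followed by a
# lowercase letter), a capitalized or lowercase word, or a run of digits.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

def segment_field_name(name):
    """Split a field name such as 'transAmount' into lowercase word tokens."""
    return [token.lower() for token in _TOKEN.findall(name)]
```

For example, segment_field_name("payer_account_no") yields the tokens payer, account and no, which can then be treated as the words of one document when keywords are extracted.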
In one embodiment of the log generation monitoring apparatus provided by the embodiments of the present invention, the flow log file generation module is further configured to:
taking each keyword in the keyword library in turn, querying the initial transaction log data for the value corresponding to the keyword, and generating the flow log file of the target business field system.
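The lookup described above can be sketched in Python as follows. The sketch assumes that each initial transaction log entry is available as a dictionary keyed by field name, that keywords match field names directly, and that a flow log line is a "key=value" list joined by "|". None of these details are specified in the text, and both function names are hypothetical.

```python
def build_flow_record(keywords, transaction):
    """Pick, in keyword-library order, the values present in one transaction."""
    return {key: transaction[key] for key in keywords if key in transaction}

def format_flow_line(record):
    """Render one flow log line in an assumed 'key=value|key=value' layout."""
    return "|".join(f"{key}={value}" for key, value in record.items())
```

Because the keyword library drives which fields are written, fields that are not keywords (for example debugging data) are left out of the flow log without any per-system log generation code.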
In one embodiment of the log generation monitoring apparatus provided by the embodiments of the present invention, the flow log file monitoring module is specifically configured to:
monitoring the generated flow log file of the target business field system according to the keywords, and acquiring the running state information of the target business field system in real time.
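As one possible reading of keyword-based monitoring, the Python sketch below counts the values of a chosen keyword across flow log lines. The "key=value|key=value" line layout, the "status" keyword used in the usage note, and the function names are assumptions made for illustration. The text does not describe the monitoring logic in this detail.

```python
from collections import Counter

def parse_line(line):
    """Parse an assumed 'key=value|key=value' flow log line into a dict."""
    return dict(pair.split("=", 1) for pair in line.split("|") if "=" in pair)

def summarize(lines, keyword):
    """Count how often each value of the given keyword appears in the lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        counts[record.get(keyword, "<missing>")] += 1
    return counts
```

Calling summarize on the lines appended since the last check, for example with the keyword "status", gives a simple running-state snapshot that could feed an alerting rule.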
To sum up, the log generation monitoring method and apparatus provided by the embodiments of the present invention comprise: applying an aspect (in the aspect-oriented programming sense) to the transaction interfaces of a business field system to determine initial transaction log data; performing word segmentation processing on the initial transaction log data to generate a corpus; sampling the corpus and training to obtain a corpus-based LDA model; extracting keywords from the transaction logs of a target business field according to the corpus-based LDA model, establishing a keyword library, and generating a flow log file of the target business field system in combination with the initial transaction log data; and monitoring the flow log file of the target business field system. Because the initial transaction log data of each business field system is obtained by the aspect technique, log generation code does not need to be written for the specific circumstances of each business field system, which improves the extensibility of flow log generation. The initial transaction log data collected from the systems of different business fields is used to train the corpus-based LDA model, so the flow log standard of each business field is established without manual analysis. Meanwhile, based on the trained corpus-based LDA model, keywords are extracted from new transaction logs of the target business field system by the LDA-based S-LDA keyword extraction algorithm, a keyword library is established, and the flow log file of the target business field system is generated automatically, which effectively reduces labor cost. Monitoring the flow log file of the target business field system allows the running state of the system to be obtained in real time, improves the accuracy of flow log monitoring, and helps safeguard the running safety of the system.
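As an illustration of the sampling and training step summarized above, the following is a minimal collapsed Gibbs sampling sketch for LDA in Python. It is not taken from the patent: the function name train_lda, the symmetric hyperparameters alpha and beta, and the fixed iteration count used in place of a convergence test are all assumptions made for the example.

```python
import random

def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (vocab, theta, phi): theta[d][k] is the document-topic
    distribution and phi[k][v] is the topic-word distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # topic counts per document
    topic_word = [[0] * V for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                     # total words per topic
    assignments = []

    # Randomly assign a topic number to each word of each document.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Rescan the corpus and resample each word's topic (fixed number of sweeps).
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Estimate the distribution parameters from the final counts.
    theta = [[(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
             for d, doc in enumerate(docs)]
    phi = [[(n + beta) / (topic_total[k] + V * beta) for n in topic_word[k]]
           for k in range(num_topics)]
    return vocab, theta, phi
```

In this sketch each document would be the list of word tokens produced by the word segmentation step. A production system would more likely monitor a likelihood or perplexity measure to decide when sampling has converged.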
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (29)

1. A log generation monitoring method, characterized by comprising:
applying an aspect (in the aspect-oriented programming sense) to the transaction interfaces of a business field system to determine initial transaction log data;
performing word segmentation processing on the initial transaction log data to generate a corpus;
sampling the corpus, and training to obtain a corpus-based LDA model;
extracting keywords from the transaction logs of a target business field according to the corpus-based LDA model, establishing a keyword library, and generating a flow log file of the target business field system in combination with the initial transaction log data;
and monitoring the flow log file of the target business field system.
2. The method of claim 1, wherein applying an aspect to the transaction interfaces of the business field system to determine initial transaction log data comprises:
creating an aspect class by using an aspect algorithm, applying it to the transaction interfaces of all business field systems, and acquiring transaction parameter details;
and generating the initial transaction log data according to the transaction parameter details.
3. The method of claim 1, wherein performing word segmentation processing on the initial transaction log data to generate a corpus comprises:
acquiring the initial transaction log data within a set duration;
extracting transaction parameter field names from the initial transaction log data within the set duration;
and performing an English word segmentation operation on the extracted transaction parameter field names by using a word segmentation tool to generate the corpus.
4. The method of claim 1, wherein sampling the corpus and training to obtain a corpus-based LDA model comprises:
sampling the corpus by using a Gibbs sampling algorithm, and determining the topic of each word after the sampling converges;
and training to obtain the corpus-based LDA model according to the topic of each word.
5. The method of claim 4, wherein sampling the corpus by using a Gibbs sampling algorithm and determining the topic of each word after the sampling converges comprises:
determining the number of topics and the hyperparameter vector;
randomly assigning a topic number to each word of each document in the corpus;
rescanning the corpus and, for each word, resampling its topic number by using the Gibbs sampling formula and updating that topic number in the corpus;
and repeating the sampling update until the sampling converges, and determining the topic of each word in the corpus.
6. The method of claim 5, wherein training to obtain the corpus-based LDA model according to the topic of each word comprises:
counting the topic numbers of the words in each document in the corpus to obtain document-topic distribution parameters;
counting the topic-word distribution in the corpus to obtain topic-word distribution parameters of the LDA model;
determining the topic distribution of the documents in the corpus according to the document-topic distribution parameters;
determining the word distribution of each topic in the corpus according to the topic-word distribution parameters of the LDA model;
and training the LDA topic model according to the topic distribution of the documents in the corpus and the word distribution of each topic in the corpus to obtain the corpus-based LDA model.
7. The method of claim 1, wherein extracting keywords from the transaction logs of the target business field according to the corpus-based LDA model, establishing a keyword library, and generating a flow log file of the target business field system in combination with the initial transaction log data comprises:
establishing an LDA-based S-LDA keyword extraction algorithm according to the corpus-based LDA model;
extracting keywords from the transaction logs of the target business field according to the LDA-based S-LDA keyword extraction algorithm to establish the keyword library;
and generating the flow log file of the target business field system according to the keyword library and the initial transaction log data.
8. The method of claim 7, wherein establishing an LDA-based S-LDA keyword extraction algorithm according to the corpus-based LDA model comprises:
constructing the training corpus required for model training according to a given text file set, and training on the training corpus by using the corpus-based LDA (latent Dirichlet allocation) model to obtain an LDA model;
constructing a target corpus according to a target text file, and predicting the topic distribution of the target corpus by using the Gibbs sampling algorithm;
filtering the topics of the target text file to obtain a filtered topic set;
constructing, for each topic in the filtered topic set, a word-selection weight from that topic's proportion in the topic distribution of the target text file, selecting a set number of words from each topic in descending order of topic probability, and preserving the order in which the selected words appear, so as to construct a keyword candidate word set;
and filtering the keyword candidate word set to determine the keywords of the target text file.
9. The method of claim 8, wherein filtering the topics of the target text file to obtain a filtered topic set comprises:
setting a first auxiliary vector;
calculating the similarity between the word distribution of each topic in the target text file and the first auxiliary vector to determine a first JS divergence value;
and when the first JS divergence value is smaller than a first set divergence threshold, deleting the current topic from the topic distribution of the target text file to obtain the filtered topic set.
10. The method of claim 8, wherein filtering the keyword candidate word set to determine the keywords of the target text file comprises:
setting a second auxiliary vector;
calculating the similarity between the topic distribution of each candidate word in the keyword candidate word set and the second auxiliary vector to determine a second JS divergence value;
when the second JS divergence value is smaller than a second set divergence threshold, deleting the current candidate word from the keyword candidate word set, and selecting, from the remaining keyword candidate word set, the candidate words whose part of speech is a noun or a verb and which rank in the top S as the keywords of the target text file; wherein the top-S candidate words are obtained by sorting the candidate words in descending order of occurrence frequency and taking the first S.
11. The method of claim 7, wherein extracting keywords from the transaction logs of the target business field according to the LDA-based S-LDA keyword extraction algorithm to establish the keyword library comprises:
performing word segmentation processing on the transaction logs of the target business field;
and extracting keywords from the segmented transaction logs of the target business field by using the LDA-based S-LDA keyword extraction algorithm, and establishing the keyword library.
12. The method of claim 7, wherein generating the flow log file of the target business field system according to the keyword library and the initial transaction log data comprises:
taking each keyword in the keyword library in turn, querying the initial transaction log data for the value corresponding to the keyword, and generating the flow log file of the target business field system.
13. The method of claim 1, wherein monitoring the flow log file of the target business field system comprises:
monitoring the generated flow log file of the target business field system according to the keywords, and acquiring the running state information of the target business field system in real time.
14. A log generation monitoring apparatus, comprising:
an initial transaction log data determining module, configured to apply an aspect to the transaction interfaces of a business field system to determine initial transaction log data;
a corpus generation module, configured to perform word segmentation processing on the initial transaction log data to generate a corpus;
a corpus-based LDA model training module, configured to sample the corpus and train to obtain a corpus-based LDA model;
a flow log file generation module, configured to extract keywords from the transaction logs of a target business field according to the corpus-based LDA (latent Dirichlet allocation) model, establish a keyword library, and generate a flow log file of the target business field system in combination with the initial transaction log data;
and a flow log file monitoring module, configured to monitor the flow log file of the target business field system.
15. The apparatus of claim 14, wherein the initial transaction log data determination module is specifically configured to:
creating an aspect class by using an aspect algorithm, applying it to the transaction interfaces of all business field systems, and acquiring transaction parameter details;
and generating initial transaction log data according to the transaction parameter details.
16. The apparatus of claim 14, wherein the corpus generation module is specifically configured to:
acquiring initial transaction log data within a set duration;
extracting transaction parameter field names from the initial transaction log data within the set duration;
and performing English word segmentation operation on the extracted transaction parameter field names by using a word segmentation tool to generate a corpus.
17. The apparatus of claim 14, wherein the corpus-based LDA model training module is specifically configured to:
sampling the corpus by using a Gibbs sampling algorithm, and determining the topic of each word after the sampling converges;
and training to obtain the corpus-based LDA model according to the topic of each word.
18. The apparatus of claim 17, wherein the corpus-based LDA model training module is further configured to:
determining the number of topics and the hyperparameter vector;
randomly assigning a topic number to each word of each document in the corpus;
rescanning the corpus and, for each word, resampling its topic number by using the Gibbs sampling formula and updating that topic number in the corpus;
and repeating the sampling update until the sampling converges, and determining the topic of each word in the corpus.
19. The apparatus of claim 18, wherein the corpus-based LDA model training module is further configured to:
counting the topic numbers of the words in each document in the corpus to obtain document-topic distribution parameters;
counting the topic-word distribution in the corpus to obtain topic-word distribution parameters of the LDA model;
determining the topic distribution of the documents in the corpus according to the document-topic distribution parameters;
determining the word distribution of each topic in the corpus according to the topic-word distribution parameters of the LDA model;
and training the LDA topic model according to the topic distribution of the documents in the corpus and the word distribution of each topic in the corpus to obtain the corpus-based LDA model.
20. The apparatus of claim 14, wherein the flow log file generation module is specifically configured to:
establishing an LDA-based S-LDA keyword extraction algorithm according to the corpus-based LDA model;
extracting keywords from the transaction logs of the target business field according to the LDA-based S-LDA keyword extraction algorithm to establish the keyword library;
and generating the flow log file of the target business field system according to the keyword library and the initial transaction log data.
21. The apparatus of claim 20, wherein the flow log file generation module is further configured to:
constructing the training corpus required for model training according to a given text file set, and training on the training corpus by using the corpus-based LDA (latent Dirichlet allocation) model to obtain an LDA model;
constructing a target corpus according to a target text file, and predicting the topic distribution of the target corpus by using the Gibbs sampling algorithm;
filtering the topics of the target text file to obtain a filtered topic set;
constructing, for each topic in the filtered topic set, a word-selection weight from that topic's proportion in the topic distribution of the target text file, selecting a set number of words from each topic in descending order of topic probability, and preserving the order in which the selected words appear, so as to construct a keyword candidate word set;
and filtering the keyword candidate word set to determine the keywords of the target text file.
22. The apparatus of claim 21, wherein the flow log file generation module is further configured to:
setting a first auxiliary vector;
calculating the similarity between the word distribution of each topic in the target text file and the first auxiliary vector to determine a first JS divergence value;
and when the first JS divergence value is smaller than a first set divergence threshold, deleting the current topic from the topic distribution of the target text file to obtain the filtered topic set.
23. The apparatus of claim 21, wherein the flow log file generation module is further configured to:
setting a second auxiliary vector;
calculating the similarity between the topic distribution of each candidate word in the keyword candidate word set and the second auxiliary vector to determine a second JS divergence value;
when the second JS divergence value is smaller than a second set divergence threshold, deleting the current candidate word from the keyword candidate word set, and selecting, from the remaining keyword candidate word set, the candidate words whose part of speech is a noun or a verb and which rank in the top S as the keywords of the target text file; wherein the top-S candidate words are obtained by sorting the candidate words in descending order of occurrence frequency and taking the first S.
24. The apparatus of claim 20, wherein the flow log file generation module is further configured to:
performing word segmentation processing on the transaction logs of the target business field;
and extracting keywords from the segmented transaction logs of the target business field by using the LDA-based S-LDA keyword extraction algorithm, and establishing the keyword library.
25. The apparatus of claim 20, wherein the flow log file generation module is further configured to:
taking each keyword in the keyword library in turn, querying the initial transaction log data for the value corresponding to the keyword, and generating the flow log file of the target business field system.
26. The apparatus of claim 14, wherein the flow log file monitoring module is specifically configured to:
monitoring the generated flow log file of the target business field system according to the keywords, and acquiring the running state information of the target business field system in real time.
27. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 13 when executing the computer program.
28. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 13.
29. A computer program product, characterized in that the computer program product comprises a computer program which, when being executed by a processor, carries out the method of any one of claims 1 to 13.
CN202210252421.8A 2022-03-15 2022-03-15 Log generation monitoring method and device Pending CN114610576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210252421.8A CN114610576A (en) 2022-03-15 2022-03-15 Log generation monitoring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210252421.8A CN114610576A (en) 2022-03-15 2022-03-15 Log generation monitoring method and device

Publications (1)

Publication Number Publication Date
CN114610576A true CN114610576A (en) 2022-06-10

Family

ID=81862985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210252421.8A Pending CN114610576A (en) 2022-03-15 2022-03-15 Log generation monitoring method and device

Country Status (1)

Country Link
CN (1) CN114610576A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069595A (en) * 2023-04-06 2023-05-05 华能信息技术有限公司 Operation and maintenance monitoring method based on log
CN116069595B (en) * 2023-04-06 2023-06-09 华能信息技术有限公司 Operation and maintenance monitoring method based on log

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination