CN114036120A - Real-time analysis method and system based on mass log data - Google Patents

Real-time analysis method and system based on mass log data

Info

Publication number
CN114036120A
Authority
CN
China
Legal status
Pending
Application number
CN202111298565.9A
Other languages
Chinese (zh)
Inventor
王宜才
丁正
顾晓东
祝敬安
韦红
刘志永
卢亚洲
高树江
邢喜云
Current Assignee
Shanghai Xinfang Software Co ltd
Shanghai Cintel Intelligent System Co ltd
Original Assignee
Shanghai Xinfang Software Co ltd
Shanghai Cintel Intelligent System Co ltd
Application filed by Shanghai Xinfang Software Co ltd and Shanghai Cintel Intelligent System Co ltd
Priority to CN202111298565.9A
Publication of CN114036120A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing

Abstract

The application discloses a real-time analysis method based on massive log data. The method comprises acquiring input query conditions, combining the query conditions into keywords, searching the stored big data according to the keywords, and analyzing the resulting set according to a set analysis strategy to obtain an analysis result. The big data is obtained as follows: data associated with the service is recorded during each service process at each service point to generate that service point's log data; the log data of each service point is collected in real time and cached; the cached log data is filtered and structured to obtain big data suitable for persistent storage; and the big data is stored. The method and system allow an operator to analyze call logs in real time, minimizing log-analysis time and allowing problems to be located quickly.

Description

Real-time analysis method and system based on mass log data
Technical Field
The invention relates to the field of communications, and in particular to a real-time analysis method based on mass log data.
Background
With the development of communication technology, more and more users communicate through mobile phones, fixed-line telephones, networks, and similar means, and in doing so they make voice, video, audio, and other kinds of calls across a variety of devices.
Users want a safe and stable call environment in which abnormal calls are reduced or eliminated. Operators likewise want to eliminate abnormal calls at the source by technical means, provide a safe and stable call environment for the public, and analyze and summarize users' usage habits to improve the user experience.
At present, storage and analysis of service logs mainly rely on the following techniques:
1. Implementation with text files
A text file is saved at each service point, recording the time, service name, machine name, automaton number, calling number, called number, original called number, message type, and content. When analysis is required, the text file at each service point is scanned, the required information is filtered out, and the results are then analyzed and organized.
2. Implementation through relational databases
The service logs at each service point are collected into a relational database in real time, and indexes are built on the data as required. When analysis is needed, full scans are run to find data matching conditions such as time period and calling number, and the required information is extracted from the results.
3. Implementation combining a relational database and text files
The data transmitted by each service point is classified and summarized by calling number, called number, service name, and automaton number; a unique task identifier (ID) is generated for each call; the different data of each call is stored in separate data files; and the mapping among calling number, called number, service name, automaton number, task ID, and data files is stored in a database. At query time, the task ID is found from the query conditions, the list of data files is obtained, all files are read, and the data is aggregated, analyzed, and displayed.
The service points are telecommunication devices and application software used in the communication process.
Each of the above methods has drawbacks:
For the text-file method, files must be read and queried one by one, so data analysis lags considerably.
For the relational-database method, all data resides in the relational database; as the volume of logs keeps growing, for example to hundreds of millions of entries, index building and analytical queries respond slowly.
For the method combining a relational database and text files, every piece of data is preprocessed by classifying and summarizing it by calling number, called number, service name, and automaton number, so the data preprocessing node can become a bottleneck.
In addition, all of the above methods share the following defects:
1. Data storage management is inconvenient. Whether the logs are kept in files or in a database, once massive logs have accumulated for a period of time, storing and backing them up becomes difficult, and improper handling can lead to data loss.
2. With massive logs, data query responses lag severely and cannot meet the basic requirement of real-time data analysis.
Disclosure of Invention
The invention provides a real-time analysis method based on mass log data, in order to reduce the time required for log analysis.
The real-time analysis method based on mass log data provided by the invention is realized as follows:
acquiring input query conditions, combining the query conditions into keywords,
based on the stored big data, searching is carried out according to the keywords,
analyzing the searched result set according to a set analysis strategy to obtain an analysis result;
wherein
the big data is obtained as follows:
recording the data associated with the service in each service process of each service point to generate the log data of each service point,
collecting log data of each service point in real time, caching,
filtering and structuring the cached log data to obtain big data for persistent storage,
and storing the big data.
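As a minimal sketch of the query side of this method, assuming in-memory records stored as dicts, the following Python shows one way query conditions could be combined into keywords and matched against stored log records. The `QueryConditions` fields and the `to_keywords`, `matches`, and `search` functions are hypothetical names, not part of the disclosure; the set analysis strategy would then run on the returned result set.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QueryConditions:
    # Hypothetical query dimensions, taken from those the text names
    # (calling number, called number, service type, time period).
    calling: Optional[str] = None
    called: Optional[str] = None
    service: Optional[str] = None
    start: Optional[int] = None  # epoch seconds, inclusive
    end: Optional[int] = None    # epoch seconds, exclusive

def to_keywords(q: QueryConditions) -> dict:
    """Combine the non-empty query conditions into one keyword mapping."""
    keywords = {}
    for name in ("calling", "called", "service"):
        value = (getattr(q, name) or "").strip()
        if value:                      # ignore conditions left blank
            keywords[name] = value
    if q.start is not None or q.end is not None:
        keywords["time"] = (q.start, q.end)
    return keywords

def matches(record: dict, keywords: dict) -> bool:
    """True if a stored log record satisfies every keyword."""
    for name, value in keywords.items():
        if name == "time":
            start, end = value
            if start is not None and record["ts"] < start:
                return False
            if end is not None and record["ts"] >= end:
                return False
        elif record.get(name) != value:
            return False
    return True

def search(records: list, keywords: dict) -> list:
    """Return the result set that the analysis strategy would then process."""
    return [r for r in records if matches(r, keywords)]
```

For example, `search(records, to_keywords(QueryConditions(calling="1001", start=150)))` keeps only records from calling number 1001 with `ts >= 150`. A production system would push this filter down into the storage engine's index rather than scanning a list.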
Preferably, the big data is stored in a distributed cluster: hot data is stored on a first distributed storage device, and all data is stored as cold data on a second distributed storage device; once cold data is accessed, it is transferred to the first distributed storage device as hot data; the first distributed storage device has higher performance than the second distributed storage device;
searching the stored big data according to the keywords comprises:
searching the hot data first,
and, if the searched-for data cannot be found in the hot data, then searching the cold data.
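The hot-first, cold-fallback lookup with promotion can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions: two dicts replace the first and second distributed storage devices, records are looked up by a single key such as a call identifier, and `TieredStore` is a hypothetical name.

```python
class TieredStore:
    """Hot/cold storage with promotion on access.

    Two dicts stand in for the first (high-performance) and second
    (general-purpose) distributed storage devices.
    """

    def __init__(self):
        self.hot = {}    # first device: recent, frequently used data
        self.cold = {}   # second device: holds all data

    def put(self, key, value, recent=True):
        self.cold[key] = value          # every record is kept as cold data
        if recent:
            self.hot[key] = value       # recent records are also kept hot

    def get(self, key):
        """Return (value, tier) where tier is 'hot', 'cold' or 'miss'."""
        if key in self.hot:             # search hot data first
            return self.hot[key], "hot"
        if key in self.cold:            # fall back to cold data
            value = self.cold[key]
            self.hot[key] = value       # promote: accessed cold data becomes hot
            return value, "cold"
        return None, "miss"
```

Demotion of hot data that has aged out is not specified here and is omitted; because every record stays in the cold tier, dropping a hot copy never loses data.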
Optionally, the method further comprises displaying the analysis result graphically or in tabular form, where the graphical display is a time-sequenced two-dimensional flow chart in which each step of the flow shows its detailed information.
Optionally, collecting and caching the log data of each service point in real time comprises:
collecting the log data of each service point from that service point,
and caching the log data, according to its type, in corresponding queues in a distributed cluster, managing the cached data in a producer-consumer manner, and deleting cached data once it has been used by all consumers.
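A minimal single-process sketch of the type-based queues and the delete-after-all-consumers rule follows. It assumes each consumer reads sequentially and tracks its own read offset; `TopicCache` and its methods are hypothetical names, and a real deployment would use a distributed message queue rather than in-memory deques.

```python
from collections import defaultdict, deque

class TopicCache:
    """Per-type queues; an entry is deleted once every subscriber has read it."""

    def __init__(self):
        self.queues = defaultdict(deque)      # log type -> deque of (offset, record)
        self.next_offset = defaultdict(int)   # log type -> offset of the next record
        self.cursors = defaultdict(dict)      # log type -> {consumer: next offset to read}

    def subscribe(self, log_type: str, consumer: str) -> None:
        # New subscribers start at the current end of the queue.
        self.cursors[log_type][consumer] = self.next_offset[log_type]

    def produce(self, log_type: str, record) -> None:
        offset = self.next_offset[log_type]
        self.queues[log_type].append((offset, record))
        self.next_offset[log_type] = offset + 1

    def consume(self, log_type: str, consumer: str):
        """Return the consumer's next record, or None if it is caught up."""
        queue = self.queues[log_type]
        pos = self.cursors[log_type][consumer]
        if not queue or pos - queue[0][0] >= len(queue):
            return None
        record = queue[pos - queue[0][0]][1]
        self.cursors[log_type][consumer] = pos + 1
        self._trim(log_type)
        return record

    def _trim(self, log_type: str) -> None:
        # Drop entries that every subscriber has already read.
        slowest = min(self.cursors[log_type].values())
        queue = self.queues[log_type]
        while queue and queue[0][0] < slowest:
            queue.popleft()
```

For instance, with a filtering consumer and a second subscriber on the same log type, an entry stays queued after the first reads it and is removed only once the second has read it too.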
The invention also provides a real-time analysis system based on mass log data, comprising:
a log data generating device for recording the data associated with the service in each service process of each service point to generate the log data of each service point;
a log data acquisition device for acquiring the log data of each service point in real time;
a big data caching device for caching the collected log data of each service point;
a data filtering and confirming device for filtering and structuring the cached log data to obtain big data for persistent storage;
a big data storage device for storing the big data;
and a log data analysis device for acquiring input query conditions, combining the query conditions into keywords, searching the stored big data according to the keywords, and analyzing the resulting set according to a set analysis strategy to obtain an analysis result.
Preferably, the system further comprises:
an analysis result display device for displaying the analysis result graphically or in tabular form, wherein the graphical display is a time-sequenced two-dimensional flow chart in which each step of the flow shows its detailed information.
Optionally, the log data generating device comprises:
a log data sorting unit for interacting with each service point according to a standard protocol, sorting the data provided by each service point, and generating log data;
a log data file unit for writing the log data generated by the log data sorting unit to files in a form suitable for reading;
the log data acquisition device comprises:
a log data collection unit for tracking the log files and providing event data to the log data aggregation unit;
a log data aggregation unit for processing the data collected by the log data collection unit and delivering it to the first log data output unit at least once; if the first log data output unit is blocked and not all transmitted events have been acknowledged, the log data aggregation unit keeps trying to send the data to the first log data output unit until the first log data output unit acknowledges receipt of the events;
and a first log data output unit for interacting with the big data caching device, acquiring data from the log data aggregation unit, transmitting it to the big data caching device, and passing the confirmation back to the log data aggregation unit upon receiving a confirmation event from the big data caching device.
Optionally, the big data caching device comprises:
a producer unit for interacting with the first log data output unit, acquiring log data, and publishing the data to the corresponding data queue unit according to its type;
a data queue unit for receiving and storing the data sent by the producer unit and interacting with the consumer unit, reading data from the queue and providing it to the consumer unit;
a consumer unit for interacting with the data queue unit, reading data from the queue, and transmitting it to the data filtering and confirming device;
the data filtering and confirming device comprises:
a log data input unit for interacting with the big data caching device and acquiring data from it as a continuous stream;
a log data filtering unit for performing any one or any combination of filtering, recombining, and confirming the data, and for converting it into a common format;
a second log data output unit for outputting the data generated by the log data filtering unit to the big data storage device;
the big data storage device comprises:
a first data interface unit for data input (storage) and query output of the big data;
a search analysis unit for acquiring the corresponding data from the data storage unit according to the data query conditions provided by the first data interface unit;
and a data storage unit for storing the log data.
Optionally, the log data analysis device comprises:
a second data interface unit for interacting with the big data storage device, providing a uniform read interface for querying data in the big data storage device, and acquiring the data required for computation by the data analysis unit;
a data analysis unit for interacting with the second data interface unit and the query interface unit and providing real-time computation and data-mining capabilities;
a query interface unit for interacting with the analysis result display device, parsing the query conditions into a suitable query statement, submitting the query statement to the data analysis unit, organizing the query result into a result set of a specified format, and returning the result set to the analysis result display device;
the analysis result display device comprises:
a query control unit for starting and stopping log data query tasks, recording query conditions, and checking that they are well-formed;
a query request unit for receiving query requests from the query control unit, interacting with the log data analysis device to acquire the data required for computation by the data processing unit, and interacting with the data processing unit to hand the queried data over to it for processing;
a data processing unit for converting and/or processing the data from the query request unit according to the requirements of the query conditions;
and a result display unit for displaying the log data processed by the data processing unit in graphical or tabular form.
Optionally, the big data caching device is a distributed cluster caching device and further comprises:
a first cluster management unit for cluster management of the producer unit, the data queue unit, and the consumer unit;
the big data storage device is a distributed cluster storage device and further comprises:
a second cluster management unit for managing each node and providing the search and analysis function across all nodes.
According to the method and system, the log data generated at each service point is collected into big data for storage, and searches are performed on the big data according to the query conditions. This makes it more efficient to find the causes of abnormal calls in the service logs and solves the problems of untimely storage of service logs and slow analysis. Once deployed by an operator, the method and system support real-time analysis of call logs, minimizing log-analysis time and allowing problems to be located quickly, thereby avoiding losses to the operator and users and providing telephone users with a genuinely safe call environment. For example, faults encountered by telephone users during calls can be located and handled effectively in real time, which greatly reduces the number of faulty calls in the communication network, reduces the losses caused to telephone users and operators by telephone faults, and effectively improves the operator's service quality.
Drawings
Fig. 1 is a schematic diagram of the composition of a real-time analysis system based on mass logs according to an embodiment of the present application.
Fig. 2 is a schematic diagram of the log data generating device.
Fig. 3 is a schematic diagram of the log data acquisition device.
Fig. 4 is a schematic diagram of the big data caching device.
Fig. 5 is a schematic diagram of the data filtering and confirming device.
Fig. 6 is a schematic diagram of the big data storage device.
Fig. 7 is a schematic diagram of the log data analysis device.
Fig. 8 is a schematic diagram of the analysis result display device.
Fig. 9 is a schematic flow chart of the real-time analysis method based on mass logs according to the present application.
Detailed Description
To make the objectives, technical means, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic diagram of the composition of a real-time analysis system based on mass logs according to an embodiment of the present application. The system comprises a log data generating device that outputs log data in a formatted manner, a log data acquisition device that collects log data in real time, a big data caching device that temporarily stores log data, a data filtering and confirming device that sorts, filters, and confirms log data, a big data storage device that stores log data persistently, a log data analysis device that analyzes log data, and an analysis result display device that responds to queries, wherein:
the log data generating device is used for implementing the log data output function at each service point; specifically, it generates log files in a specified format, at specified time intervals, and under a specified path, and the log files record in detail each service module used during the user's communication, the usage flow of each service module, and the content and state of the information passed between service modules.
The log data acquisition device is used for interacting with the log data generating devices at all service points, preliminarily sorting the log data acquired from them, and generating standard, unified log data, that is, structured log data; it also interacts with the big data caching device and pushes the structured log data to it for storage. The acquisition device receives a large volume of log data in real time from the log data generating device of every service node. In a telecommunication call service, for example, a single call generates many log records, and the number grows the longer the call lasts; if this data were left unprocessed, finding all records of one call among massive data later would be very cumbersome. Therefore, when log data is recorded, a unique identifier is generated for each call from its time, service, numbers, and similar attributes, and the identifier is written into every line of that call's log data. Log data also contains irregularities, such as the same behavior being recorded across multiple lines or fields being empty; the log data acquisition device normalizes these cases into data of a uniform format before submitting it to the big data caching device.
The big data caching device is used for temporarily storing and relaying log data. It interacts with the log data acquisition device to receive the reported log data and store it in real time, and it interacts with the data filtering and confirming device, which reads log data from it; log data that has been read successfully is then discarded from the big data caching device. The big data caching device can work in subscription mode and serve data to multiple subscribers at the same time: once every subscriber of a piece of data has consumed it, the data is cleared from the cache queue, which keeps the log data safe. After the data filtering and confirming device takes log data from the cache, it converts the data to the format of the big data storage device and filters out useless information.
The data filtering and confirming device is used for interacting with the big data caching device, acquiring log data from it, filtering, recombining, and confirming the acquired log data, and sending the new log data produced by confirmation on to the big data storage device.
The big data storage device is the core of the system. It interacts with the data filtering and confirming device and persistently stores the log data received from it, and it interacts with the log data analysis device, which queries data from it. The big data storage device indexes and stores data in a distributed manner; for example, it indexes the unique identifier of each call in the log data and separates data into hot and cold tiers by time. Recent, frequently used data is stored as hot data on a high-performance first distributed storage device, while all data is stored as cold data on a second distributed storage device with ordinary storage performance; cold data that is accessed is then stored on the first distributed storage device as hot data. This meets the real-time requirement while containing the cost of storing massive data.
The log data analysis device is used for interacting with the big data storage device, acquiring real-time log data, analyzing and summarizing it, and sending the result set to the analysis result display device.
The analysis result display device is used for interacting with the log data analysis device: it sends query requests to the log data analysis device, obtains the result sets it returns, and visualizes them in graphical or tabular form.
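The per-call unique identifier described above could be derived as follows. This is an illustrative assumption, not the disclosed scheme: the identifier is a SHA-1 digest of the call start time, service, and calling and called numbers, truncated to 16 hex characters, so it is unique only with very high probability; `call_id` and `tag_lines` are hypothetical names.

```python
import hashlib

def call_id(start_ts: int, service: str, calling: str, called: str) -> str:
    """Derive a per-call identifier from start time, service and numbers.

    Truncated SHA-1 is an illustrative choice; any collision-resistant
    derivation over the same attributes would serve.
    """
    key = f"{start_ts}|{service}|{calling}|{called}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:16]

def tag_lines(lines: list, cid: str) -> list:
    """Stamp every log line of one call with the same identifier."""
    return [{**line, "call_id": cid} for line in lines]
```

Because every line of a call carries the same `call_id`, retrieving a whole call becomes an index lookup on one field instead of a scan over time, numbers, and service.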
Referring to fig. 2, fig. 2 is a schematic diagram of the log data generating device. The log data generating device consists of a log data sorting unit and a log data file unit, wherein:
the log data sorting unit is used for interacting with each service point according to a standard protocol, sorting the communication message data provided by each service point, and generating log data containing the time, service type, automaton number, unique call code, calling number, called number, message data, and other content;
the log data file unit is used for writing the log data generated by the log data sorting unit to files in a form suitable for reading. It can roll files by minute, hour, day, or other period and produce files with various suffixes such as .txt and .log. It also supports data sources in multiple formats, so that multiple services can use the log data for real-time analysis at the same time.
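A sketch of how the log data file unit might lay out records and roll files by period follows. The field order, the `|` separator, UTC timestamps, and the `<service point>_<period stamp>.<suffix>` naming are assumptions for illustration; the text names the fields and the minute/hour/day periods but not a concrete format.

```python
from datetime import datetime, timezone

# Hypothetical field order; the text lists these fields but not a layout.
FIELDS = ("time", "service", "automaton", "call_id", "calling", "called", "message")

# Timestamp patterns used to bucket files by rolling period.
PERIOD_FORMATS = {
    "minute": "%Y%m%d%H%M",
    "hour": "%Y%m%d%H",
    "day": "%Y%m%d",
}

def format_line(record: dict, sep: str = "|") -> str:
    """Render one log record as a delimited line; missing fields become empty."""
    return sep.join(str(record.get(f, "")) for f in FIELDS)

def log_file_name(service_point: str, ts: float,
                  period: str = "hour", suffix: str = "log") -> str:
    """Name of the file a record at epoch time ts belongs to."""
    stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(PERIOD_FORMATS[period])
    return f"{service_point}_{stamp}.{suffix}"
```

All records falling in the same minute, hour, or day map to the same file name, so the writer simply opens a new file whenever the computed name changes.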
Referring to fig. 3, fig. 3 is a schematic diagram of the log data acquisition device. The log data acquisition device consists of a log data collection unit, a log data aggregation unit, and a first log data output unit, wherein:
the log data collection unit is used for tracking the log file and providing the event data for the log data aggregation unit to use;
the log data aggregation unit is used for processing the data collected by the log data collection unit and delivering it to the first log data output unit at least once, so that no data is lost. If the first log data output unit is blocked and not all transmitted events have been acknowledged, the log data aggregation unit keeps trying to send the data to the first log data output unit until the first log data output unit confirms receipt of the events. Using data filtering and sorting techniques, the device can collect files in various formats, including multi-line records;
the first log data output unit is used for interacting with the big data caching device, acquiring data from the log data aggregation unit, transmitting it to the big data caching device, and passing the confirmation back to the log data aggregation unit upon receiving a confirmation event from the big data caching device, which completes one data transmission.
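The at-least-once handoff between the aggregation unit and the output unit amounts to retrying until acknowledged. The sketch below assumes a hypothetical `send` callable standing in for the first log data output unit, which returns `True` on acknowledgment and returns `False` or raises `ConnectionError` while blocked; `AtLeastOnceSender` is a hypothetical name, and the exponential backoff is an added design choice that the text does not specify.

```python
import time

class AtLeastOnceSender:
    """Resend a batch until the downstream output unit acknowledges it."""

    def __init__(self, send, backoff: float = 0.1, max_backoff: float = 5.0):
        self.send = send                # hypothetical: returns True when acknowledged
        self.backoff = backoff          # initial wait between retries, in seconds
        self.max_backoff = max_backoff  # cap on the wait

    def deliver(self, batch) -> int:
        """Block until the batch is acknowledged; return the number of attempts."""
        delay = self.backoff
        attempts = 0
        while True:
            attempts += 1
            try:
                if self.send(batch):
                    return attempts      # acknowledged: this transmission is complete
            except ConnectionError:
                pass                     # output blocked: treat as unacknowledged
            time.sleep(delay)            # wait, then resend the same batch
            delay = min(delay * 2, self.max_backoff)
```

Retrying can deliver a batch more than once (for example, if the acknowledgment is lost after the cache has stored the data), so downstream stages should tolerate duplicates, for instance by de-duplicating on call identifier plus line sequence. The loop never gives up, matching the described behavior; a real collector would usually also raise an alert on prolonged blockage.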
Referring to fig. 4, fig. 4 is a schematic diagram of the big data caching device. The big data caching device consists of a producer unit, a data queue unit, a consumer unit, and a first cluster management unit, wherein:
the producer unit is used for interacting with the first log data output unit of the log data acquisition device, acquiring log data, and publishing it to the data queue unit; log data of different types can be published to the corresponding data queue units, for example by classifying the log data of different services under different keywords so that each is transmitted and stored independently;
the data queue unit is used for receiving and storing the data sent by the producer unit and for interacting with the consumer unit, which reads data from the queue; the same data can be used by multiple consumer units;
the consumer unit is used for interacting with the data queue unit, reading data from the queue and transmitting the data to the data filtering and confirming device;
the first cluster management unit is used for cluster management of the producer unit, the data queue unit and the consumer unit.
In this way, the big data caching device stores and manages log data in a producer-consumer pattern: the log data acquisition device acts as the producer, storing log data into the big data caching device, and the data filtering and confirming device acts as the consumer, consuming data from it.
Referring to fig. 5, fig. 5 is a schematic diagram of the data filtering and confirming device. The data filtering and confirming device performs the final processing before log data is persisted: it acquires log data from the consumer unit of the big data caching device, sorts, filters, and confirms it according to a set format, and then transmits it to the big data storage device for persistent storage. The data filtering and confirming device consists of a log data input unit, a log data filtering unit, and a second log data output unit, wherein:
the log data input unit is used for interacting with the big data caching device and acquiring various data from it as a continuous stream;
the log data filtering unit is used for data processing and conversion: it parses each event, identifies named fields to build a structure, and converts the fields into a common format;
the second log data output unit is used for outputting the log data generated by the log data filtering unit to the big data storage device.
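The log data filtering unit's job (parse each event, identify named fields, convert them to a common format, and drop what is useless) can be sketched with a regular expression using named groups. The raw line layout, the field names, the drop rules, the `"unknown"` default, and the derived `msg_type` field are all assumptions chosen for illustration, and `normalize` is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical raw layout: "<epoch>|<service>|<call_id>|<calling>|<called>|<message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d+)\|(?P<service>[^|]*)\|(?P<call_id>[^|]*)\|"
    r"(?P<calling>[^|]*)\|(?P<called>[^|]*)\|(?P<message>.*)$"
)

def normalize(raw: str) -> Optional[dict]:
    """Parse one raw event into named fields and convert it to a common format.

    Returns None when the event is filtered out instead of persisted.
    """
    m = LINE_RE.match(raw.strip())
    if m is None:
        return None                                   # unparseable line: drop
    event = m.groupdict()                             # named fields -> structure
    if not event["call_id"]:
        return None                                   # cannot be grouped by call: drop
    event["ts"] = int(event["ts"])                    # convert to a common type
    event["service"] = event["service"] or "unknown"  # conditional processing
    message = event["message"]
    event["msg_type"] = message.split(" ", 1)[0] if message else "EMPTY"  # added field
    return {k: v for k, v in event.items() if v != ""}  # remove empty fields
```

The same shape covers the add-field, remove-field, regular-expression, and condition-dependent processing that the data conversion filtering function is described as supporting.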
Referring to fig. 6, fig. 6 is a schematic diagram of the big data storage device. The big data storage device persists log data and provides search and analysis functions. It consists of a first data interface unit, a search analysis unit, and a data storage unit, wherein:
the first data interface unit is used for data input (storage) and query output of the big data storage device;
the search analysis unit is used for acquiring the corresponding data from the data storage unit according to the data query conditions provided by the first data interface unit;
the data storage unit is used for storing the log data; preferably, because the data is collected by category, the same group of data is assigned a globally unique label when it is persisted, which greatly speeds up queries.
When the data storage device is a cluster composed of multiple nodes, a second cluster management unit is used for managing the nodes and providing the search and analysis function across all of them.
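Grouping records under a globally unique label and spreading them over a cluster might look like the following. This sketch assumes the label is the call identifier, uses `zlib.crc32` as a stable hash (Python's built-in `hash()` for strings is randomized per process, so it is unsuitable for placement), and stands in for each node with an in-memory dict; `ShardedIndex` is a hypothetical name.

```python
import zlib
from collections import defaultdict

class ShardedIndex:
    """Records grouped under a global label (e.g. a call id), spread over nodes."""

    def __init__(self, num_nodes: int):
        # Each node keeps its own label -> records index.
        self.nodes = [defaultdict(list) for _ in range(num_nodes)]

    def _node_for(self, label: str) -> int:
        # Stable hash so every record with one label lands on the same node.
        return zlib.crc32(label.encode("utf-8")) % len(self.nodes)

    def add(self, label: str, record: dict) -> None:
        self.nodes[self._node_for(label)][label].append(record)

    def get_group(self, label: str) -> list:
        # A label lookup touches a single node and a single index entry.
        return list(self.nodes[self._node_for(label)].get(label, []))

    def search_all(self, predicate) -> list:
        # Other queries fan out to every node and merge the results.
        return [r for node in self.nodes
                for group in node.values()
                for r in group if predicate(r)]
```

Label lookups touch one node, while arbitrary predicates fan out to every node, which corresponds to the second cluster management unit providing search across all nodes.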
Referring to fig. 7, fig. 7 is a schematic diagram of the log data analysis device. This device organizes the data query conditions, searches the big data storage device with optimized conditions, and processes the search results as required. It consists of a second data interface unit, a data analysis unit, and a query interface unit, wherein:
the second data interface unit is used for interacting with the big data storage device, providing a uniform read interface for querying data in the big data storage device, and acquiring the data required for computation by the data analysis unit;
the data analysis unit is used for interacting with the second data interface unit and the query interface unit and providing real-time computation and data-mining capabilities;
the query interface unit is used for interacting with the analysis result display device, parsing the query conditions into a suitable query statement, submitting the query statement to the data analysis unit, organizing the query result into a result set of a specified format, and returning the result set to the analysis result display device.
Referring to fig. 8, fig. 8 is a schematic diagram of the analysis result display device. The analysis result display device is the control center of the system: it is responsible for initiating and stopping the query process and displaying the query results, and it provides graphical and tabular views of those results. It comprises a query control unit, a query request unit, a data processing unit, a result display unit, and so on, wherein:
the query control unit is used for starting and stopping log data query tasks, entering query conditions, and checking that they are well-formed;
the query request unit is used for receiving query requests from the query control unit, interacting with the log data analysis device to acquire the data required for computation by the data processing unit, and interacting with the data processing unit to hand the queried data over to it for processing;
the data processing unit is used for converting and processing the data from the query request unit according to the requirements of the query conditions;
the result display unit is used for displaying the log data processed by the data processing unit as graphics, tables, or in similar forms.
The above system implements the following functions.
1. Real-time log data acquisition function
According to the operator's analysis requirements, all or part of the log data generated during communication is transmitted to the system, which analyzes and processes the communication conditions.
The log data acquisition device performs a preliminary organization of the communication information in real time according to a preset policy, filters out invalid information, and pushes the log data to the big data caching device for storage.
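A minimal sketch of the acquisition device's filter-and-push loop, combined with the at-least-once retry behaviour that claim 7 describes for the log data aggregation unit. The validity rule in `is_valid` and the `send` callback, which is assumed to return `True` once the caching device acknowledges a batch, are hypothetical stand-ins for the preset policy and the cache interface.

```python
import time
from typing import Callable, Iterable, List, Optional


def is_valid(line: str) -> bool:
    """Hypothetical preset policy: keep non-blank lines that carry a key=value payload."""
    return bool(line.strip()) and "=" in line


def _deliver(batch: List[str], send: Callable[[List[str]], bool],
             retry_delay: float, max_retries: Optional[int]) -> None:
    """Resend one batch until send() reports an acknowledgement (at-least-once)."""
    attempts = 0
    while not send(batch):
        attempts += 1
        if max_retries is not None and attempts >= max_retries:
            raise RuntimeError("batch was never acknowledged")
        time.sleep(retry_delay)  # back off while the downstream side is blocked


def push_at_least_once(lines: Iterable[str], send: Callable[[List[str]], bool],
                       batch_size: int = 100, retry_delay: float = 0.5,
                       max_retries: Optional[int] = None) -> int:
    """Filter out invalid lines and push the rest in batches; returns lines delivered."""
    batch: List[str] = []
    delivered = 0
    for line in lines:
        if not is_valid(line):
            continue  # invalid information is dropped before caching
        batch.append(line)
        if len(batch) >= batch_size:
            _deliver(batch, send, retry_delay, max_retries)
            delivered += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        _deliver(batch, send, retry_delay, max_retries)
        delivered += len(batch)
    return delivered
```

Because an unacknowledged batch is sent again, the cache may see the same records twice; at-least-once delivery accepts duplicates in exchange for not losing logs, so downstream processing should tolerate repeats.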
2. Data caching function
Parallel mass data arriving from the log data acquisition device needs a safeguard against data loss when processing cannot keep up. A data caching device is therefore placed ahead of data processing. It is a distributed publish-subscribe message queue system that provides fast persistence, high throughput, load balancing and similar capabilities.
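The caching behaviour, with one queue per log type, producer-consumer management, and deletion of a cached record once every consumer has used it (as claim 4 states), can be illustrated with a single-process, in-memory stand-in. The class names and the per-consumer offset bookkeeping are assumptions made for illustration; a real deployment would use a distributed message queue cluster, and some such systems delete data by a retention policy rather than on consumption.

```python
from collections import defaultdict
from typing import Any, Dict, List


class TopicQueue:
    """One per-type queue; records are deleted once every registered consumer has read them."""

    def __init__(self, consumers: List[str]):
        self.base = 0                                              # offset of self.items[0]
        self.items: List[Any] = []
        self.offsets: Dict[str, int] = {c: 0 for c in consumers}  # next offset per consumer

    def publish(self, record: Any) -> None:
        self.items.append(record)

    def poll(self, consumer: str, max_records: int = 10) -> List[Any]:
        start = self.offsets[consumer]
        idx = start - self.base
        batch = self.items[idx:idx + max_records]
        self.offsets[consumer] = start + len(batch)
        self._compact()
        return batch

    def _compact(self) -> None:
        slowest = min(self.offsets.values())
        drop = slowest - self.base
        if drop > 0:
            del self.items[:drop]  # every consumer is already past these records
            self.base = slowest


class Cache:
    """Producer side: routes each record to the queue for its log type."""

    def __init__(self, consumers: List[str]):
        self.topics: Dict[str, TopicQueue] = defaultdict(lambda: TopicQueue(consumers))

    def produce(self, log_type: str, record: Any) -> None:
        self.topics[log_type].publish(record)
```

Tracking an offset per consumer, instead of popping records, is what lets several consumers read the same data independently while the slowest one decides when a record can be freed.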
3. Data conversion and filtering function
Before the log data is warehoused for persistent storage, it is processed and converted into the format best suited to querying. The data conversion and filtering function can add fields, remove fields, and split data using regular expressions, and can apply different processing depending on conditional checks.
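The conversion and filtering step (regex splitting, adding and removing fields, condition-dependent processing) might look like the sketch below. The raw line layout, the added `source` field and the rule that drops `heartbeat` records are hypothetical examples, not the format used by the patent.

```python
import re
from typing import Optional

# Hypothetical raw layout: "<timestamp> <calling>-><called> <service> [detail...]"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+) (?P<calling>\d+)->(?P<called>\d+) (?P<service>\w+) ?(?P<detail>.*)$"
)


def transform(line: str, source: str) -> Optional[dict]:
    """Split a raw line with a regex, then add/remove fields; None means 'filtered out'."""
    match = LINE_RE.match(line)
    if match is None:
        return None                      # unparseable line: drop it
    event = match.groupdict()
    if event["service"] == "heartbeat":  # condition-dependent processing: drop keep-alives
        return None
    event["source"] = source             # add a field
    if not event["detail"]:
        del event["detail"]              # remove an empty field
    return event
```

Returning structured dictionaries here means the storage side only ever sees one uniform record shape, which is the point of converting before warehousing.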
4. Real-time log data analysis function
Log data is collected and transmitted by the log data acquisition device as soon as it is generated, and is finally stored in the big data storage device. The data required by a query can therefore be obtained from the big data storage device in real time and analyzed according to a set analysis strategy, for example a multi-dimensional identification analysis by calling number, called number, specified time period, service type and so on. The analysis result is then provided to the analysis result display device.
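The multi-dimensional analysis by calling number, called number, time period and service type can be sketched as a filter followed by a grouped count. The record keys and the choice of counting as the analysis strategy are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional


def analyze(records: Iterable[dict], calling: Optional[str] = None,
            called: Optional[str] = None, start: Optional[datetime] = None,
            end: Optional[datetime] = None, group_by: str = "service") -> Counter:
    """Keep records matching every given dimension, then count them per group_by value."""
    def matches(r: dict) -> bool:
        ts = datetime.fromisoformat(r["timestamp"])
        return ((calling is None or r["calling"] == calling)
                and (called is None or r["called"] == called)
                and (start is None or ts >= start)
                and (end is None or ts < end))  # half-open time window [start, end)

    return Counter(r[group_by] for r in records if matches(r))
```

Leaving a dimension as `None` means "do not restrict on it", so the same function covers single-dimension and combined queries.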
5. Analysis result display function
After the analysis result is obtained, the analysis result display device can display it graphically or as a table, as selected. The graphical display is a time-ordered two-dimensional flow chart in which each step of the flow can show its detailed information.
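One way to read the time-ordered two-dimensional flow chart is as a sequence diagram, with the participating nodes on one axis and time on the other. The sketch below orders events and numbers the steps while keeping each step's full record for drill-down. The `src`, `dst` and `message` keys are hypothetical, and sorting on ISO-8601 strings assumes every timestamp uses the same format.

```python
from typing import List


def build_flow(events: List[dict]) -> List[dict]:
    """Order events by time and number them as flow steps, keeping details for drill-down."""
    ordered = sorted(events, key=lambda e: e["timestamp"])
    return [
        {"step": i, "time": e["timestamp"], "from": e["src"], "to": e["dst"], "detail": e}
        for i, e in enumerate(ordered, start=1)
    ]


def render(flow: List[dict]) -> str:
    """Plain-text rendering: one line per step, time on one axis, node pair on the other."""
    return "\n".join(
        f'{s["step"]}. {s["time"]} {s["from"]} -> {s["to"]}: {s["detail"].get("message", "")}'
        for s in flow
    )
```

A graphical front end would draw the same step list as arrows between node columns; the text form only shows the ordering logic.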
This embodiment collects log data online in real time and analyzes it with big data technology, which makes analysis efficient. Because of the caching mechanism, when the concurrency of the collected log data exceeds the warehousing capacity, logs are cached rather than lost. The persisted data in the warehouse is divided into hot data and cold data, which reduces the amount of data scanned by each query and improves query speed.
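The hot/cold split, together with the hot-first search and the promotion of accessed cold data described in claim 2, can be sketched as a two-tier store. The in-memory dictionaries stand in for the two distributed storage devices, and the hot-tier capacity with least-recently-used eviction is an assumption; the patent does not say how hot data is retired.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple


class TieredStore:
    """Hot tier (small, fast) in front of a cold tier that holds all data."""

    def __init__(self, hot_capacity: int = 1000):
        self.hot: "OrderedDict[str, List[dict]]" = OrderedDict()  # least recently used first
        self.cold: Dict[str, List[dict]] = {}
        self.hot_capacity = hot_capacity

    def store(self, key: str, record: dict) -> None:
        self.cold.setdefault(key, []).append(record)  # all data lands in the cold tier
        if key in self.hot:
            self.hot[key].append(record)              # keep the hot copy current

    def search(self, key: str) -> Tuple[List[dict], Optional[str]]:
        if key in self.hot:                           # 1. hot data is searched first
            self.hot.move_to_end(key)
            return self.hot[key], "hot"
        records = self.cold.get(key)                  # 2. fall back to cold data
        if not records:
            return [], None                           # empty result
        self._promote(key, records)                   # accessed cold data becomes hot
        return records, "cold"

    def _promote(self, key: str, records: List[dict]) -> None:
        self.hot[key] = list(records)                 # copy: the tiers are separate devices
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)              # evict the least recently used key
```

Only keys that are actually queried occupy the fast tier, so repeated lookups of recent calls avoid scanning the full cold data set.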
Fig. 9 is a schematic flow chart of the real-time analysis method based on mass logs according to the present application. The method comprises the following steps.
Step 901: perform a normalization check on the input query conditions. If the check passes, execute step 902; otherwise, output a prompt.
Step 902: combine the query conditions into keywords according to rules so that the keywords uniquely identify the log data (for example, different identifiers are used for classification), and search the data stored in the big data storage device.
Hot data has the higher search priority, i.e., the hot data is searched first,
and only if nothing is found in the hot data is the cold data searched.
If the final search result is empty, the query result is returned directly;
if the final search result is not empty, a result set is obtained and step 903 is executed to analyze and sort the search result.
Step 903: sort the result set and then display it as graphics or tables. The graphical display is a time-ordered two-dimensional flow chart in which each step can show its detailed information; the table display shows the detailed contents of the result set as a paged list.
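Steps 901 to 903 can be strung together as below. The normalization rules, the `|`-joined composite keyword with `*` marking an unset dimension, and the default page size are hypothetical; `search` is any callable that returns the records stored under a keyword, for example a hot-first lookup.

```python
import re
from typing import Callable, Dict, List

NUMBER_RE = re.compile(r"\d{3,20}")  # hypothetical rule for a well-formed number
KEY_FIELDS = ("calling", "called", "service")


def check_conditions(cond: dict) -> List[str]:
    """Step 901: normalization check; an empty list means the check passed."""
    problems = []
    for field in ("calling", "called"):
        value = cond.get(field)
        if value is not None and not NUMBER_RE.fullmatch(value):
            problems.append(f"{field} must be 3-20 digits")
    start, end = cond.get("start"), cond.get("end")
    if start is not None and end is not None and start >= end:
        problems.append("start must be earlier than end")
    if not any(cond.get(f) for f in KEY_FIELDS):
        problems.append("at least one of calling/called/service is required")
    return problems


def make_key(cond: dict) -> str:
    """Step 902, first half: combine the conditions into one keyword ('*' = not set)."""
    return "|".join(cond.get(f) or "*" for f in KEY_FIELDS)


def run_query(cond: dict, search: Callable[[str], List[dict]],
              page: int = 1, page_size: int = 20) -> Dict:
    problems = check_conditions(cond)
    if problems:
        return {"prompt": problems}                          # step 901 failed: prompt
    results = search(make_key(cond))                         # step 902: keyword search
    if not results:
        return {"results": [], "total": 0}                   # empty result returned directly
    results = sorted(results, key=lambda r: r["timestamp"])  # step 903: sort by time
    first = (page - 1) * page_size                           # paged list for the table view
    return {"results": results[first:first + page_size], "total": len(results)}
```

Returning the total alongside each page lets the table display compute the page count without re-running the search.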
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A real-time analysis method based on mass log data is characterized by comprising the following steps,
acquiring input query conditions and combining the query conditions into keywords;
searching, based on the stored big data, according to the keywords;
analyzing the searched result set according to a set analysis strategy to obtain an analysis result;
wherein
the big data is obtained as follows:
recording, in each service process of each service point, the data associated with the service, to generate the log data of each service point;
collecting and caching the log data of each service point in real time;
filtering and structuring the cached log data to obtain big data for persistent storage;
and storing the big data.
2. The real-time analysis method of claim 1, wherein the big data is stored in a distributed cluster, hot data is stored in a first distributed storage device, all data is stored as cold data in a second distributed storage device, and cold data that is accessed is transferred to the first distributed storage device as hot data, the performance of the first distributed storage device being higher than that of the second distributed storage device;
and searching according to the keywords based on the stored big data comprises:
preferentially searching the hot data;
and, if nothing is found in the hot data, searching the cold data.
3. The real-time analysis method of claim 1, further comprising displaying the analysis result graphically or as a table, wherein the graphical display is a time-ordered two-dimensional flow chart, and each step of the flow shows its detailed information.
4. The real-time analysis method of claim 1, wherein collecting and caching the log data of each service point in real time comprises:
collecting the log data of each service point from that service point;
caching the log data, according to its type, in corresponding queues of a distributed cluster, managing the cached data in a producer-consumer manner, and deleting cached data after it has been used by all consumers.
5. A real-time analysis system based on mass log data is characterized in that the system comprises,
a log data generating device for recording the data associated with the service in each service process of each service point to generate the log data of each service point,
a log data acquisition device for acquiring log data of each service point in real time,
the big data caching device is used for caching the collected log data of each service point,
the data filtering and confirming device is used for filtering and structuring the cached log data to obtain big data for persistent storage,
a big data storage device for storing big data,
and the log data analysis device is used for acquiring input query conditions, combining the query conditions into keywords, searching according to the keywords based on the stored big data, and analyzing the searched result set according to a set analysis strategy to obtain an analysis result.
6. The real-time analysis system of claim 5, further comprising,
and the analysis result display device is used for carrying out graphic display or table display on the analysis result, wherein the graphic display is a time-sequenced two-dimensional flow chart, and each step of the flow displays the detailed information of the flow.
7. The real-time analysis system of claim 6, wherein the log data generating device comprises:
a log data sorting unit, used for interacting with each service point according to a standard protocol, sorting the data provided by each service point and generating log data;
a log data file unit, used for writing the log data generated by the log data sorting unit into files suitable for reading;
the log data acquisition device comprises:
a log data collection unit, used for tracking the log files and providing the event data to the log data aggregation unit;
a log data aggregation unit, used for processing the data collected by the log data collection unit and transmitting the processed data to the first log data output unit at least once, and, when the first log data output unit is blocked and not all transmitted events have been acknowledged, continuing to retry sending data to the first log data output unit until the first log data output unit outputs and acknowledges the received events;
and a first log data output unit, used for interacting with the big data caching device, acquiring data from the log data aggregation unit, transmitting the acquired data to the big data caching device, and returning an acknowledgement to the log data aggregation unit upon receiving an acknowledgement event from the big data caching device.
8. The real-time analysis system of claim 7, wherein the big data caching device comprises:
a producer unit, used for interacting with the first log data output unit, acquiring log data and publishing the data to the corresponding data queue unit according to type;
a data queue unit, used for receiving and storing the data sent by the producer unit, and interacting with the consumer unit to read data from the queue and provide it to the consumer unit;
a consumer unit, used for interacting with the data queue unit, reading data from the queue and transmitting the data to the data filtering and confirming device;
the data filtering and confirming device comprises:
a log data input unit, used for interacting with the big data caching device and acquiring data from it as a continuous stream;
a log data filtering unit, used for performing one or any combination of filtering, recombining and confirming the data and converting it into a universal format;
a second log data output unit, used for outputting the data generated by the log data filtering unit to the big data storage device;
the big data storage device comprises:
a first data interface unit, used for data input and storage and for query output of the big data;
a search analysis unit, used for acquiring the corresponding data from the data storage unit according to the data query conditions provided by the first data interface unit;
and a data storage unit, used for storing the log data.
9. The real-time analysis system of claim 8, wherein the log data analysis device comprises:
a second data interface unit, used for interacting with the big data storage device, providing a uniform read interface for querying data in the big data storage device, and acquiring the data required for the calculations of the data analysis unit;
a data analysis unit, used for interacting with the second data interface unit and the query interface unit and providing real-time computation and data mining capabilities;
a query interface unit, used for interacting with the analysis result display device, parsing the query conditions into an appropriate query statement, submitting the query statement to the data analysis unit, organizing the query result into a result set in a specified format, and returning the result set to the analysis result display device;
the analysis result display device comprises:
a query control unit, used for starting and stopping a log data query task, recording query conditions and checking their normalization;
a query request unit, used for receiving a query request from the query control unit, interacting with the log data analysis device to acquire the data required for the calculations of the data processing unit, and interacting with the data processing unit to hand the queried data over to it for processing;
a data processing unit, used for converting and/or processing the data from the query request unit according to the requirements of the query conditions;
and a result display unit, used for displaying the log data processed by the data processing unit in graphical or tabular form.
10. The real-time analysis system of claim 9, wherein the big data caching device is a distributed cluster caching device further comprising:
a first cluster management unit, used for cluster management of the producer unit, the data queue unit and the consumer unit;
and the big data storage device is a distributed cluster storage device further comprising:
a second cluster management unit, used for managing each node and providing a search analysis function across all nodes.
CN202111298565.9A 2021-11-04 2021-11-04 Real-time analysis method and system based on mass log data Pending CN114036120A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111298565.9A | 2021-11-04 | 2021-11-04 | Real-time analysis method and system based on mass log data

Publications (1)

Publication Number | Publication Date
CN114036120A | 2022-02-11

Family

ID=80136398

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination