WO2020140624A1 - Method for extracting data from log, and related device - Google Patents

Method for extracting data from log, and related device

Info

Publication number
WO2020140624A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
data
type
updated
extraction information
Prior art date
Application number
PCT/CN2019/118038
Other languages
French (fr)
Chinese (zh)
Inventor
陈珍妮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020140624A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the field of artificial intelligence technology, and in particular, to a method and related equipment for extracting data from logs.
  • the corresponding data is obtained by searching the corresponding data in the database table of the system.
  • the data stored in the database table is not complete.
  • the database table only includes data such as the final result processed by the system. The inventor realized that the collected data depends on the data stored in the database table. If the corresponding data to be collected is not saved in the database table, the data needs to be collected from other channels. The efficiency of data acquisition is low, and the data obtained is not complete.
  • the present application provides a method and apparatus for extracting data from a log.
  • a method for extracting data from a log includes: performing log update monitoring on a running system; if a log update is detected, identifying the updated log through a neural network model to determine the log type of the updated log; searching a configuration file for the data extraction information corresponding to the log type, where the data extraction information indicates the data items to be extracted from logs of that log type; and extracting the data corresponding to the data items from the updated log according to the found data extraction information.
  • an apparatus for extracting data from a log includes: a monitoring module configured to perform log update monitoring on a running system; an identification module configured to, if a log update is detected, identify the updated log through a neural network model to determine the log type of the updated log; a search module configured to search a configuration file for the data extraction information corresponding to the log type, where the data extraction information indicates the data items to be extracted from logs of that log type; and an extraction module configured to extract the data corresponding to the data items from the updated log according to the found data extraction information.
  • an electronic device includes: a processor; and a memory, where the computer-readable instructions are stored on the memory, and the computer-readable instructions are executed by the processor to implement the following steps:
  • a computer non-volatile readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the following steps are implemented: performing log update monitoring on a running system; if a log update is detected, identifying the updated log through a neural network model to determine the log type of the updated log; searching a configuration file for the data extraction information corresponding to the log type, where the data extraction information indicates the data items to be extracted from logs of that log type; and extracting the data corresponding to the data items from the updated log according to the found data extraction information.
  • the method of the present application monitors log updates, identifies the log type of the updated log, searches for the data extraction information defined for that log type, and extracts the corresponding data from the updated log according to the data extraction information. Data related to system operation is thus obtained from the log, system operation data is collected in real time, and data integrity is ensured.
  • a deep learning method is used to identify the log type, which improves identification efficiency and accuracy and helps ensure the efficiency and real-time performance of data extraction.
  • FIG. 1 is a block diagram of a server according to an exemplary embodiment
  • FIG. 2 is a flowchart of a method for extracting data from a log according to an exemplary embodiment
  • FIG. 3 is a flowchart of step S130 of the embodiment corresponding to FIG. 2;
  • FIG. 4 is a flowchart of steps before step S130 of the embodiment corresponding to FIG. 2;
  • FIG. 5 is a flowchart of steps before step S150 of the embodiment corresponding to FIG. 2;
  • FIG. 6 is a flowchart of steps after step S170 of the embodiment corresponding to FIG. 2;
  • FIG. 7 is a flowchart of step S430 of the embodiment corresponding to FIG. 6;
  • FIG. 8 is a block diagram of a device for extracting data from a log according to an exemplary embodiment
  • Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment.
  • Fig. 1 is a block diagram of a server according to an exemplary embodiment.
  • a server with this hardware structure can be used to perform the method of extracting data from a log of the present application. The system runs on the server to provide services for each terminal of the system, logs are generated during the operation of the system, and the server can perform data extraction on the generated logs according to the method of this application.
  • the executing body of the method is not limited to the server shown in FIG. 1. It may also be another device with logic operation and processing capabilities, such as a desktop computer, a laptop computer, a server cluster composed of multiple servers, or a cloud server, which is not specifically limited herein.
  • server is only an example adapted to the present application, and cannot be considered as providing any limitation on the scope of use of the present application.
  • nor should the server be interpreted as needing to rely on, or necessarily having, one or more components of the exemplary server 200 shown in FIG. 1.
  • the server 200 includes: a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
  • the power supply 210 is used to provide an operating voltage for each hardware device on the server 200.
  • the interface 230 includes at least one wired or wireless network interface 231, at least one serial-to-parallel conversion interface 233, at least one input-output interface 235, at least one USB interface 237, and the like, for communicating with external devices, for example for data transmission with the terminal 100.
  • the memory 250 may be a read-only memory, a random access memory, a magnetic disk, or an optical disk.
  • the resources stored on the memory 250 include an operating system 251, application programs 253, and data 255.
  • the storage method may be temporary storage or permanent storage.
  • the operating system 251, which may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc., is used to manage and control the hardware devices and application programs 253 on the server 200, so that the central processor 270 can compute and process the massive data 255.
  • the application program 253 is a computer program that completes at least one specific job based on the operating system 251, and may include at least one module (not shown in FIG. 2), each of which may include a series of computer-readable instructions for the server 200. The data 255 may be logs and the like stored on disk.
  • the central processor 270 may include one or more processors, and is configured to communicate with the memory 250 through a bus for computing and processing the massive data 255 in the memory 250.
  • the server 200 applicable to the present application will complete the method of extracting data from the log by the central processor 270 reading a series of computer-readable instructions stored in the memory 250.
  • the server 200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors, digital signal processing devices, programmable logic devices, field-programmable gate arrays, controllers, microcontrollers, microprocessors, or other electronic components to perform the following method of extracting data from the log. Therefore, the implementation of this application is not limited to any specific hardware circuit, software, or combination of the two.
  • Fig. 2 is a flow chart showing a method for extracting data from a log according to an exemplary embodiment. The method may be executed by the server shown in FIG. 1, and may include the following steps:
  • Step S110 Perform log update monitoring on the running system.
  • the running system is, for example, a system that provides services for various program clients, such as a trading system, a valuation system, a fund system, etc. in a financial company, or an application program running on a terminal device
  • the system continuously performs logic processing during operation, for example, receiving a request initiated by the client, processing the request, and issuing instructions to the client.
  • different systems perform different logic processing. During logic processing, the system generates logs according to the processing performed: for example, after receiving a request initiated by a client, it generates a log of the received request; after processing a request, it generates a log of the request processing result; and after a client logs in successfully, it generates a user login log.
  • a log storage unit is configured in the system, and the logs generated during the operation of the system are stored in it. Log update monitoring can therefore be performed on the log storage unit, that is, monitoring whether a new log has been stored in it; a new log is a log newly generated by the running system.
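As an illustration of step S110, the following is a minimal sketch of log update monitoring. It assumes the log storage unit is a single append-only text file that is polled; the patent does not specify the storage or the monitoring mechanism, and the class name `LogMonitor` and its `poll` method are hypothetical.

```python
class LogMonitor:
    """Polls a log file and returns the lines appended since the last check.

    Assumption: the log storage unit is one append-only UTF-8 text file.
    """

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte offset of the data already consumed

    def poll(self):
        """Return the list of complete new log lines, or [] if there are none."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
        # Consume only up to the last newline, so that a half-written line
        # is picked up in full on a later poll.
        end = chunk.rfind(b"\n")
        if end == -1:
            return []
        self.offset += end + 1
        return chunk[: end + 1].decode("utf-8").splitlines()
```

Calling `poll()` on a timer yields each newly generated log exactly once, and each returned line can then be passed to the identification step S130.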
  • Step S130 If a log update is detected, the updated log is identified through the neural network model to determine the log type of the updated log. In response to the different logical operations performed by the system, logs of different log types are generated. Logs of different log types differ both in their format and in the data they contain. For example, a system generates logs for user logins and for user request processing results, where the log generated for a user login (called a login type log) is:
  • the log generated for user request processing success (called the request processing success type log) is:
  • the user login time and the logged-in user are the data carried in the login type log.
  • the request processing success type log includes the time at which the user initiated the request, "20180904-14:00"; the user who initiated the request, "Amy"; the request type, "product new"; the request processing result, "success"; and the system response time, "2.2 seconds".
  • the time at which the user initiated the request, the user who initiated the request, the request type, the request processing result, and the system response time are the data carried in the log.
  • the updated log can be identified through the neural network model to determine the log type of the updated log, that is, the updated log is identified by deep learning.
  • the neural network model After training, the neural network model performs feature extraction on the updated log, and then predicts the label of the updated log according to the extracted feature, thereby determining the log type of the updated log.
  • the neural network model used may be a convolutional neural network model, a recursive neural network model, or a recurrent neural network model, which is not specifically limited here.
  • Step S150 Search for data extraction information corresponding to the log type in the configuration file, and the data extraction information indicates a data item for data extraction from the log of the log type.
  • the configuration file includes the data extraction information configured for each log type that requires data extraction. The configuration file may therefore include one or more sets of data extraction information, where each set corresponds to one log type. The data extraction information for a log type configures the data items to be extracted from logs of that type. For example, take the login type log from the example above: 20180904-11:21: User jenny logged into the system
  • the login time and the login user are the data items to be extracted. In this log, "20180904-11:21" is the data corresponding to the login time data item,
  • and "jenny" is the data corresponding to the login user data item.
  • Step S170 Extract data corresponding to the data item from the updated log according to the found data extraction information.
  • the data extraction information corresponding to each log type indicates one or more data items that require data extraction. Therefore, the data corresponding to the data item is extracted from the updated log according to the found data extraction information. Thus, data collection from the log is realized.
  • extracting operation-related data from the log in this way realizes real-time collection of system operation data. Furthermore, using a deep learning method to identify the log type improves identification efficiency and accuracy, which in turn helps ensure the efficiency and real-time performance of data extraction.
  • because the system log contains all the information related to the operation of the system, extracting data from the log guarantees the integrity of the extracted data, compared with obtaining it from the database or by indirect means.
  • step S130 includes:
  • Step S131 Construct a feature vector of the updated log.
  • the feature vector may be constructed based on the text of the updated log. Because the format of logs of different log types is different, the feature vectors constructed for logs of different log types are also different. The constructed feature vector reflects the characteristics of the updated log.
  • consider the login type log mentioned above, such as
  • 20180904-11:21 User jenny logged into the system, and the request processing success type log, such as
  • YYYYYYY-YY User YY initiated a YYYY request, processing YY, response time YY seconds. Apart from the positions occupied by the Y symbols, the rest of a log of a given log type is the same.
  • that is, within a log type, the keywords and the positions of the keywords are fixed.
  • for example, in the login type log, the text following the specific login user XX is always "logged into the system". When constructing the feature vector of a log, the feature vector is therefore built from the keywords in the log and the positions of those keywords: a keyword search is performed in the updated log, the position of each keyword in the updated log is obtained, and the feature vector of the updated log is constructed from them.
  • step S131 further includes performing word segmentation on the updated log, and then constructing the feature vector of the updated log according to the encoding corresponding to each word.
  • Step S132 Perform classification prediction on the feature vector to obtain a type label corresponding to the updated log.
  • Step S133 Determine the log type of the updated log according to the type tag.
  • in step S132, classification prediction is performed on the constructed feature vector: the probability of each type label is predicted for the feature vector, the predicted probabilities are traversed, and the type label with the maximum probability is taken as the type label corresponding to the updated log. The log type of the updated log is then determined according to the obtained type label.
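Steps S131 to S133 can be sketched as follows. This is a simplified illustration, not the patent's model: a single-layer softmax classifier with hand-set weights stands in for the trained neural network, and the keyword list, label names, weights, and the request log line used in the usage note are all assumptions made for the example.

```python
import math

# Hypothetical keyword list; the position of each keyword becomes one feature.
KEYWORDS = ["logged", "initiated", "request", "response"]
TYPE_LABELS = ["login", "request_success"]

# Illustrative hand-set weights standing in for a trained model,
# one row per type label and one column per keyword feature.
WEIGHTS = [
    [1.0, -1.0, -1.0, -1.0],  # login
    [-1.0, 1.0, 1.0, 1.0],    # request_success
]


def build_feature_vector(log_line):
    """Step S131: 1-based token position of each keyword, 0 if absent."""
    tokens = log_line.lower().replace(",", " ").split()
    return [tokens.index(k) + 1 if k in tokens else 0 for k in KEYWORDS]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_type(log_line):
    """Steps S132 and S133: score every label, keep the most probable one."""
    x = build_feature_vector(log_line)
    scores = [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TYPE_LABELS[best], probs
```

For instance, `predict_type("20180904-11:21 User jenny logged into the system")` returns the label `"login"` together with the per-label probabilities. A line such as "20180904-14:00 User Amy initiated a product new request, processing success, response time 2.2 seconds" (assembled here from the data items listed in the text, since the original example log is not reproduced) is labeled `"request_success"`.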
  • before step S130, the method further includes:
  • Step S210 Acquire a plurality of sample logs, and acquire a sample label marked for each sample log.
  • Step S220 Train the neural network model using the sample logs and their corresponding sample labels.
  • Step S230 When the neural network model converges, the training of the neural network model ends.
  • during training, the neural network model predicts a type label for each sample log. If the predicted type label is inconsistent with the sample label marked on the sample log, the parameters of the neural network model are adjusted until the predicted type label is consistent with the sample label. This process is repeated for each sample log.
  • a prediction accuracy test is then performed on the neural network model: several test logs are input into the neural network model, the model predicts the type label of each test log, and the predicted type label is compared with the type label marked on the test log. If they are consistent, the neural network model predicted the test log correctly; if not, it predicted the test log incorrectly. The prediction accuracy of the neural network model is thus obtained statistically.
  • once the training of the neural network model is complete, the trained neural network model is used to identify the updated log in step S130.
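The training and accuracy test described in steps S210 to S230 can be sketched as below. This is a minimal sketch under stated assumptions: a single-layer softmax classifier trained by gradient descent stands in for the neural network, "convergence" is taken to mean a full pass over the samples with no misclassification, and the learning rate, epoch limit, and function names are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Return (index of the most probable label, label probabilities)."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__), probs


def train(samples, num_labels, lr=0.1, max_epochs=1000):
    """samples: list of (feature_vector, label_index) pairs.

    Returns (weights, epochs_used).
    """
    dim = len(samples[0][0])
    weights = [[0.0] * dim for _ in range(num_labels)]
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in samples:
            pred, probs = predict(weights, x)
            if pred != y:
                mistakes += 1
            # Cross-entropy gradient step on the scores:
            # d(loss)/d(score_k) = probs[k] - 1 if k is the true label, else probs[k].
            for k in range(num_labels):
                g = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    weights[k][j] -= lr * g * x[j]
        # Treated here as convergence: every sample was predicted correctly
        # at the moment it was visited during this pass.
        if mistakes == 0:
            return weights, epoch
    return weights, max_epochs


def accuracy(weights, test_samples):
    """Fraction of test logs whose predicted label matches the marked label."""
    hits = sum(1 for x, y in test_samples if predict(weights, x)[0] == y)
    return hits / len(test_samples)
```

A typical use is `weights, epochs = train(samples, num_labels=2)` followed by `accuracy(weights, test_samples)` on held-out test logs; a real deployment would pick the convergence criterion and accuracy threshold to suit its data.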
  • before step S150, the method further includes:
  • Step S310 Acquire a template log of the same log type as the log from which data is to be extracted.
  • Step S320 In the template log, replace the data corresponding to each data item with the variable configured for that data item, and configure the data extraction information for the log type from the template log after replacement.
  • Step S330 Form the configuration file from the data extraction information corresponding to each log type.
  • the template log may be any log of the log type. Corresponding to the situation where data needs to be extracted from logs of multiple log types, correspondingly, a template log of each log type is obtained.
  • logs of the same log type share the same format: they have identical parts, such as the keywords and the positions of the keywords,
  • and differ only in a few parts, such as the data corresponding to the data items to be extracted.
  • Metrics.login.pattern %timestamp%: SP%username% logged into the system
  • timestamp is a variable configured for the data item of login time
  • username is a variable configured for the data item of login user.
  • the template log is used to replace the data corresponding to the data item with the variables configured for the data item. That is equivalent to assigning the data corresponding to the data item to the variable configured for the data item.
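The replacement in step S320 can be illustrated with a short sketch. The `%variable%` notation follows the pattern line above; the helper name `build_pattern` and the idea of passing the template's literal values in a dictionary are assumptions made for the example.

```python
def build_pattern(template_log, data_items):
    """Replace each data value in the template log with its %variable%.

    data_items maps a variable name to the literal value it takes in the
    template log, for example {"timestamp": "20180904-11:21"}.
    """
    pattern = template_log
    for variable, value in data_items.items():
        if value not in pattern:
            raise ValueError(f"value {value!r} not found in template log")
        # Replace only the first occurrence so repeated text elsewhere in
        # the log is left untouched.
        pattern = pattern.replace(value, f"%{variable}%", 1)
    return pattern
```

For the login template log "20180904-11:21: User jenny logged into the system" with `{"timestamp": "20180904-11:21", "username": "jenny"}`, this yields "%timestamp%: User %username% logged into the system". The sketch does not guard against a value that happens to match text inside an already inserted variable name.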
  • the second line in the data extraction information defines the output variables, that is, the variables corresponding to the data items to be extracted are declared as output variables, so that when data extraction is performed according to the data extraction information, the data corresponding to those data items in the log can be obtained.
  • the configuration file may be configured for logs of multiple log types, data extraction information is configured for each log type, and the data extraction information corresponding to each log type constitutes a configuration file.
  • an identifier is configured for the data extraction information of each log type, and an association is created between that identifier and the log type. After the log type of the updated log is identified in step S130, the identifier associated with the log type can be looked up directly, so that the data extraction information corresponding to the log type is found quickly.
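Steps S150 and S170 can be sketched together: look up the pattern configured for the identified log type, then extract the data items from the updated log. The sketch assumes a simplified in-memory configuration (a dictionary keyed by log type) and a simplified `%variable%` pattern syntax in which each variable matches a run of non-space characters; it does not reproduce the exact configuration file format of the patent.

```python
import re

# Hypothetical in-memory form of the configuration file: log type -> pattern.
CONFIG = {
    "login": "%timestamp%: User %username% logged into the system",
}

VARIABLE = re.compile(r"%(\w+)%")


def pattern_to_regex(pattern):
    """Turn a %variable% pattern into a regex with one named group per variable."""
    parts, last = [], 0
    for m in VARIABLE.finditer(pattern):
        parts.append(re.escape(pattern[last:m.start()]))  # fixed keyword text
        parts.append(f"(?P<{m.group(1)}>\\S+)")           # the data item value
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("^" + "".join(parts) + "$")


def extract(log_type, log_line, config=CONFIG):
    """Look up the pattern for the log type (S150), then extract the data (S170)."""
    pattern = config.get(log_type)
    if pattern is None:
        return None  # no data extraction configured for this log type
    match = pattern_to_regex(pattern).match(log_line)
    return match.groupdict() if match else None
```

With this configuration, `extract("login", "20180904-11:21: User jenny logged into the system")` returns a dictionary mapping `timestamp` to "20180904-11:21" and `username` to "jenny". Keying the dictionary by log type plays the role of the identifier association described above, so the lookup is a single dictionary access.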
  • after step S170, the method further includes:
  • Step S410 Search the data table corresponding to the log type.
  • Step S430 Write the extracted data to a data table to store the data.
  • because logs of different log types yield different extracted data, each log type is configured with a corresponding data table for storing the data extracted from logs of that type. The extracted data is written to the corresponding data table for storage. Analysis can then be performed directly on the data stored in the data table to obtain results such as the user login volume, the number of successfully processed requests, and the number of failed requests.
  • step S430 includes:
  • Step S431 Locate the data field associated with the data item in the data table.
  • Step S432 Write the data corresponding to the data item into the table unit configured for the data field.
  • the data extracted from a log of a given log type may be the data of one data item or of multiple data items. Where data of multiple data items is extracted, a data field is configured in the data table for each data item and associated with that data item. When the extracted data is written to the data table, the data field associated with each data item is located, and the data of that data item is written into the table cell configured for the data field. Further, data is written to the data table row by row: after one row has been written, the next extracted data is written to the following row, and so on.
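Steps S410 to S432 can be sketched with an SQLite table. This is one possible storage choice, not one named in the patent; the table name, field names, and the `TABLES` mapping from data items to data fields are hypothetical.

```python
import sqlite3

# Hypothetical mapping: log type -> (table name, {data item -> data field}).
TABLES = {
    "login": ("login_events", {"timestamp": "login_time", "username": "login_user"}),
}


def create_tables(conn):
    """Create one data table per log type, with one column per data field."""
    for table, fields in TABLES.values():
        columns = ", ".join(f"{field} TEXT" for field in fields.values())
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")


def store(conn, log_type, data):
    """Find the table for the log type (S410) and append one row (S430)."""
    table, fields = TABLES[log_type]
    columns = [fields[item] for item in data]      # S431: locate the data fields
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        list(data.values()),                       # S432: write the values
    )
    conn.commit()
```

Each call to `store` appends a new row, which matches the row-by-row writing described above. Table and column names come from the fixed `TABLES` mapping rather than from log content, and the extracted values are passed as bound parameters.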
  • FIG. 8 is a block diagram of an apparatus for extracting data from logs according to an exemplary embodiment.
  • the apparatus may be deployed in the server 200 shown in FIG. 1 and may execute all or part of the method of extracting data from a log in any of the above method embodiments.
  • the device includes but is not limited to: a monitoring module 110, an identification module 130, a search module 150, and an extraction module 170, wherein: the monitoring module 110 is configured to: perform log update monitoring on the running system.
  • An identification module 130 which is connected to the monitoring module 110, is configured to: if a log update is monitored, identify the updated log through a neural network model to determine the log type of the updated log.
  • a search module 150 which is connected to the identification module 130, is configured to: search for data extraction information corresponding to the log type in the configuration file, and the data extraction information indicates data items to be extracted from the log of the log type.
  • Extraction module 170 which is connected to the search module 150, and is configured to: extract data corresponding to the data item from the updated log according to the found data extraction information.
  • the identification module 130 includes: a feature vector construction unit configured to: construct the feature vector of the updated log.
  • the classification prediction unit is configured to: perform classification prediction on the feature vector to obtain the type label corresponding to the updated log.
  • the log type determination unit is configured to: determine the log type of the updated log according to the type label.
  • the device for extracting data from the log further includes the following modules, which perform the corresponding steps before the identification module is executed: a sample log acquisition module configured to: acquire a plurality of sample logs, and acquire the sample label marked for each sample log.
  • the training module is configured to: train the neural network model through several sample logs and corresponding type labels.
  • the end of training module is configured to: end the training of the neural network model when the neural network model converges.
  • the device for extracting data from the log further includes the following modules, which perform the corresponding steps before the search module is executed: a template log acquisition module configured to: acquire a template log of the same log type as the log from which data is to be extracted.
  • the data extraction information generating module is configured to: replace the data corresponding to the data item with the variables configured for the data item in the template log, and obtain the data extraction information corresponding to the log type according to the template log configuration after the replacement.
  • the configuration file generation module is configured as follows: the configuration file is composed of data extraction information corresponding to each log type.
  • the apparatus for extracting data from the log further includes: a data table search module configured to: perform a search for a data table corresponding to the log type.
  • the data writing module is configured to: write the extracted data into a data table for data storage.
  • the data writing module includes: a data field positioning unit configured to: locate the data field associated with the data item in the data table.
  • a writing unit configured to: write the data corresponding to the data item into the table cell configured for the data field.
  • the present application also provides an electronic device, which can be used in the server 200 shown in FIG. 1 to perform all or some of the steps of the method for extracting data from a log shown in any of the above method embodiments.
  • the electronic device 1000 includes: a processor 1001; and a memory 1002 on which computer-readable instructions are stored. When the computer-readable instructions are executed by the processor 1001, the method of any of the above method embodiments is implemented.
  • when the executable instructions are executed by the processor 1001, the method in any of the above embodiments is implemented.
  • the executable instructions are, for example, computer-readable instructions.
  • during execution, the processor 1001 reads the computer-readable instructions stored in the memory through the communication line/bus 1003 connected to the memory.
  • a computer non-volatile readable storage medium is also provided, on which a computer program is stored. When the computer program is executed by a processor, the method of extracting data from a log in any of the above method embodiments is implemented.
  • the computer non-volatile readable storage medium is, for example, the memory 250 storing a computer program, and the above instructions can be executed by the central processor 270 of the server 200 to implement the above method of extracting data from the log.

Abstract

A method and apparatus for extracting data from a log, relating to the technical field of artificial intelligence. The method comprises: monitoring log updates of a running system (S110); if a log update is detected, identifying the updated log by means of a neural network model to determine the log type of the updated log (S130); searching a configuration file for the data extraction information corresponding to the log type, the data extraction information indicating the data items to be extracted from logs of that log type (S150); and extracting, according to the found data extraction information, the data corresponding to the data items from the updated log (S170). With this method, the required data is extracted from the running system in real time and with high efficiency.

Description

Specification
Title of invention: Method and related device for extracting data from a log
Technical field
[0001] This application claims priority to Chinese patent application CN201910007431.3, filed on January 4, 2019 and titled "Method, apparatus, and computer-readable storage medium for extracting data from a log", the entire contents of which are incorporated herein by reference.
[0002] The present application relates to the field of artificial intelligence technology, and in particular to a method and related device for extracting data from a log.
Background
[0003] To assess the operating state of a system, data related to system operation must be collected, such as logged-in users, user login times, successfully processed requests, failed requests, response times, and failure reasons, so that comprehensive statistical analysis of the system can be performed, for example to obtain system processing efficiency and user preferences.
[0004] To obtain data related to system operation, the corresponding data is usually looked up in the database tables of the system. However, the data stored in the database tables is not complete: generally, out of considerations such as database redundancy, the tables include only data such as the final results processed by the system. The inventor realized that the collected data depends on what is stored in the database tables. If the data to be collected is not saved there, it has to be collected through other channels, so data acquisition is inefficient and the data obtained is incomplete.
[0005] It follows that the problem of how to obtain data related to system operation effectively has yet to be solved.
Summary of the Invention
Technical Problem
Solution to the Problem
Technical Solution
[0006] To solve the problem in the related art of obtaining data related to system operation, the present application provides a method and apparatus for extracting data from logs.
[0007] In a first aspect, a method for extracting data from a log includes: monitoring a running system for log updates; if a log update is detected, identifying the updated log with a neural network model to determine the log type of the updated log; looking up, in a configuration file, the data extraction information corresponding to the log type, where the data extraction information indicates the data items to be extracted from logs of that log type; and extracting, from the updated log, the data corresponding to the data items according to the data extraction information found.
[0008] In a second aspect, an apparatus for extracting data from a log includes: a monitoring module configured to monitor a running system for log updates; an identification module configured to, if a log update is detected, identify the updated log with a neural network model to determine the log type of the updated log; a lookup module configured to look up, in a configuration file, the data extraction information corresponding to the log type, where the data extraction information indicates the data items to be extracted from logs of that log type; and an extraction module configured to extract, from the updated log, the data corresponding to the data items according to the data extraction information found.
[0009] In a third aspect, an electronic device includes: a processor; and a memory storing computer-readable instructions which, when executed by the processor, implement the following steps:
[0010] monitoring a running system for log updates; if a log update is detected, identifying the updated log with a neural network model to determine the log type of the updated log; looking up, in a configuration file, the data extraction information corresponding to the log type, where the data extraction information indicates the data items to be extracted from logs of that log type; and extracting, from the updated log, the data corresponding to the data items according to the data extraction information found.
[0011] In a fourth aspect, a non-volatile computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: monitoring a running system for log updates; if a log update is detected, identifying the updated log with a neural network model to determine the log type of the updated log; looking up, in a configuration file, the data extraction information corresponding to the log type, where the data extraction information indicates the data items to be extracted from logs of that log type; and extracting, from the updated log, the data corresponding to the data items according to the data extraction information found.
[0012] According to the method of the present application, log updates are monitored, the log type of the updated log is identified, the data extraction information defined for that log type is looked up, and the corresponding data is extracted from the updated log according to the data extraction information. Data related to system operation is thus extracted from the logs, so that operating data is collected in real time and its completeness is ensured. In addition, the log type is identified by deep learning, which improves identification efficiency and accuracy and ensures the efficiency and real-time performance of data extraction.
[0013] It should be understood that the above general description and the following detailed description are only exemplary and do not limit the present application.
Advantageous Effects of the Invention
Brief Description of the Drawings
Description of the Drawings
[0014] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the specification, serve to explain the principles of the present application.
[0015] FIG. 1 is a block diagram of a server according to an exemplary embodiment;
[0016] FIG. 2 is a flowchart of a method for extracting data from a log according to an exemplary embodiment;
[0017] FIG. 3 is a flowchart of step S130 in the embodiment corresponding to FIG. 2;
[0018] FIG. 4 is a flowchart of the steps preceding step S130 in the embodiment corresponding to FIG. 2;
[0019] FIG. 5 is a flowchart of the steps preceding step S150 in the embodiment corresponding to FIG. 2;
[0020] FIG. 6 is a flowchart of the steps following step S170 in the embodiment corresponding to FIG. 2;
[0021] FIG. 7 is a flowchart of step S430 in the embodiment corresponding to FIG. 6;
[0022] FIG. 8 is a block diagram of an apparatus for extracting data from a log according to an exemplary embodiment;
[0023] FIG. 9 is a block diagram of an electronic device according to an exemplary embodiment.
[0024] The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the concepts of the present application in any way, but to explain those concepts to those skilled in the art by reference to particular embodiments.
Embodiments of the Invention
Modes for Carrying Out the Invention
[0025] Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
[0026] FIG. 1 is a block diagram of a server according to an exemplary embodiment. A server with this hardware structure can be used to perform the method of the present application for extracting data from logs. The system runs on the server and provides services to the terminals of the system, generating logs as it runs, and the server can extract data from the generated logs according to the method of the present application. Of course, the method is not limited to being executed by the server shown in FIG. 1. It may also be executed by any device with logical processing capability, such as a desktop computer, a laptop computer, a server cluster composed of multiple servers, or a cloud server, which is not specifically limited here.
[0027] It should be noted that this server is only one example adapted to the present application and must not be regarded as limiting the scope of use of the present application in any way. Nor should the server be interpreted as depending on, or necessarily having, one or more components of the exemplary server 200 shown in FIG. 1.
[0028] The hardware structure of the server may vary considerably with configuration or performance. As shown in FIG. 1, the server 200 includes a power supply 210, an interface 230, at least one memory 250, and at least one central processing unit (CPU) 270.
[0029] The power supply 210 provides the operating voltage for the hardware devices on the server 200.
[0030] The interface 230 includes at least one wired or wireless network interface 231, at least one serial-to-parallel conversion interface 233, at least one input/output interface 235, at least one USB interface 237, and so on, and is used to communicate with external devices, for example to exchange data with the terminal 100.
[0031] The memory 250, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like. The resources stored on it include an operating system 251, application programs 253, and data 255, and storage may be transient or permanent. The operating system 251 manages and controls the hardware devices and the application programs 253 on the server 200 so that the central processing unit 270 can compute and process the massive data 255; it may be Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like. An application program 253 is a computer program that performs at least one specific task on top of the operating system 251. It may include at least one module (not shown in FIG. 1), and each module may contain a series of computer-readable instructions for the server 200. The data 255 may be logs stored on disk, among other things.
[0032] The central processing unit 270 may include one or more processors and is configured to communicate with the memory 250 through a bus in order to compute and process the massive data 255 in the memory 250.
[0033] As described in detail above, a server 200 to which the present application applies carries out the method for extracting data from logs by having the central processing unit 270 read a series of computer-readable instructions stored in the memory 250.
[0034] In an exemplary embodiment, the server 200 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors, digital signal processing devices, programmable logic devices, field-programmable gate arrays, controllers, microcontrollers, microprocessors, or other electronic components, for performing the method for extracting data from logs described below. Implementation of the present application is therefore not limited to any specific hardware circuit, software, or combination of the two.
[0035] FIG. 2 is a flowchart of a method for extracting data from a log according to an exemplary embodiment. The method may be executed by the server shown in FIG. 1 and may include the following steps:
[0036] Step S110: monitor the running system for log updates.
[0037] The running system is, for example, a system that serves various program clients, such as a trading system, a valuation system, or a fund system in a financial company. It may also be an application running on a terminal device.
[0038] While running, the system continuously performs logical processing, for example receiving requests initiated by clients, processing those requests, and issuing instructions to clients. The logical processing performed naturally differs from system to system. As the system performs this processing, it generates corresponding logs: for example, a log of the received request after a client request is received, a log of the processing result after a request is processed, and a user login log after a client logs in successfully.
[0039] A log storage unit is configured for the system, and the logs generated while the system runs are stored in that unit. Log updates can therefore be monitored in the log storage unit, that is, by monitoring whether a new log has been stored there. Such a new log is a log newly generated by the running system.
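The monitoring in step S110 can be sketched as polling a log file for newly appended entries. This is a minimal illustration only, assuming the log storage unit is a single UTF-8 text file with one log per line; the function name `read_new_entries` and the byte-offset bookkeeping are hypothetical and are not taken from the application.

```python
def read_new_entries(path, offset):
    """Return (new_lines, new_offset) for logs appended after byte `offset`.

    Assumption: the log storage unit is one append-only text file,
    one log entry per line, encoded as UTF-8.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume complete lines only; a partially written last line
    # is left in place and picked up by the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

A caller would keep the returned offset and call the function again on a timer; each returned line is an updated log to pass on to step S130.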
[0040] Step S130: if a log update is detected, identify the updated log with a neural network model to determine the log type of the updated log.
[0041] Different logical operations performed by the system produce logs of different log types. Logs of different types differ both in format and in the data they contain. For example, a system may generate logs for user logins and for the results of processing user requests. A log generated for a user login (referred to as a login type log) is:
[0042] 20180904-11:21: User jenny logged into the system
[0043] A log generated when a user request is processed successfully (referred to as a request processing success type log) is:
[0044] 20180904-14:00: User Amy initiated a request for product creation, the processing was successful, and the response time was 2.2 seconds
[0045] The login type log above contains the user login time "20180904-11:21" and the logged-in user "jenny"; the login time and the logged-in user are the data carried in this login type log. The request processing success type log above contains the time at which the user initiated the request, "20180904-14:00", the user who initiated it, "Amy", the request type, "product creation", the processing result, "successful", and the system response time, "2.2 seconds"; these five values are the data carried in that log.
[0046] Because logs of different log types have different formats, the updated log can be identified with a neural network model to determine its log type. In other words, the updated log is identified by deep learning.
[0047] Once trained, the neural network model extracts features from the updated log and then predicts the label of the updated log from the extracted features, thereby determining the log type of the updated log. The neural network model may be a convolutional neural network model, a recursive neural network model, or a recurrent neural network model, which is not specifically limited here.
[0048] Step S150: look up, in the configuration file, the data extraction information corresponding to the log type, where the data extraction information indicates the data items to be extracted from logs of that log type.
[0049] The configuration file contains the data extraction information configured for each log type from whose logs data is to be extracted. The configuration file may therefore contain one or more sets of data extraction information, each set corresponding to one log type. The data extraction information for a log type specifies the data items to be extracted from logs of that type. Take the login type log from the example above:
[0050] 20180904-11:21: User jenny logged into the system
[0051] If the login time and the logged-in user are to be extracted, then the login time and the logged-in user are the data items for extraction. In this log, the login time "20180904-11:21" is the data corresponding to the login time data item, and the logged-in user "jenny" is the data corresponding to the logged-in user data item.
[0052] Step S170: extract, from the updated log, the data corresponding to the data items according to the data extraction information found.
[0053] The data extraction information for each log type indicates one or more data items to be extracted. The data corresponding to those data items is extracted from the updated log according to the data extraction information found, so that data is collected from the logs.
[0054] To understand how a system is used, data related to system operation must be gathered, such as the number of successfully processed requests, the number of failed requests, the response time for each request, and the number of users who logged in. To analyze user behavior preferences, the user login times, the types of requests initiated by users, and so on must also be gathered. With the method of the present application, there is no need to look up and retrieve data from the database that stores the system's operating results, or to obtain the data to be collected indirectly from other data sources. Instead, log updates are monitored, the log type of the updated log is identified, the data extraction information defined for that log type is looked up, and the corresponding data is extracted from the updated log according to the data extraction information. Data related to system operation is thus extracted from the logs and collected in real time. Identifying the log type by deep learning improves identification efficiency and accuracy, which further ensures the efficiency and real-time performance of data extraction. In addition, because the system's logs contain all information related to system operation, data extracted from the logs is complete, in contrast to data obtained from the database or by indirect means.
[0055] In an embodiment, as shown in FIG. 3, step S130 includes:
[0056] Step S131: construct a feature vector of the updated log.
[0057] The feature vector may be constructed from the text of the updated log. Because logs of different log types have different formats, the feature vectors constructed for them also differ. The constructed feature vector thus reflects the characteristics of the updated log.
[0058] Consider, for example, the login type log mentioned above, such as
[0059] 20180904-11:21: User jenny logged into the system
[0060] and the request processing success type log, such as
[0061] 20180904-14:00: User Amy initiated a request for product creation, the processing was successful, and the response time was 2.2 seconds
[0062] The keywords in logs of these two log types are different. A login type log has the form "XXXXXXXX-XX:XX: User XX logged into the system"; everything except the positions occupied by the X placeholders is identical across logs of this type.
[0063] A request processing success type log has the form "YYYYYYYY-YY:YY: User YY initiated a request for YYYY, processing YY, response time YY seconds"; everything except the positions occupied by the Y placeholders is identical across logs of this type. As the two examples show, the keywords in logs of a given log type, and the positions of those keywords, are fixed. In a login type log, for instance, the specific logged-in user XX is always followed by "logged into the system". When the feature vector of a log is constructed, it is therefore built from the keywords in the log and their positions: the keywords are searched for in the updated log, their positions in the updated log are obtained, and the feature vector of the updated log is constructed from them.
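The keyword-and-position construction described above can be illustrated with a small sketch. It assumes English log text, a hand-picked keyword list drawn from the two example logs, and the character offset of each keyword as its "position"; the application does not specify the keyword set or the encoding, so `KEYWORDS` and `keyword_position_features` are hypothetical names.

```python
# Hypothetical keyword list chosen from the two example log types.
KEYWORDS = [
    "User",
    "logged into the system",
    "initiated",
    "processing",
    "response time",
]

def keyword_position_features(log_line, keywords=KEYWORDS):
    """Return one entry per keyword: its character offset in the log,
    or -1 when the keyword does not occur."""
    return [log_line.find(kw) for kw in keywords]
```

A login type log and a request processing success type log yield vectors with different absent (-1) entries, which is the kind of difference a classifier in step S132 could exploit.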
[0064] Further, before step S131, the method also includes segmenting the updated log into words, after which the feature vector of the updated log is constructed from the code corresponding to each word.
[0065] Step S132: perform classification prediction on the feature vector to obtain the type label corresponding to the updated log.
[0066] Step S133: determine the log type of the updated log according to the type label.
[0067] In the neural network model, a corresponding type label is configured for each log type of the logs generated by the system. In step S132, classification prediction on the constructed feature vector means predicting the probability that the feature vector belongs to each type label. The predicted probabilities are then traversed, and the type label with the highest probability is taken as the type label of the updated log. The log type of the updated log is then determined from that type label.
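The selection of the most probable type label in steps S132 and S133 can be sketched as follows. The sketch assumes the model's final layer emits one raw score per type label and that a softmax turns the scores into probabilities; the application only states that a probability is predicted per label and the maximum is taken, so the softmax and the name `predict_type_label` are assumptions.

```python
import math

def predict_type_label(scores, labels):
    """Convert raw per-label scores to probabilities (softmax, an
    assumption) and return the most probable label with all probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Traverse the probabilities and keep the index of the largest one.
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs
```

The returned label would then be mapped to a log type, for example a label such as "login" to the login type log.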
[0068] In an embodiment, as shown in FIG. 4, before step S130 the method further includes:
[0069] Step S210: obtain a number of sample logs, together with the sample label annotated for each sample log.
[0070] Step S220: train the neural network model with the sample logs and their corresponding labels.
[0071] Step S230: when the neural network model converges, end the training of the neural network model.
[0072] For each sample log, the neural network model predicts the type label of that sample log. If the predicted type label does not match the sample label annotated for the sample log, the parameters of the neural network model are adjusted until the predicted type label matches the sample label. This process is repeated for every sample log.
[0073] After training for some time, the prediction accuracy of the neural network model is tested. A number of test logs are input into the neural network model, which predicts a type label for each test log, and each predicted type label is compared with the type label annotated for that test log. If the two match, the model's prediction for that test log is correct; if they do not, the prediction is wrong. The prediction accuracy of the neural network model is then computed (prediction accuracy = number of correctly predicted test logs / total number of test logs). If the prediction accuracy meets the configured accuracy requirement, the trained neural network model is considered to have converged, training ends, and the trained model is used in step S130 to identify updated logs. Training the neural network model in this way improves the accuracy with which updated logs are identified.
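The accuracy test above reduces to a short calculation. The sketch below follows the stated formula (correct predictions divided by total test logs); the function names and the use of a greater-than-or-equal comparison against the required accuracy are illustrative assumptions.

```python
def prediction_accuracy(predicted, annotated):
    """Prediction accuracy = correctly predicted test logs / total test logs."""
    if not annotated or len(predicted) != len(annotated):
        raise ValueError("need equally sized, non-empty label lists")
    correct = sum(p == a for p, a in zip(predicted, annotated))
    return correct / len(annotated)

def has_converged(predicted, annotated, required_accuracy):
    """Treat the model as converged once accuracy meets the requirement
    (the >= comparison is an assumption)."""
    return prediction_accuracy(predicted, annotated) >= required_accuracy
```

For instance, three matching labels out of four test logs give an accuracy of 0.75, which satisfies a requirement of 0.7 but not one of 0.8.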
[0074] In an embodiment, as shown in FIG. 5, before step S150 the method further includes:
[0075] Step S310: obtain a template log of the same log type as the logs from which data is to be extracted.
[0076] Step S320: in the template log, replace the data corresponding to each data item with the variable configured for that data item, and configure the data extraction information for the log type from the resulting template log.
[0077] Step S330: form the configuration file from the data extraction information corresponding to each log type.
[0078] The template log may be any log of the log type. Where data must be extracted from logs of multiple log types, a template log is obtained for each log type.
[0079] As described above, different logs of the same log type share the same format and have parts in common, for example the keywords in the log and the positions of those keywords. Only a small part differs, for example the data corresponding to the data items to be extracted.
[0080] Before the data extraction information is configured, a variable is configured for each data item to be extracted. In the template log, the data corresponding to each data item is replaced with the variable configured for that data item, and the values of the variables are defined as the output; this yields the data extraction information for the log type. When data is extracted from a log of that log type according to this data extraction information, the data found at the positions of the variables is extracted from the log, and this is the data corresponding to the data items.
[0081] 举例来说, 例如需要从登陆类型日志中提取登陆时间和登录用户这两个数据项 对应的数据, 则配置得到针对该日志类型的数据提取信息: [0081] For example, for example, two data items of login time and login user need to be extracted from the login type log The corresponding data is configured to obtain data extraction information for the log type:
[0082] Metrics.login.pattern=%timestamp%:SP%username%登陆了系统 (the literal text 登陆了系统 means "logged into the system")
[0083] Metrics.login.index=timestamp,username
[0084] Here, timestamp is the variable configured for the login time data item, and username is the variable configured for the login user data item. The first line of the data extraction information replaces, in the template log, the data corresponding to each data item with the variable configured for that data item. In effect, the data corresponding to the data item is assigned to that variable. The second line defines the output variables, that is, the variables corresponding to the data items to be extracted. When data is extracted according to the data extraction information, the data corresponding to those data items in the log is thereby obtained.
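One way a pattern line and an index line of this kind might be applied is sketched below in Python. The application does not state how the pattern is evaluated, so converting it to a regular expression is an assumption, as are the function names `compile_pattern` and `extract`, the English pattern text and the sample log line.

```python
import re

def compile_pattern(pattern):
    """Turn a template such as '%a%: %b% text' into a compiled regex.

    Each %name% placeholder becomes a named, non-greedy capture group;
    all other text must match literally. (Illustrative helper, not from
    the application.)
    """
    parts = re.split(r"%(\w+)%", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text between variables
        else:
            regex += f"(?P<{part}>.+?)"    # position of a configured variable
    return re.compile("^" + regex + "$")

def extract(log_line, pattern, index):
    """Return {variable: value} for the variables named in `index`, or None."""
    match = compile_pattern(pattern).match(log_line)
    if match is None:
        return None
    return {name: match.group(name) for name in index}

# Hypothetical extraction information for a login log type.
pattern = "%timestamp%: %username% logged into the system"
index = ["timestamp", "username"]

result = extract("2019-01-04 10:15:00: alice logged into the system", pattern, index)
print(result)
```

The index list plays the role of the second configuration line: only the variables it names are returned, even if the pattern captures more.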
[0085] Because the configuration file may cover logs of multiple log types, data extraction information is configured separately for each log type, and the data extraction information of all the log types together forms the configuration file.
[0086] Further, to make it easy to locate the data extraction information of each log type within the configuration file, an identifier is configured for the data extraction information of each log type and is associated with that log type. Once the log type of the updated log has been determined by the identification in step S130, the identifier associated with that log type can be looked up directly, so that the corresponding data extraction information is found quickly.
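The lookup by identifier can be illustrated with a short Python sketch. It assumes a properties-style file whose keys follow the `Metrics.<identifier>.<field>` shape of the example in [0082] and [0083]. The second log type, the `TYPE_TO_IDENTIFIER` mapping and the function names are hypothetical.

```python
# Hypothetical configuration file content covering two log types.
CONFIG_TEXT = """\
Metrics.login.pattern=%timestamp%: %username% logged into the system
Metrics.login.index=timestamp,username
Metrics.order.pattern=order %order_id% processed with status %status%
Metrics.order.index=order_id,status
"""

# Assumed association between log types and extraction-information identifiers.
TYPE_TO_IDENTIFIER = {"login log": "login", "order log": "order"}

def parse_config(text):
    """Group 'Metrics.<identifier>.<field>=<value>' lines by identifier."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        _prefix, identifier, field = key.strip().split(".", 2)
        entry = config.setdefault(identifier, {})
        if field == "index":
            entry[field] = [name.strip() for name in value.split(",")]
        else:
            entry[field] = value
    return config

def find_extraction_info(config, log_type):
    """Resolve a log type to its identifier, then to its extraction information."""
    identifier = TYPE_TO_IDENTIFIER.get(log_type)
    if identifier is None:
        return None
    return config.get(identifier)

config = parse_config(CONFIG_TEXT)
login_info = find_extraction_info(config, "login log")
print(login_info)
```

Keeping the type-to-identifier association separate from the file content means the classifier's output only has to match a key in that mapping, not the layout of the file.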
[0087] In an embodiment, as shown in FIG. 6, the method further includes, after step S170:
[0088] Step S410: look up the data table corresponding to the log type.
[0089] Step S430: write the extracted data into the data table for storage.
[0090] The data extracted differs between log types, so a data table is configured for each log type to store the data extracted from logs of that type. The extracted data is written into the corresponding data table for storage. Subsequent analysis can then operate directly on the data stored in the data table to obtain results such as the number of user logins, the number of successful system operations and the number of failed system operations.
[0091] In an embodiment, as shown in FIG. 7, step S430 includes:
[0092] Step S431: locate, in the data table, the data field associated with the data item.
[0093] Step S432: write the data corresponding to the data item into the table cell configured for the data field.
[0094] The data extracted from a log of a given log type may cover one data item or several. Where it covers several data items, a data field is configured in the data table for each data item and associated with it. When the extracted data is written into the data table, the data field associated with each data item is located, and the data of that data item is written into the table cell configured for that field. Further, data is written into the data table row by row: after one row has been written, the data from the next extraction is written into the following row, and so on.
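Steps S410 to S432 might look as follows when the data table is a relational table. This Python sketch uses the standard `sqlite3` module with an in-memory database. The table name, the column names and the two mapping dictionaries are assumptions, since the application does not name a storage engine or schema.

```python
import sqlite3

# Hypothetical associations: log type -> data table, data item -> data field.
TABLE_FOR_TYPE = {"login": "login_records"}
FIELD_FOR_ITEM = {"login": {"timestamp": "login_time", "username": "login_user"}}

def write_extracted(conn, log_type, data):
    """Append one row holding the extracted data of a single log."""
    table = TABLE_FOR_TYPE[log_type]               # S410: find the data table
    fields = FIELD_FOR_ITEM[log_type]
    columns = [fields[item] for item in data]      # S431: locate each data field
    placeholders = ", ".join("?" for _ in columns)
    # Table and column names come from the trusted mappings above, never from
    # log content; the extracted values are bound as parameters.
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.execute(sql, list(data.values()))         # S432: write the cells of a new row
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_records (login_time TEXT, login_user TEXT)")

write_extracted(conn, "login", {"timestamp": "2019-01-04 10:15:00", "username": "alice"})
write_extracted(conn, "login", {"timestamp": "2019-01-04 10:20:00", "username": "bob"})

# Analysis can then run directly on the table, for example a login count.
login_count = conn.execute("SELECT COUNT(*) FROM login_records").fetchone()[0]
print(login_count)
```

Each call inserts exactly one row, which matches the row-by-row writing described in [0094].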
[0095] The following are apparatus embodiments of the present application, which may be used to carry out the embodiments of the method for extracting data from logs performed by the server 200 described above. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
[0096] FIG. 8 is a block diagram of an apparatus for extracting data from logs according to an exemplary embodiment. The apparatus may be deployed in the server 200 shown in FIG. 1 to perform all or some of the steps of the method for extracting data from logs shown in any of the above method embodiments. As shown in FIG. 8, the apparatus includes, but is not limited to, a monitoring module 110, an identification module 130, a search module 150 and an extraction module 170. The monitoring module 110 is configured to monitor the running system for log updates. The identification module 130, connected to the monitoring module 110, is configured to identify the updated log by means of a neural network model, if a log update is detected, so as to determine the log type of the updated log. The search module 150, connected to the identification module 130, is configured to search the configuration file for the data extraction information corresponding to the log type, the data extraction information indicating the data items to be extracted from logs of that log type. The extraction module 170, connected to the search module 150, is configured to extract the data corresponding to the data items from the updated log according to the data extraction information found.
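How the four modules could be chained is sketched below in Python. This is an illustration under stated assumptions rather than the apparatus itself. The monitoring module is approximated by polling a file for appended lines, the identification module is replaced by a simple keyword rule standing in for the neural network model, and all class and function names are hypothetical.

```python
import os
import re
import tempfile

class LogMonitor:
    """Monitoring module: returns the lines appended since the previous poll."""
    def __init__(self, path):
        self.path = path
        self.offset = 0

    def poll(self):
        with open(self.path, "r", encoding="utf-8") as handle:
            handle.seek(self.offset)
            new_lines = handle.read().splitlines()
            self.offset = handle.tell()
        return new_lines

def identify(line):
    """Identification module stand-in: a keyword rule, NOT a neural network."""
    return "login" if "logged into the system" in line else None

# Search module data: extraction information per log type (assumed regex form).
EXTRACTION_INFO = {
    "login": re.compile(r"^(?P<timestamp>.+?): (?P<username>.+?) logged into the system$"),
}

def lookup(log_type):
    return EXTRACTION_INFO.get(log_type)

def extract(line, info):
    """Extraction module: pull the configured data items out of one log line."""
    match = info.match(line)
    return match.groupdict() if match else None

def process_updates(monitor):
    """Run monitor -> identify -> lookup -> extract for every new log line."""
    results = []
    for line in monitor.poll():
        log_type = identify(line)
        info = lookup(log_type)
        if info is None:
            continue                      # unknown type: nothing to extract
        data = extract(line, info)
        if data is not None:
            results.append((log_type, data))
    return results

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "system.log")
    with open(path, "w", encoding="utf-8") as f:
        f.write("2019-01-04 10:15:00: alice logged into the system\n")
    monitor = LogMonitor(path)
    first = process_updates(monitor)
    with open(path, "a", encoding="utf-8") as f:
        f.write("heartbeat ok\n2019-01-04 10:20:00: bob logged into the system\n")
    second = process_updates(monitor)

print(first, second)
```

The second call returns only the entry appended after the first poll, which mirrors the idea that extraction is triggered by log updates rather than by rereading the whole log.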
[0097] For the implementation of the functions and roles of the modules in the above apparatus, see the implementation of the corresponding steps in the above method for extracting data from logs. Details are not repeated here.
[0098] It can be understood that these modules may be implemented in hardware, in software, or in a combination of the two. When implemented in hardware, the modules may be implemented as one or more hardware modules, for example one or more application-specific integrated circuits. When implemented in software, the modules may be implemented as one or more computer programs executed on one or more processors, for example the programs stored in the memory 250 and executed by the central processing unit 270 of FIG. 2.
[0099] In an embodiment, the identification module 130 includes the following units. A feature vector construction unit is configured to construct a feature vector of the updated log. A classification prediction unit is configured to perform classification prediction on the feature vector to obtain the type label corresponding to the updated log. A log type determination unit is configured to determine the log type of the updated log according to the type label.
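The three units of the identification module can be pictured with a minimal Python sketch. The application does not specify the features or the network architecture, so the bag-of-words vocabulary, the single linear layer with hand-picked weights and the label-to-type table below are all illustrative assumptions.

```python
# Hypothetical vocabulary used to build the feature vector.
VOCABULARY = ["logged", "system", "order", "processed", "failed"]

# Hypothetical mapping from type label to log type.
TYPE_LABELS = {0: "login log", 1: "order log"}

# Hand-picked weights of one linear layer, one row per type label.
# In practice these would come from training, not be written by hand.
WEIGHTS = [
    [2.0, 2.0, -1.0, -1.0, 0.0],
    [-1.0, -1.0, 2.0, 2.0, 1.0],
]

def build_feature_vector(log_line):
    """Feature vector construction unit: word counts over the vocabulary."""
    tokens = log_line.lower().split()
    return [float(tokens.count(word)) for word in VOCABULARY]

def predict_label(features):
    """Classification prediction unit: label with the highest linear score."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]
    return scores.index(max(scores))

def determine_log_type(log_line):
    """Log type determination unit: map the predicted label to a log type."""
    return TYPE_LABELS[predict_label(build_feature_vector(log_line))]

print(determine_log_type("alice logged into the system"))
print(determine_log_type("order 42 processed"))
```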
[0100] For the implementation of the functions and roles of the modules in the above apparatus, see the implementation of the corresponding steps in the above method embodiments. Details are not repeated here.
[0101] In an embodiment, the apparatus for extracting data from logs further includes the following modules, which perform their steps before the identification module runs. A sample log acquisition module is configured to acquire a number of sample logs and the sample label annotated for each sample log. A training module is configured to train the neural network model with the sample logs and their corresponding type labels. A training end module is configured to end the training of the neural network model when the neural network model converges.
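The training flow (labelled samples, training, stopping on convergence) is sketched below in plain Python. A single-layer softmax classifier is used as the smallest possible stand-in for the neural network model, and convergence is taken to mean that the average loss changes by less than a tolerance between epochs. Both choices are assumptions, as are the sample feature vectors and the hyperparameters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(samples, labels, num_classes, lr=0.5, tolerance=1e-4, max_epochs=1000):
    """Gradient descent on cross-entropy loss; stops once the loss stabilises."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    previous_loss = float("inf")
    for _ in range(max_epochs):
        loss = 0.0
        for x, y in zip(samples, labels):
            scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            probs = softmax(scores)
            loss -= math.log(probs[y])
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    weights[c][i] -= lr * grad * x[i]
        loss /= len(samples)
        if abs(previous_loss - loss) < tolerance:   # treated as convergence
            break
        previous_loss = loss
    return weights

def predict(weights, x):
    """Return the type label with the highest score for feature vector x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))

# Hypothetical feature vectors of sample logs and their annotated type labels.
samples = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
labels = [0, 0, 1, 1]

weights = train(samples, labels, num_classes=2)
predictions = [predict(weights, x) for x in samples]
print(predictions)
```

The `max_epochs` bound is a safeguard so that training ends even if the loss never settles within the tolerance.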
[0102] For the implementation of the functions and roles of the modules in the above apparatus, see the implementation of the corresponding steps in the above method embodiments. Details are not repeated here.
[0103] In an embodiment, the apparatus for extracting data from logs further includes the following modules, which perform their steps before the search module runs. A template log acquisition module is configured to acquire a template log of the same log type as the log from which data is to be extracted. A data extraction information generation module is configured to replace, in the template log, the data corresponding to each data item with the variable configured for that data item, and to configure the data extraction information corresponding to the log type from the template log after replacement. A configuration file generation module is configured to form the configuration file from the data extraction information corresponding to each log type.
[0104] For the implementation of the functions and roles of the modules in the above apparatus, see the implementation of the corresponding steps in the above method embodiments. Details are not repeated here.
[0105] In an embodiment, the apparatus for extracting data from logs further includes the following modules. A data table search module is configured to look up the data table corresponding to the log type. A data writing module is configured to write the extracted data into the data table for storage.
[0106] For the implementation of the functions and roles of the modules in the above apparatus, see the implementation of the corresponding steps in the above method embodiments. Details are not repeated here.
[0107] In an embodiment, the data writing module includes the following units. A data field locating unit is configured to locate, in the data table, the data field associated with the data item. A writing unit is configured to write the data corresponding to the data item into the table cell configured for the data field.
[0108] For the implementation of the functions and roles of the units in the above apparatus, see the implementation of the corresponding steps in the above method embodiments. Details are not repeated here.
[0109] Optionally, the present application further provides an electronic device that may be used in the server 200 shown in FIG. 1 to perform all or some of the steps of the method for extracting data from logs shown in any of the above method embodiments. As shown in FIG. 9, the electronic device 1000 includes a processor 1001 and a memory 1002. The memory 1002 stores computer-readable instructions which, when executed by the processor 1001, implement the method of any of the above method embodiments.
[0110] The executable instructions, when executed by the processor 1001, implement the method of any of the above embodiments. The executable instructions are, for example, computer-readable instructions. During execution, the processor 1001 reads the computer-readable instructions stored in the memory over the communication line/bus 1003 that connects it to the memory.
[0111] The specific manner in which the processor of the electronic device in this embodiment performs operations has been described in detail in the embodiments of the method for extracting data from logs and is not elaborated here.
[0112] In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, on which a computer program is stored. When executed by a processor, the computer program implements the method for extracting data from logs of any of the above method embodiments. The non-volatile computer-readable storage medium is, for example, the memory 250 containing the computer program, whose instructions may be executed by the central processing unit 270 of the server 200 to implement the above method for extracting data from logs.
[0113] The specific manner in which the processor in this embodiment performs operations has been described in detail in the embodiments of the method for extracting data from logs and is not elaborated here.
[0114] It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims

[Claim 1] A method for extracting data from a log, comprising:
monitoring a running system for log updates;
if a log update is detected, identifying the updated log by means of a neural network model to determine the log type of the updated log;
searching a configuration file for data extraction information corresponding to the log type, the data extraction information indicating the data items to be extracted from logs of the log type; and
extracting the data corresponding to the data items from the updated log according to the data extraction information found.
[Claim 2] The method according to claim 1, wherein identifying the updated log by means of the neural network model to determine the log type of the updated log comprises:
constructing a feature vector of the updated log;
performing classification prediction on the feature vector to obtain a type label corresponding to the updated log; and
determining the log type of the updated log according to the type label.
[Claim 3] The method according to claim 1, wherein, before identifying the updated log by means of the neural network model to determine the log type of the updated log, the method further comprises:
acquiring a number of sample logs and a sample label annotated for each of the sample logs;
training the neural network model with the sample logs and the corresponding type labels; and
ending the training of the neural network model when the neural network model converges.
[Claim 4] The method according to claim 1, wherein, before searching the configuration file for the data extraction information corresponding to the log type, the method further comprises:
acquiring a template log of the same log type as the log from which data is to be extracted;
in the template log, replacing the data corresponding to the data item with the variable configured for the data item, and configuring the data extraction information corresponding to the log type from the template log after replacement; and
forming the configuration file from the data extraction information corresponding to each log type.
[Claim 5] The method according to claim 1, wherein, after extracting the data corresponding to the data items from the updated log according to the data extraction information found, the method further comprises:
looking up the data table corresponding to the log type; and
writing the extracted data into the data table for storage.
[Claim 6] The method according to claim 5, wherein writing the extracted data into the data table comprises:
locating, in the data table, the data field associated with the data item; and
writing the data corresponding to the data item into the table cell configured for the data field.
[Claim 7] An apparatus for extracting data from a log, comprising:
a monitoring module configured to monitor a running system for log updates;
an identification module configured to, if a log update is detected, identify the updated log by means of a neural network model to determine the log type of the updated log;
a search module configured to search a configuration file for data extraction information corresponding to the log type, the data extraction information indicating the data items to be extracted from logs of the log type; and
an extraction module configured to extract the data corresponding to the data items from the updated log according to the data extraction information found.
[Claim 8] The apparatus according to claim 7, wherein the identification module comprises:
a feature vector construction unit configured to construct a feature vector of the updated log;
a classification prediction unit configured to perform classification prediction on the feature vector to obtain a type label corresponding to the updated log; and
a log type determination unit configured to determine the log type of the updated log according to the type label.
[Claim 9] The apparatus according to claim 7, further comprising:
a sample log acquisition module configured to acquire a number of sample logs and a sample label annotated for each of the sample logs;
a training module configured to train the neural network model with the sample logs and the corresponding type labels; and
a training end module configured to end the training of the neural network model when the neural network model converges.
[Claim 10] The apparatus according to claim 7, further comprising:
a template log acquisition module configured to acquire a template log of the same log type as the log from which data is to be extracted;
a data extraction information generation module configured to replace, in the template log, the data corresponding to the data item with the variable configured for the data item, and to configure the data extraction information corresponding to the log type from the template log after replacement; and
a configuration file generation module configured to form the configuration file from the data extraction information corresponding to each log type.
[Claim 11] The apparatus according to claim 7, further comprising:
a data table search module configured to look up the data table corresponding to the log type; and
a data writing module configured to write the extracted data into the data table for storage.
[Claim 12] The apparatus according to claim 11, wherein the data writing module comprises:
a data field locating unit configured to locate, in the data table, the data field associated with the data item; and
a writing unit configured to write the data corresponding to the data item into the table cell configured for the data field.
[Claim 13] An electronic device, comprising: a processor; and a memory storing computer-readable instructions which, when executed by the processor, implement the following steps:
monitoring a running system for log updates;
if a log update is detected, identifying the updated log by means of a neural network model to determine the log type of the updated log;
searching a configuration file for data extraction information corresponding to the log type, the data extraction information indicating the data items to be extracted from logs of the log type; and
extracting the data corresponding to the data items from the updated log according to the data extraction information found.
[Claim 14] The electronic device according to claim 13, wherein, in the step of identifying the updated log by means of the neural network model to determine the log type of the updated log, the processor is configured to:
construct a feature vector of the updated log;
perform classification prediction on the feature vector to obtain a type label corresponding to the updated log; and
determine the log type of the updated log according to the type label.
[Claim 15] The electronic device according to claim 13, wherein, before the step of identifying the updated log by means of the neural network model to determine the log type of the updated log, the processor is further configured to:
acquire a number of sample logs and a sample label annotated for each of the sample logs;
train the neural network model with the sample logs and the corresponding type labels; and
end the training of the neural network model when the neural network model converges.
[Claim 16] The electronic device according to claim 13, wherein, before the step of searching the configuration file for the data extraction information corresponding to the log type, the processor is further configured to:
acquire a template log of the same log type as the log from which data is to be extracted;
in the template log, replace the data corresponding to the data item with the variable configured for the data item, and configure the data extraction information corresponding to the log type from the template log after replacement; and
form the configuration file from the data extraction information corresponding to each log type.
[Claim 17] The electronic device according to claim 13, wherein, after the step of extracting the data corresponding to the data items from the updated log according to the data extraction information found, the processor is further configured to:
look up the data table corresponding to the log type; and
write the extracted data into the data table for storage.
[Claim 18] The electronic device according to claim 17, wherein, in the step of writing the extracted data into the data table, the processor is configured to:
locate, in the data table, the data field associated with the data item; and
write the data corresponding to the data item into the table cell configured for the data field.
[Claim 19] A non-volatile computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
monitoring a running system for log updates;
if a log update is detected, identifying the updated log by means of a neural network model to determine the log type of the updated log;
searching a configuration file for data extraction information corresponding to the log type, the data extraction information indicating the data items to be extracted from logs of the log type; and
extracting the data corresponding to the data items from the updated log according to the data extraction information found.
[Claim 20] The non-volatile computer-readable storage medium according to claim 19, wherein, in the step of identifying the updated log by means of the neural network model to determine the log type of the updated log, the processor is configured to:
construct a feature vector of the updated log;
perform classification prediction on the feature vector to obtain a type label corresponding to the updated log; and
determine the log type of the updated log according to the type label.
[Claim 21] The non-volatile computer-readable storage medium according to claim 19, wherein, before the step of identifying the updated log by means of the neural network model to determine the log type of the updated log, the processor is further configured to:
acquire a number of sample logs and a sample label annotated for each of the sample logs;
train the neural network model with the sample logs and the corresponding type labels; and
end the training of the neural network model when the neural network model converges.
[Claim 22] The non-volatile computer-readable storage medium according to claim 19, wherein, before the step of searching the configuration file for the data extraction information corresponding to the log type, the processor is further configured to:
acquire a template log of the same log type as the log from which data is to be extracted;
in the template log, replace the data corresponding to the data item with the variable configured for the data item, and configure the data extraction information corresponding to the log type from the template log after replacement; and
form the configuration file from the data extraction information corresponding to each log type.
[Claim 23] The non-volatile computer-readable storage medium according to claim 19, wherein, after the step of extracting the data corresponding to the data items from the updated log according to the data extraction information found, the processor is further configured to:
look up the data table corresponding to the log type; and
write the extracted data into the data table for storage.
[Claim 24] The non-volatile computer-readable storage medium according to claim 23, wherein, in the step of writing the extracted data into the data table, the processor is configured to:
locate, in the data table, the data field associated with the data item; and
write the data corresponding to the data item into the table cell configured for the data field.
PCT/CN2019/118038 2019-01-04 2019-11-13 Method for extracting data from log, and related device WO2020140624A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910007431.3A CN109783459A (en) 2019-01-04 2019-01-04 The method, apparatus and computer readable storage medium of data are extracted from log
CN201910007431.3 2019-01-04

Publications (1)

Publication Number Publication Date
WO2020140624A1 true WO2020140624A1 (en) 2020-07-09

Family

ID=66500036

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118038 WO2020140624A1 (en) 2019-01-04 2019-11-13 Method for extracting data from log, and related device

Country Status (2)

Country Link
CN (1) CN109783459A (en)
WO (1) WO2020140624A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109783459A (en) * 2019-01-04 2019-05-21 平安科技(深圳)有限公司 The method, apparatus and computer readable storage medium of data are extracted from log
CN110347653A (en) * 2019-07-10 2019-10-18 中国工商银行股份有限公司 Data processing method and device, electronic equipment and readable storage medium storing program for executing
CN110990353B (en) * 2019-12-11 2023-10-13 深圳证券交易所 Log extraction method, log extraction device and storage medium
CN112182193B (en) * 2020-10-19 2023-01-13 山东旗帜信息有限公司 Log obtaining method, device and medium in traffic industry

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104717085A (en) * 2013-12-16 2015-06-17 中国移动通信集团湖南有限公司 Log parsing method and device
US20170228265A1 (en) * 2014-08-25 2017-08-10 Nippon Telegraph And Telephone Corporation Log analysis apparatus, log analysis system, log analysis method and computer program
CN107992490A (en) * 2016-10-26 2018-05-04 华为技术有限公司 A kind of data processing method and data processing equipment
CN109002534A (en) * 2018-07-18 2018-12-14 杭州安恒信息技术股份有限公司 A kind of log analysis method, system, equipment and computer readable storage medium
CN109783459A (en) * 2019-01-04 2019-05-21 平安科技(深圳)有限公司 The method, apparatus and computer readable storage medium of data are extracted from log

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101382891A (en) * 2008-09-19 2009-03-11 中兴通讯股份有限公司 Statistical method and apparatus for constructing log output every day

Also Published As

Publication number Publication date
CN109783459A (en) 2019-05-21

Similar Documents

Publication Publication Date Title
US10459954B1 (en) Dataset connector and crawler to identify data lineage and segment data
AU2019200046B2 (en) Utilizing artificial intelligence to test cloud applications
US10769228B2 (en) Systems and methods for web analytics testing and web development
US11526799B2 (en) Identification and application of hyperparameters for machine learning
US11640563B2 (en) Automated data processing and machine learning model generation
JP6643211B2 (en) Anomaly detection system and anomaly detection method
Bodik et al. Fingerprinting the datacenter: automated classification of performance crises
WO2020140624A1 (en) Method for extracting data from log, and related device
US10878335B1 (en) Scalable text analysis using probabilistic data structures
US20190311114A1 (en) Man-machine identification method and device for captcha
US9104709B2 (en) Cleansing a database system to improve data quality
WO2019061664A1 (en) Electronic device, user's internet surfing data-based product recommendation method, and storage medium
CN113656254A (en) Abnormity detection method and system based on log information and computer equipment
US20220076157A1 (en) Data analysis system using artificial intelligence
Jain et al. A review of unstructured data analysis and parsing methods
CN114416573A (en) Defect analysis method, device, equipment and medium for application program
US11953979B2 (en) Using workload data to train error classification model
CN116225848A (en) Log monitoring method, device, equipment and medium
Turgeman et al. Context-aware incremental clustering of alerts in monitoring systems
CN111368864A (en) Identification method, availability evaluation method and device, electronic equipment and storage medium
US11188405B1 (en) Similar alert identification based on application fingerprints
CN113627514A (en) Data processing method and device of knowledge graph, electronic equipment and storage medium
EP3671467A1 (en) Gui application testing using bots
US11818227B1 (en) Application usage analysis-based experiment generation
US11822578B2 (en) Matching machine generated data entries to pattern clusters

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19908046; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19908046; Country of ref document: EP; Kind code of ref document: A1)