CN115437813A - Application program log analysis method and device - Google Patents

Application program log analysis method and device

Info

Publication number
CN115437813A
Authority
CN
China
Prior art keywords
time window
relation
risk value
vector
current time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110608693.2A
Other languages
Chinese (zh)
Inventor
余航
金华敏
王帅
谢文聪
刘锦泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202110608693.2A
Publication of CN115437813A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
              • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
                • G06F11/0766 Error or fault reporting or storing
                  • G06F11/0775 Content or structure details of the error report, e.g. specific table structure, specific error fields
                  • G06F11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
            • G06F11/30 Monitoring
              • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/10 File systems; File servers
              • G06F16/17 Details of further file system functions
                • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/28 Databases characterised by their database models, e.g. relational or object models
                • G06F16/284 Relational databases
                  • G06F16/288 Entity relationship models
          • G06F40/00 Handling natural language data
            • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)

Abstract

One aspect of the present disclosure relates to an application log analysis method, the method comprising: collecting log data from inside an application program, and recording a function name, the function output and the called function in a triple storage structure; constructing a knowledge graph by taking the function name as an entity, the function output as a relation and the calling relationship between functions as a direction; extracting the relations and directions of the knowledge graph and converting them into relation word vectors and direction word vectors, respectively; and performing log analysis based on the relation word vectors and the direction word vectors.

Description

Application program log analysis method and device
Technical Field
The present disclosure relates to the field of security, and more particularly to log analysis techniques.
Background
In a traditional Web application log analysis system, log data come from application access records, user operation records, program error records, alarm information from security devices, and the like. The data sources are complex, the formats differ, and the data lack logical structure, so a great deal of time and effort is spent on collecting, cleaning and converting log data before the logs can be analyzed. Processing the data multiple times also increases the cost and difficulty of tracing abnormal logs.
Log data from such diverse sources have weak logical coherence and correlation, which introduces noise into log analysis and makes it difficult to analyze a log in the context of related logs.
Disclosure of Invention
The following presents a simplified summary of the disclosure in order to provide a basic understanding of some aspects of the disclosure. However, it should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its sole purpose is to present some concepts of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.
To overcome these shortcomings of the prior art, the invention provides a knowledge-graph-based Web application log analysis method and system.
According to an aspect of the present disclosure, there is provided an application log analysis method, the method including: collecting log data from inside an application program, and recording a function name, the function output and the called function in a triple storage structure; constructing a knowledge graph by taking the function name as an entity, the function output as a relation and the calling relationship between functions as a direction; extracting the relations and directions of the knowledge graph and converting them into relation word vectors and direction word vectors, respectively; and performing log analysis based on the relation word vectors and the direction word vectors.
According to another aspect of the present disclosure, there is provided an apparatus for application log analysis, comprising: a memory having instructions stored thereon; and a processor configured to execute instructions stored on the memory to perform a method for application log analysis according to the above aspects of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising computer executable instructions which, when executed by one or more processors, cause the one or more processors to perform a method for application log analysis according to the above-mentioned aspect of the present disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 shows a schematic diagram of a prior art application log analysis method;
FIG. 2 illustrates a flow diagram of an application log analysis method according to some embodiments of the present disclosure;
FIG. 3 illustrates a flow diagram of an application log analysis method according to further embodiments of the present disclosure;
FIG. 4 illustrates a flow diagram of an application log analysis method according to further embodiments of the present disclosure;
FIG. 5 illustrates an example of a triple storage structure in accordance with some embodiments of the present disclosure;
FIG. 6 illustrates an example of a knowledge graph in accordance with some embodiments of the present disclosure;
FIG. 7 illustrates an example of a relationship word vector and a direction word vector in accordance with some embodiments of the present disclosure; and
FIG. 8 illustrates an exemplary hardware configuration diagram of an apparatus for application log analysis according to some embodiments of the present disclosure.
Detailed Description
The following detailed description is made with reference to the accompanying drawings and is provided to assist in a comprehensive understanding of various exemplary embodiments of the disclosure. The following description includes various details to aid understanding, but these details are to be regarded as examples only and are not intended to limit the disclosure, which is defined by the appended claims and their equivalents. The words and phrases used in the following description are used only to provide a clear and consistent understanding of the disclosure. In addition, descriptions of well-known structures, functions, and configurations may be omitted for clarity and conciseness. Those of ordinary skill in the art will recognize that various changes and modifications of the examples described herein can be made without departing from the spirit and scope of the disclosure.
FIG. 1 illustrates a common prior-art application log analysis method. Current log collectors typically collect logs from a variety of data sources; the various log sources, such as Web application access logs, middleware logs and traffic logs, are shown in the bottommost log source block of FIG. 1. Because these log data are not in a uniform format, various processing steps are required after collection and before analysis. As shown in the log processing block of FIG. 1, these steps include at least log parsing, format conversion and data cleansing.
It can be seen that, because the prior-art log collection module is deployed at multiple log sources outside the application program, the collected logs come from complex sources, have weak logical coherence and correlation, and are difficult to process. This introduces noise into log analysis and makes it difficult to analyze a log in the context of related logs.
In order to solve the above problems, the present disclosure provides a method and a system for analyzing an application log based on a knowledge graph.
FIG. 2 illustrates a flow diagram of an application log analysis method according to some embodiments of the present disclosure.
First, in step 201, logs are collected from inside the application, and the function name, the function output and the called function are recorded in the triple storage structure.
According to some embodiments, the log collection module can be deployed in the programming-language interpreter layer of the application program, where it continuously monitors and records the call paths of the program's internal functions, together with their inputs and outputs, while the application runs. Compared with the prior art, data recorded in this way have a uniform format and strong relevance and logical coherence.
According to some embodiments, logs may be collected from within an application using runtime application self-protection (RASP) technology. RASP is a security technology embedded inside an application or its runtime environment. It can control the execution flow of the application and detect and block vulnerability exploitation in real time. It can address specific network problems (threats, faults, etc.) by providing self-protection measures that thwart related network attacks, or by automatically reconfiguring the environment without human intervention.
However, the above embodiments are not limiting; those skilled in the art may use other techniques to collect logs, as long as information about function call paths can be recorded inside the application program.
A triple storage structure here refers to a data storage structure, such as a database or a table, that stores three parameters (here, the function name, the function output and the called function name) in association with one another. Storing log data this way facilitates constructing the knowledge graph of the logs in the next step.
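As a rough illustration of "other techniques" that record call paths inside an application, the Python sketch below wraps functions in a decorator that notes which recorded function called which, and what each caller returned. The decorator `record_calls`, the `TRIPLES` list and the demo functions are all hypothetical; this is not RASP and not the patent's collector, it assumes single-threaded code, and it records a leaf call (one that calls no other recorded function) with a callee of `None`.

```python
import functools
from typing import Any, Callable, List, Optional, Tuple

# (function name, function output, called function name); callee is None for leaf calls
Triple = Tuple[str, str, Optional[str]]

TRIPLES: List[Triple] = []
_frames: List[List[str]] = []  # callee names collected for each active call


def record_calls(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical in-process hook that records caller -> callee edges and outputs."""
    name = func.__name__ + "()"

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if _frames:
            _frames[-1].append(name)  # register this call with the enclosing caller
        _frames.append([])
        try:
            result = func(*args, **kwargs)
        finally:
            callees = _frames.pop()
        # A caller's output is only known once it returns, so its triples are emitted now.
        for callee in callees or [None]:
            TRIPLES.append((name, str(result), callee))
        return result

    return wrapper


@record_calls
def token_check(token: str) -> bool:
    return token == "token"


@record_calls
def select_path(filename: str) -> str:
    return "/photos/" + filename


@record_calls
def download_photo(request: str) -> str:
    token, filename = request.split("/")
    token_check(token)
    select_path(filename)
    return token


download_photo("token/1.jpg")
```

After the call, `TRIPLES` ends with `("download_photo()", "token", "token_check()")` and `("download_photo()", "token", "select_path()")`, mirroring the call-path example described in this document.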
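Since the text allows the triple structure to be a database table, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The table name `call_triples` and its column names are hypothetical; the rows follow the call-path example given in this document (front end, download_photo(), token_check(), select_path()).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE call_triples ("
    " function_name TEXT NOT NULL,"   # caller
    " function_output TEXT,"          # caller's output for the observed input
    " called_function TEXT)"          # callee
)
rows = [
    ("front end", "token/1.jpg", "download_photo()"),
    ("download_photo()", "token", "token_check()"),
    ("download_photo()", "token", "select_path()"),
]
conn.executemany("INSERT INTO call_triples VALUES (?, ?, ?)", rows)
conn.commit()

# Which functions did download_photo() call? (insertion order via rowid)
callees = conn.execute(
    "SELECT called_function FROM call_triples WHERE function_name = ? ORDER BY rowid",
    ("download_photo()",),
).fetchall()
```

Keeping the three fields in one row is what lets the next step read each row directly as an (entity, relation, entity) edge.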
An example of a triple storage structure according to some embodiments is shown in fig. 5.
A large number of function calls may occur while an application runs. The function name and the called function name in the triple storage structure are, respectively, the name of the calling function (the caller) and the name of the function it calls (the callee). The function output is the caller's output for a given input (e.g., a user input). Together, these three parameters fully express the function call path during application execution. Taking FIG. 5 as an example, after the function "front end" receives user input, its output is "token/1.jpg" and it calls the function "download_photo()". The function "download_photo()" then outputs "token" and calls the functions "token_check()" and "select_path()".
The process then proceeds to step 202. In step 202, a knowledge graph is constructed with the function name as an entity, the function output as a relation, and the call relation of the function as a direction.
A knowledge graph (KG) describes entities or concepts in the real world and the relationships among them, forming a large semantic network graph in which nodes represent entities or concepts and edges represent attributes or relationships. In the prior art, knowledge graphs are mainly used to enhance search engine functionality.
Entities, relationships and directions are the basic parameters of a knowledge graph; once all three are known, a knowledge graph can be constructed. Thus, a knowledge graph can be built from the data stored in the triple structure by taking the function name as an entity, the function output as a relationship, and the calling relationship between functions as a direction. The calling relationship is the caller-callee relationship between two functions. FIG. 6 is an example of a knowledge graph constructed from the triple storage structure shown in FIG. 5, in which the arrows indicate call relationships; for example, the function "front end" calls the function "download_photo()".
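The mapping above (function name as entity, output as relation, caller-to-callee as direction) can be sketched as a plain adjacency structure in Python. This is an illustrative in-memory graph, not a graph database; the names `graph`, `entities` and `call_paths` are hypothetical, and path enumeration assumes the call graph has no cycles.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# (function name, function output, called function name), from the call-path example
triples = [
    ("front end", "token/1.jpg", "download_photo()"),
    ("download_photo()", "token", "token_check()"),
    ("download_photo()", "token", "select_path()"),
]

# Entities are function names; each edge carries the caller's output as its relation
# and points from caller to callee (the direction).
graph: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
entities: Set[str] = set()
for caller, output, callee in triples:
    entities.update((caller, callee))
    graph[caller].append((output, callee))


def call_paths(start: str) -> List[List[str]]:
    """Enumerate caller -> callee paths from `start`; assumes no recursion cycles."""
    edges = graph.get(start, [])
    if not edges:
        return [[start]]
    return [[start] + tail for _, callee in edges for tail in call_paths(callee)]


paths = call_paths("front end")
```

Following the arrows from "front end" yields two paths, one ending at `token_check()` and one at `select_path()`, which is the kind of traversal that later lets a marked anomaly be traced back along its call path.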
Then, the process proceeds to step 203. In step 203, the relations and directions of the knowledge-graph are extracted and converted into word vectors, respectively.
The relationships and directions in the knowledge graph represent logical relationships between data and can be expressed in natural language. To facilitate analysis, they are converted into word vectors.
According to some embodiments, natural language processing, such as word embedding, may be applied to relationships and directions to convert words into word vectors in numerical form.
According to some embodiments, the conversion into word vectors may be performed with word2vec. Every word vector produced by word2vec has the same number of dimensions. Those skilled in the art will appreciate that other word embedding methods are also within the scope of the present invention.
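To show only the shape of this step (each relation or direction token becomes a fixed-length numeric vector), the sketch below uses a deterministic hash-based placeholder instead of word2vec. It is explicitly not word2vec: the vectors carry no semantics. In practice a trained model (for example gensim's `Word2Vec` class) would supply the vectors. The dimension `K = 8` and the encoding of a direction as the string `"caller->callee"` are assumptions made for illustration.

```python
import hashlib
import math
from typing import List

K = 8  # assumed embedding dimension; real word2vec models typically use 100-300


def embed(token: str, k: int = K) -> List[float]:
    """Placeholder embedding: same token -> same unit vector of length k.

    Deterministic but NOT semantic; stands in for a trained word2vec lookup.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()  # 32 bytes
    raw = [digest[i] / 255.0 - 0.5 for i in range(k)]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]


relations = ["token/1.jpg", "token", "token"]
directions = [
    "front end->download_photo()",
    "download_photo()->token_check()",
    "download_photo()->select_path()",
]
R = [embed(r) for r in relations]   # relation word vectors
D = [embed(d) for d in directions]  # direction word vectors
```

Whatever embedding is used, the property the later risk calculation relies on is the one shown here: every relation and direction vector has the same number of dimensions k.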
FIG. 7 shows an example of a transformed relation word vector and direction word vector, where the relation is represented by R and the direction is represented by D.
The process then proceeds to step 204. In step 204, a log analysis is performed based on the relationship word vectors and the direction word vectors. For example, log analysis may include determining anomalous log data.
Those skilled in the art may use any suitable analysis method to perform log analysis based on the relation word vectors and direction word vectors. Because these vectors carry strong logical relationships and precise numerical values, analysis accuracy can be improved.
The above-described embodiments have the advantages of collecting log data from the inside of the application program, unifying data formats and having strong logicality and relevance. Based on the data with uniform format and strong logicality and relevance, the knowledge graph can be constructed, and the quantitative analysis of log data is realized based on the vectorization expression of the specific parameters of the knowledge graph, so that the log analysis accuracy is improved.
A method of analyzing log data based on time windows according to further embodiments of the present invention is described below: the extracted relations and directions are divided by time window, and the logs are analyzed one time window at a time. Compared with prior-art log analysis methods, this can improve log analysis accuracy and allow program bugs to be located and traced quickly.
FIG. 3 illustrates a time window based log analysis method according to some embodiments of the invention. The method is a preferred embodiment of step 204 in the log analysis method described in fig. 2, based on the relationship word vectors and direction word vectors obtained through steps 201-203 of fig. 2.
First, in step 301, the relationship and direction in the current time window are obtained and converted into a relationship word vector and a direction word vector. Then, in step 302, a relationship risk value and a direction risk value of the current time window are calculated based on the obtained relationship word vector and direction word vector, respectively.
According to some embodiments, a relationship risk value for a current time window is determined based on a difference between a relationship word vector in the current time window and a relationship word vector in a previous time window; and determining a directional risk value for the current time window based on a difference between the directional word vector in the current time window and the directional word vector in the previous time window.
In general, if there is no anomaly, the values of the relationship or direction within each time window do not change much; in the case of an abnormality, the value of the relationship or direction in the current time window changes greatly relative to the previous time window. Therefore, whether there is an abnormality in the log can be determined using a risk value indicating the magnitude of the difference between the relationship or direction in the current time window and the relationship or direction in the previous time window.
In some embodiments, the relationship word vectors and direction word vectors within the previous time window may be stored in memory when the log of the previous time window is analyzed and read when needed.
According to a preferred embodiment, the relation risk value is determined by calculating the angle between the relation word vector in the current time window and the average vector of the relation word vectors in the previous time window, and the direction risk value is determined by calculating the angle between the direction word vector in the current time window and the average vector of the direction word vectors in the previous time window.
The included angle ranges from 0 degrees to 180 degrees. Specifically, if the angle between the relation word vector in the current time window and the average vector of the relation word vectors in the previous time window is 0 degrees (i.e., they point in the same direction), the two vectors differ least. If the angle is 180 degrees (i.e., they point in opposite directions), the two vectors differ most.
Preferably, the relation risk value may be expressed as the cosine of the angle between the relation word vector in the current time window and the average vector of the relation word vectors in the previous time window, and the direction risk value as the cosine of the angle between the direction word vector in the current time window and the average vector of the direction word vectors in the previous time window. The risk value thus calculated ranges from -1 to 1: it is 1 when the angle is 0 degrees and -1 when the angle is 180 degrees. That is, the closer the risk value is to -1, the more likely the log data is anomalous.
The risk value expressed in the cosine of the angle can be calculated by the following equation:
Relation risk value r:
$$r = \frac{\sum_{i=1}^{k} Z_{ri} R_i}{\sqrt{\sum_{i=1}^{k} Z_{ri}^{2}}\,\sqrt{\sum_{i=1}^{k} R_i^{2}}}$$
where $Z_{ri}$ is the i-th component of the mean vector $Z_r$ of the relation word vectors in the previous time window, $R_i$ is the i-th component of the current relation word vector $R$ in the current time window, and k is the number of dimensions of the relation word vector.
Direction risk value d:
$$d = \frac{\sum_{i=1}^{k} Z_{di} D_i}{\sqrt{\sum_{i=1}^{k} Z_{di}^{2}}\,\sqrt{\sum_{i=1}^{k} D_i^{2}}}$$
where $Z_{di}$ is the i-th component of the mean vector $Z_d$ of the direction word vectors in the previous time window, $D_i$ is the i-th component of the current direction word vector $D$ in the current time window, and k is the number of dimensions of the direction word vector.
Since the relation word vectors and the direction word vectors are both produced with the same method (e.g., word2vec), every word vector has the same number of dimensions.
The average vector of the relation word vectors can be calculated by average pooling over the relation word vectors in the previous time window, and the average vector of the direction word vectors by average pooling over the direction word vectors in the previous time window.
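The r and d formulas above are both the cosine of the angle between two k-dimensional vectors, so one function covers both. This is a minimal sketch; the name `risk_value` is hypothetical, and raising an error on a zero vector (where the angle is undefined) is an assumption the text does not address.

```python
import math
from typing import Sequence


def risk_value(current: Sequence[float], mean_prev: Sequence[float]) -> float:
    """Cosine of the angle between the current word vector and the previous window's mean.

    1.0 -> same direction (no change); -1.0 -> opposite direction (maximal change).
    Used for both r (relation vectors) and d (direction vectors).
    """
    if len(current) != len(mean_prev):
        raise ValueError("vectors must have the same number of dimensions k")
    dot = sum(z * x for z, x in zip(mean_prev, current))
    norm = math.sqrt(sum(z * z for z in mean_prev)) * math.sqrt(sum(x * x for x in current))
    if norm == 0.0:
        raise ValueError("zero vector: angle is undefined")
    return dot / norm
```

For example, a vector that points the same way as the previous mean scores 1.0, an orthogonal one scores 0.0, and a reversed one scores -1.0. Because the cosine ignores vector length, only a change in direction between windows raises the risk.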
The average vector can be calculated by means of average pooling by the following equation:
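Average vector $Z_r$ of the relation word vectors in the previous time window:
$$Z_{ri} = \frac{1}{Q}\sum_{j=1}^{Q} R^{(j)}_{i}, \quad i = 1, \dots, k$$
Average vector $Z_d$ of the direction word vectors in the previous time window:
$$Z_{di} = \frac{1}{Q}\sum_{j=1}^{Q} D^{(j)}_{i}, \quad i = 1, \dots, k$$
where Q is the number of relation word vectors (or direction word vectors) obtained in the previous time window, and $R^{(j)}_{i}$ ($D^{(j)}_{i}$) is the i-th component of the j-th relation (direction) word vector in that window.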
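Average pooling here is a component-wise mean over the Q vectors of the previous window, which the following sketch computes. The function name `mean_pool` is hypothetical, and treating an empty previous window (Q = 0) as an error is an assumption.

```python
from typing import List, Sequence


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average pooling: component-wise mean of the Q word vectors from the previous window."""
    if not vectors:
        raise ValueError("previous time window contains no word vectors (Q = 0)")
    q = len(vectors)
    k = len(vectors[0])
    return [sum(v[i] for v in vectors) / q for i in range(k)]
```

Applied once to the previous window's relation vectors it gives $Z_r$, and once to its direction vectors it gives $Z_d$; for instance, pooling `[1.0, 2.0]` and `[3.0, 4.0]` gives `[2.0, 3.0]`.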
With this calculation, the risk value is the cosine of the included angle and lies in [-1, 1]: the larger the risk value, the safer the log data; the smaller the risk value, the more likely an anomaly exists. Because r and d are computed over every dimension and take into account the information of each word, this method enables high-precision judgment.
Then, in step 303, the relation risk value and the direction risk value are compared with their respective thresholds to determine whether at least one of them is below its threshold. The thresholds may be preset according to how the risk value is calculated. For example, when r and d are calculated as the cosine of the included angle, their range is [-1, 1]: a value close to 1 indicates little change from the previous time window, and a value close to -1 indicates a large change. Therefore, 0 can be used as the boundary, with values below 0 defined as anomalous. However, the present invention is not limited thereto, and the thresholds may be changed as needed.
If at least one of the relation risk value and the direction risk value is below its threshold, the log in the current time window contains an anomaly, and the flow proceeds to step 304.
At step 304, anomalies are noted and displayed in the knowledge graph. For example, the outliers may be marked by rendering markers of different colors. By marking in the knowledge graph, the abnormal logs can be traced quickly, and developers are assisted in positioning program bugs. Those skilled in the art will appreciate that the labeling approach is not limited thereto, and other labeling approaches are possible.
After the exception is noted, the process proceeds to step 305.
If both the relation risk value and the direction risk value are at or above their thresholds, i.e., the application is operating normally in the current time window, the process proceeds directly from step 303 to step 305.
In step 305, it is determined whether a log analysis end condition is satisfied, for example, whether logs have been collected for a sufficiently long time. If so, the flow ends. Otherwise, the flow proceeds to step 306, in which the relations and directions within the next time window are obtained. The process then returns to step 302 to calculate the relation risk value and the direction risk value for the next time window.
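The window-by-window loop of steps 302 to 306 can be sketched as follows, using the cosine risk value, average pooling of the previous window, and the threshold of 0 suggested above. Several details are assumptions rather than part of the described method: each window is given as lists of relation and direction vectors; when a window holds several vectors, its risk value is taken as the lowest cosine among them; the first window only serves as a baseline; and windows are assumed non-empty with non-zero vectors. The name `analyze` is hypothetical, and marking in the knowledge graph (step 304) is reduced to returning the flagged window indices.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]
Window = Tuple[List[Vec], List[Vec]]  # (relation vectors, direction vectors)


def _cos(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def _mean(vs: List[Vec]) -> List[float]:
    return [sum(col) / len(vs) for col in zip(*vs)]  # average pooling


def analyze(windows: List[Window], r_thr: float = 0.0, d_thr: float = 0.0) -> List[int]:
    """Return indices of windows flagged as anomalous (sketch of steps 302-306)."""
    flagged = []
    for idx in range(1, len(windows)):  # window 0 has no predecessor: baseline only
        prev_r, prev_d = windows[idx - 1]
        cur_r, cur_d = windows[idx]
        z_r, z_d = _mean(prev_r), _mean(prev_d)
        # Assumption: a window's risk value is the lowest cosine among its vectors.
        r = min(_cos(v, z_r) for v in cur_r)
        d = min(_cos(v, z_d) for v in cur_d)
        if r < r_thr or d < d_thr:  # step 303: at least one value below its threshold
            flagged.append(idx)     # step 304 would mark this window in the knowledge graph
    return flagged


windows = [
    ([[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0]]),
    ([[1.0, 0.05]], [[0.0, 1.0]]),
    ([[-1.0, 0.0]], [[0.0, 1.0]]),  # relation vector reverses direction
]
```

Here `analyze(windows)` flags only window 2, whose relation vector points opposite to the previous window's mean (r is about -0.999), while its direction vector is unchanged (d = 1).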
A log data analysis method based on adaptive time window periods according to further embodiments of the present disclosure is described below with reference to FIG. 4. A time window period consists of a predetermined number of time windows, and the time windows within the same period have the same length. When log data change little between time windows, the analysis period can be lengthened appropriately and the log data divided and analyzed in units of time window periods, which saves resources and improves analysis efficiency.
According to one embodiment, the length of the next time window period is adaptively adjusted based on the relationship risk value and the direction risk value within each time window of the current time window period. That is, the time window periods with different sizes may be divided according to the change of the risk value, so that the time window period may be shortened to increase the detection granularity when an unexpected exception occurs, and the time window period may be lengthened to avoid unnecessary waste of resources when the application program operates normally.
Steps 401-402 of fig. 4 are the same as steps 301-302 of fig. 3 and will not be described in detail here.
In step 403, it is determined whether the relationship risk values and the direction risk values for all time windows in a time window period have been calculated.
According to one embodiment, a counter i can be used to determine whether all time windows in a period have been processed. Assuming the current time window period contains N time windows, when i reaches N, the relation and direction risk values of all time windows in the current period are deemed calculated, and the process proceeds to step 405. If i is not equal to N, the process proceeds to step 404 to obtain the relations and directions of the next time window.
At step 405, it is determined whether an anomaly exists in the current time window period based on the calculated relation risk values and direction risk values of all its time windows. Those skilled in the art can adopt different judgment rules according to specific situations. For example, the current time window period may be determined to be anomalous when the risk values of a predetermined number of its time windows are below the threshold: with N = 5, the period may be deemed anomalous when the risk values of 3 of its time windows are below the threshold. The thresholds and the risk values may be set and calculated as described above for steps 302 and 303 of FIG. 3.
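The "predetermined number of windows" rule in step 405 can be sketched as a simple count. The function name and parameters are hypothetical; counting a window as bad when either its relation or its direction risk value is below the threshold is an assumption carried over from step 303, and the defaults (threshold 0, three bad windows) follow the example values in the text.

```python
from typing import Sequence


def period_is_anomalous(r_values: Sequence[float], d_values: Sequence[float],
                        threshold: float = 0.0, min_bad_windows: int = 3) -> bool:
    """Step 405 sketch: the period is anomalous if at least `min_bad_windows`
    of its N windows have a relation or direction risk value below the threshold."""
    if len(r_values) != len(d_values):
        raise ValueError("need one (r, d) pair per time window")
    bad = sum(1 for r, d in zip(r_values, d_values) if r < threshold or d < threshold)
    return bad >= min_bad_windows
```

With N = 5 relation risk values `[0.9, -0.2, -0.5, 0.8, -0.1]` and all direction values at 0.9, three windows fall below 0 and the period is flagged; with only one negative value it is not.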
If an anomaly exists, the process proceeds to step 406 to mark the anomaly on the knowledge-graph. Thereafter, the process proceeds to step 407. If there are no exceptions, the process proceeds directly to step 407.
In step 407, it is determined whether a log analysis end condition is satisfied, for example, whether logs have been collected for a sufficiently long time. If so, the flow ends. Otherwise, the flow proceeds to step 408.
At step 408, the length of the next time window period is adaptively adjusted based on the relationship risk value and the direction risk value within each time window of the current time window period.
The length of the time window period is the product of the length of each time window in the time window period and the number of time windows, and thus the length of the time window period can be adjusted by adjusting both the length of the time window and the number of time windows.
According to one embodiment, the number of time windows in the next time window period is adaptively adjusted based on the relational risk values within each time window of the current time window period.
When an attacker sends an attack payload, the feature values of the relation word vectors are affected first. Thus, the number n of time windows in the next time window period may be calculated from the relation risk value r within each time window of the current time window period as follows:
[Equation for n: formula image not reproduced in this text]
where N is the number of time windows in the current time window period, and $r_i$ is the relation risk value of the i-th time window in the current time window period.
According to one embodiment, the length of each time window in the next time window period is adaptively adjusted based on the relationship risk value and the direction risk value within each time window of the current time window period.
Under attack, the feature values of both the relation word vectors and the direction word vectors change markedly, so the length t (in milliseconds) of each time window in the next time window period can be calculated from the relation risk value and the direction risk value within each time window of the current time window period as follows:
[Equation for t: formula image not reproduced in this text]
where T (in milliseconds) is the length of each time window in the current time window period, N is the number of time windows in the current time window period, $r_i$ is the relation risk value of the i-th time window in the current period, and $d_i$ is the direction risk value of the i-th time window in the current period.
Thus, the total length of the next time window period is L = n × t.
With the method shown in Fig. 4, if an anomaly was found in the previous time window period (i.e., a risk value fell below the threshold), the length of the next time window period can be shortened, which makes detection finer-grained. If no anomaly was found in the previous time window period (i.e., the risk values stayed above the threshold), the length of the next time window period can be increased, which reduces unnecessary resource consumption.
After the length of the next time window period has been obtained, the counter i is reset, and the process returns to step 404 to begin log analysis of the next time window period using the adjusted period length.
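The period-level bookkeeping described above can be sketched as follows. Because the formulas for n and t appear only as images that are not reproduced here, this sketch does not implement them: it takes them as caller-supplied functions (`compute_n` and `compute_t` are hypothetical names). The `PeriodPlan` type, the `abnormal_windows` helper, and the threshold value used in the example are likewise illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Per-window risk values for one time window period (index i = window number).
RiskSeries = Sequence[float]


@dataclass
class PeriodPlan:
    n: int        # number of time windows in the next period
    t_ms: float   # length of each time window in the next period (ms)

    @property
    def length_ms(self) -> float:
        # Total period length L = n * t.
        return self.n * self.t_ms


def abnormal_windows(r: RiskSeries, d: RiskSeries, threshold: float) -> List[int]:
    """Indices of windows whose relation or direction risk value is below threshold."""
    return [i for i, (ri, di) in enumerate(zip(r, d)) if ri < threshold or di < threshold]


def plan_next_period(
    r: RiskSeries,
    d: RiskSeries,
    current_t_ms: float,
    compute_n: Callable[[RiskSeries, int], int],
    compute_t: Callable[[RiskSeries, RiskSeries, float, int], float],
) -> PeriodPlan:
    """Derive n and t for the next period from the current period's risk values.

    compute_n and compute_t are hypothetical stand-ins for the unreproduced
    formulas; both receive N (the current number of windows) explicitly.
    """
    big_n = len(r)
    n = compute_n(r, big_n)
    t = compute_t(r, d, current_t_ms, big_n)
    return PeriodPlan(n=n, t_ms=t)
```

For example, with stand-ins that leave n and t unchanged (`lambda r, N: N` and `lambda r, d, T, N: T`), a two-window period with T = 1000 ms yields `PeriodPlan(n=2, t_ms=1000.0)` whose `length_ms` is 2000.0. After planning, a driver would reset its window counter and start the next period, as in step 404.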
Fig. 8 is an exemplary hardware configuration diagram of an application log analysis apparatus 800 that may implement embodiments of the present disclosure.
The application log analysis apparatus 800 is an example of a hardware device to which the above-described aspects of the present disclosure can be applied. The application log analysis apparatus 800 may be any machine configured to perform processing and/or computation. It may be, but is not limited to, a switch, a router, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant (PDA), a smartphone, an in-vehicle computer, or a combination thereof.
As shown in Fig. 8, the application log analysis apparatus 800 may include one or more elements that are connected to or communicate with a bus 802 via one or more interfaces. The bus 802 may include, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus. The application log analysis apparatus 800 may include, for example, one or more processors 804, one or more input devices 806, and one or more output devices 808. The one or more processors 804 may be any kind of processor and may include, but are not limited to, one or more general-purpose processors or special-purpose processors (such as special-purpose processing chips). The processor 804 is configured, for example, to implement the application log analysis method of the present disclosure. The input device 806 may be any type of device capable of inputting information to a computing device and may include, but is not limited to, a mouse, a keyboard, a touch screen, a microphone, and/or a remote control. The output device 808 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer.
The application log analysis apparatus 800 may also include or be connected to a non-transitory storage device 814, which may be any non-transitory device capable of storing data and may include, but is not limited to, a disk drive, an optical storage device, a solid-state memory, a floppy disk, a flexible disk, a hard disk, a magnetic tape or any other magnetic medium, a compact disc or any other optical medium, a cache memory and/or any other memory chip or module, and/or any other medium from which a computer can read data, instructions, and/or code. The application log analysis apparatus 800 may further include a random access memory (RAM) 810 and a read-only memory (ROM) 812. The ROM 812 may store programs, utilities, or processes to be executed in a non-volatile manner. The RAM 810 may provide volatile data storage and store instructions related to the operation of the application log analysis apparatus 800. The application log analysis apparatus 800 may also include a network/bus interface 816 coupled to a data link 818. The network/bus interface 816 may be any kind of device or system capable of enabling communication with external devices and/or networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication device, and/or a chipset (such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMAX device, cellular communication facilities, etc.).
The present disclosure may be implemented as any combination of apparatuses, systems, integrated circuits, and computer programs on non-transitory computer-readable media. The one or more processors may be implemented as an integrated circuit (IC), an application-specific integrated circuit (ASIC), a large-scale integrated circuit (LSI), a system LSI, a super LSI, or an ultra LSI package that performs some or all of the functions described in this disclosure.
The present disclosure includes the use of software, applications, computer programs, or algorithms, which may be stored on a non-transitory computer-readable medium to cause a computer, such as one or more processors, to perform the steps described above and depicted in the figures. For example, one or more memories store the software or algorithms as executable instructions, and one or more processors execute those instructions to provide the various functions according to the embodiments described in this disclosure.
Software and computer programs (which may also be referred to as programs, software applications, components, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural, object-oriented, functional, logical, or assembly or machine language. The term "computer-readable medium" refers to any computer program product, apparatus or device, such as magnetic disks, optical disks, solid state storage devices, memories, and Programmable Logic Devices (PLDs), used to provide machine instructions or data to a programmable data processor, including a computer-readable medium that receives machine instructions as a computer-readable signal.
The subject matter of the present disclosure is provided as examples of apparatus, systems, methods, and programs for performing the features described in the present disclosure. However, other features or variations are contemplated in addition to the features described above. It is contemplated that the implementation of the components and functions of the present disclosure may be accomplished with any emerging technology that may replace the technology of any of the implementations described above.
Additionally, the above description provides examples, and does not limit the scope, applicability, or configuration set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the spirit and scope of the disclosure. Various embodiments may omit, substitute, or add various procedures or components as appropriate. For example, features described with respect to certain embodiments may be combined in other embodiments.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

Claims (15)

1. A method of application log analysis, the method comprising:
collecting log data from within an application program, and recording the function name, the function output, and the calling function in a triple storage structure;
constructing a knowledge graph with function names as entities, function outputs as relations, and the calling relationships between functions as directions;
extracting the relations and directions of the knowledge graph and converting them into relation word vectors and direction word vectors, respectively; and
performing log analysis based on the relation word vectors and the direction word vectors.
2. The method of claim 1, further comprising extracting the relations and directions of the knowledge graph on a per-time-window basis.
3. The method of claim 2, comprising:
determining a relation risk value of the current time window based on the difference between the relation word vector in the current time window and the relation word vector in the previous time window; and
determining a direction risk value of the current time window based on the difference between the direction word vector in the current time window and the direction word vector in the previous time window.
4. The method of claim 2, wherein,
determining a relation risk value by calculating an included angle between the relation word vector in the current time window and the average vector of the relation word vectors in the previous time window; and
determining a direction risk value by calculating the included angle between the direction word vector in the current time window and the average vector of the direction word vectors in the previous time window.
5. The method of claim 4, wherein,
calculating the average vector of the relation word vectors by performing average pooling on the relation word vectors in the previous time window; and
calculating the average vector of the direction word vectors by performing average pooling on the direction word vectors in the previous time window.
6. The method of claim 3, wherein,
the relation risk value r is calculated as follows:
[Equation for r (an image in the original publication; not reproduced in the text)]
where Z_ri (i = 1, …, k) are the components of the mean vector of the relation word vectors in the previous time window, R_i (i = 1, …, k) are the components of the current relation word vector in the current time window, and k is the number of dimensions of the relation word vector; and
the direction risk value d is calculated as follows:
[Equation for d (an image in the original publication; not reproduced in the text)]
where Z_di (i = 1, …, k) are the components of the mean vector of the direction word vectors in the previous time window, D_i (i = 1, …, k) are the components of the current direction word vector in the current time window, and k is the number of dimensions of the direction word vector.
7. The method of claim 3, wherein the log within the current time window is determined to be abnormal when at least one of the relation risk value and the direction risk value is below a preset threshold.
8. The method of claim 3, further comprising: extracting the relations and directions of the knowledge graph on a per-time-window-period basis, wherein each time window period comprises a predetermined number of time windows.
9. The method of claim 8, further comprising: adaptively adjusting the length of the next time window period based on the relation risk value and the direction risk value of each time window in the current time window period.
10. The method of claim 8, wherein the length of each time window in the next time window period is adaptively adjusted based on the relationship risk value and the direction risk value within each time window of the current time window period.
11. The method of claim 8, wherein the number of time windows in the next time window period is adaptively adjusted based on the relation risk value of each time window in the current time window period.
12. The method of claim 1, wherein logs are collected by deploying a log collection module at the language interpretation layer of the application.
13. The method of claim 1, wherein the relations and directions, represented as character strings, are converted into relation word vectors and direction word vectors, respectively, by word2vec.
14. An apparatus for application log analysis, the apparatus comprising:
a memory having instructions stored thereon; and
a processor configured to execute instructions stored on the memory to perform the method of any of claims 1 to 13.
15. A computer program product comprising computer-executable instructions that, when executed by one or more processors, implement the method of any one of claims 1 to 13.
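The steps of claims 1 and 3 to 7 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: log records are assumed to arrive as (function name, function output, calling function) triples; a direction token is assumed to be the string "caller->callee"; word vectors are taken as precomputed input (claim 13 names word2vec, which the sketch does not call); and because the claim 6 equations are not reproduced, the risk value is assumed to be the cosine of the included angle from claim 4, which agrees with claim 7 treating low values as abnormal. All names (`LogTriple`, `build_graph`, `cosine_risk`, etc.) are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Graph = Dict[str, List[Tuple[str, str]]]


@dataclass(frozen=True)
class LogTriple:
    function: str  # entity: name of the called function
    output: str    # relation: the function's output, as a string
    caller: str    # calling function; gives the edge direction


def build_graph(triples: Sequence[LogTriple]) -> Graph:
    """Knowledge graph as adjacency lists: caller -> [(callee, output), ...]."""
    graph: Graph = defaultdict(list)
    for t in triples:
        graph[t.caller].append((t.function, t.output))
    return dict(graph)


def extract_tokens(graph: Graph) -> Tuple[List[str], List[str]]:
    """Relation tokens (outputs) and direction tokens ("caller->callee")."""
    relations: List[str] = []
    directions: List[str] = []
    for caller, edges in graph.items():
        for callee, output in edges:
            relations.append(output)
            directions.append(f"{caller}->{callee}")
    return relations, directions


def average_pool(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of the previous window's word vectors (claim 5)."""
    k = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]


def cosine_risk(current: Vector, previous_window: Sequence[Vector]) -> float:
    """Assumed risk value: cosine of the angle between current vector and pooled mean."""
    z = average_pool(previous_window)
    dot = sum(zi * ci for zi, ci in zip(z, current))
    norm = math.sqrt(sum(zi * zi for zi in z)) * math.sqrt(sum(ci * ci for ci in current))
    # A zero vector has no defined angle; treating it as 0.0 (dissimilar) is an assumption.
    return dot / norm if norm else 0.0


def is_abnormal(r: float, d: float, threshold: float) -> bool:
    """Claim 7: abnormal if either risk value is below the preset threshold."""
    return r < threshold or d < threshold
```

A per-window driver would embed the relation and direction tokens, call `cosine_risk` once for the relation vectors and once for the direction vectors, and pass both values to `is_abnormal`. For instance, a current vector [0.0, 1.0] compared against a previous window containing only [1.0, 0.0] yields a risk value of 0.0, which any positive threshold flags as abnormal.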

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110608693.2A CN115437813A (en) 2021-06-01 2021-06-01 Application program log analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110608693.2A CN115437813A (en) 2021-06-01 2021-06-01 Application program log analysis method and device

Publications (1)

Publication Number Publication Date
CN115437813A true CN115437813A (en) 2022-12-06

Family

ID=84271650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110608693.2A Pending CN115437813A (en) 2021-06-01 2021-06-01 Application program log analysis method and device

Country Status (1)

Country Link
CN (1) CN115437813A (en)

Similar Documents

Publication Publication Date Title
CN109905385B (en) Webshell detection method, device and system
CN112148772A (en) Alarm root cause identification method, device, equipment and storage medium
CN110941951B (en) Text similarity calculation method, text similarity calculation device, text similarity calculation medium and electronic equipment
CN109144879B (en) Test analysis method and device
CN113158189A (en) Method, device, equipment and medium for generating malicious software analysis report
CN106789871A (en) Attack detection method, device, the network equipment and terminal device
CN113360300B (en) Interface call link generation method, device, equipment and readable storage medium
CN114564947A (en) Rail transit signal fault operation and maintenance method and device and electronic equipment
CN112817877B (en) Abnormal script detection method and device, computer equipment and storage medium
CN112367222B (en) Network anomaly detection method and device
CN115437813A (en) Application program log analysis method and device
CN110941828A (en) Android malicious software static detection method based on android GRU
CN110598115A (en) Sensitive webpage identification method and system based on artificial intelligence multi-engine
CN114143074B (en) webshell attack recognition device and method
CN109359295A (en) Semantic analytic method, device, computer equipment and the storage medium of natural language
CN114117419A (en) Template injection attack detection method, device, equipment and storage medium
CN113836297A (en) Training method and device for text emotion analysis model
CN113139184A (en) Method for detecting Binder communication overload vulnerability based on static analysis
CN111291186A (en) Context mining method and device based on clustering algorithm and electronic equipment
KR20210085694A (en) Apparatus for image captioning and method thereof
US11669314B2 (en) Method and system to enable print functionality in high-level synthesis (HLS) design platforms
CN117093715B (en) Word stock expansion method, system, computer equipment and storage medium
CN107526842A (en) A kind of batch monitors multiple Website page method and devices
Xu et al. Software Vulnerabilities Detection Based on a Pre-trained Language Model
CN115719423A (en) Similarity-based malicious information detection method and device and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination