CN112804196A

CN112804196A - Log data processing method and device

Info

Publication number: CN112804196A
Application number: CN202011567623.9A
Authority: CN
Inventors: 梁宏宇; 喻波; 王志海; 安鹏
Original assignee: Beijing Wondersoft Technology Co Ltd
Current assignee: Beijing Wondersoft Technology Co Ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2021-05-14

Abstract

The application discloses a method and a device for processing log data. The method comprises: collecting log data, wherein the source of the log data comprises a master server log and a client log of at least one client terminal associated with the master server; acquiring data to be detected based on the log data, wherein the data to be detected comprises a user profile of a logged-in user of the client terminal and behavior data generated when an operation occurs on the client terminal; and analyzing the data to be detected based on safety standard data to determine whether abnormal data exists. The method and the device address the technical problem that traditional information security technology performs protection and detection with manually set thresholds based on rules and expert experience, and therefore has a security visibility blind spot for malicious attacks hidden in legitimate processes, cannot detect unknown attacks, and produces false alarms.

Description

Log data processing method and device

Technical Field

The present application relates to the field of information security, and in particular, to a method and an apparatus for processing log data.

Background

As enterprise digital transformation deepens, information leakage incidents occur from time to time and threaten the confidentiality, availability, and integrity of enterprise data assets; this has become the main threat enterprises face in security management. How to use big data and machine learning to improve the visibility of internal threats and external attacks has become an important trend among security practitioners and a focus of enterprise attention. Network attack techniques targeting sensitive enterprise data have been continuously upgraded in recent years and can be divided, by attack source, into external attack behaviors and internal threat behaviors. An external attack can hide in a legitimate process, evade the monitoring and removal performed by the security protection system, and quickly invade the target system; an internal threat can masquerade as a legitimate user, break through the network boundary, steal network credentials, and pose an internal information security threat.

According to a Cisco survey, when enterprises are attacked, more than 70% of the attack behaviors fail to trigger a security alarm, fewer than 40% of the alarms that are raised correspond to real attacks, and fewer than 10% of attacks can be handled effectively. Traditional information security technology relies on rules and expert experience and performs protection and detection with manually set thresholds; it has a security visibility blind spot for malicious attacks hidden in legitimate processes, and unknown attacks either evade and bypass detection or give rise to false alarms.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiments of the application provide a log data processing method and device, which at least solve the technical problem that conventional information security technology performs protection and detection with manually set thresholds based on rules and expert experience, and therefore has a security visibility blind spot for malicious attacks hidden in legitimate processes, cannot detect unknown attacks, and produces false alarms.

According to an aspect of an embodiment of the present application, there is provided a method for processing log data, including: collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server; acquiring data to be detected based on the log data, wherein the data to be detected comprises: a user profile of a logged-in user of the client terminal, and behavior data generated when an operation occurs on the client terminal; and analyzing the data to be detected based on the safety standard data to determine whether abnormal data exists.

Optionally, after collecting the log data, the method further includes: preprocessing the log data, wherein the preprocessing comprises at least one of the following steps: classification processing, redundancy removal processing, non-normalized data processing and unification processing of non-uniform dimension data.

Optionally, acquiring data to be detected based on the log data includes: performing feature extraction on the log data based on the type of the extraction parameter, and extracting to obtain a feature vector set, wherein the type of the extraction parameter comprises at least one of the following types: access flow, operation file attribute, transceiving data type and access characteristic; and constructing to-be-detected data based on the characteristic vector set.

Optionally, before analyzing the data to be detected based on the safety standard data and determining whether there is abnormal data, the method further includes: constructing safety standard data, which comprises the following steps: acquiring sample data, wherein the sample data comprises behavior data generated on a main server and at least one client terminal associated with the main server in a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the main server and the client terminal; and training a neural network model by adopting sample data to generate an abnormal behavior detection model, wherein the abnormal behavior detection model is used for representing a base line of the safety standard data.

Optionally, analyzing the data to be detected based on the safety standard data to determine whether there is abnormal data, including: inputting the data to be detected into an abnormal behavior detection model, and acquiring a comparison result of the data to be detected and safety standard data; if the comparison result is that the data to be detected and the safety standard data are within the error range, abnormal data do not exist; otherwise, determining that abnormal data exists in the data to be detected.

Optionally, after determining that abnormal data exists in the data to be detected, the method further includes: carrying out abnormal behavior scoring on abnormal data in the data to be detected; and calling the abnormal data subjected to the abnormal behavior scoring to adjust the safety standard data.

According to another aspect of the embodiments of the present application, there is also provided a log data processing apparatus, including: an acquisition module, configured to collect log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server; an obtaining module, configured to obtain data to be detected based on the log data, wherein the data to be detected comprises: a user profile of a logged-in user of the client terminal, and behavior data generated when an operation occurs on the client terminal; and an analysis module, configured to analyze the data to be detected based on the safety standard data and determine whether abnormal data exists.

Optionally, the apparatus further comprises: the preprocessing module is used for preprocessing the log data, wherein the preprocessing comprises at least one of the following steps: classification processing, redundancy removal processing, non-normalized data processing and unification processing of non-uniform dimension data.

Optionally, the obtaining module includes: the extraction module is used for extracting the characteristics of the log data based on the types of the extraction parameters, and extracting to obtain a characteristic vector set, wherein the types of the extraction parameters comprise at least one of the following types: access flow, operation file attribute, transceiving data type and access characteristic; and the first construction module is used for constructing and obtaining the data to be detected based on the feature vector set.

Optionally, the apparatus further comprises: a second construction module, configured to construct the safety standard data; wherein the second construction module comprises: a sub-acquisition module, configured to acquire sample data, wherein the sample data comprises behavior data generated on the main server and at least one client terminal associated with the main server in a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the main server and the client terminal; and a generating module, configured to train a neural network model with the sample data to generate an abnormal behavior detection model, wherein the abnormal behavior detection model is used for representing the baseline of the safety standard data.

Optionally, the analysis module comprises: the behavior detection module is used for inputting the data to be detected into the abnormal behavior detection model and acquiring a comparison result of the data to be detected and the safety standard data; the first determining module is used for determining that abnormal data does not exist if the comparison result shows that the data to be detected and the safety standard data are within the error range; and the second determining module is used for determining that abnormal data exists in the data to be detected if the comparison result shows that the data to be detected and the safety standard data are not in the error range.

Optionally, the apparatus further comprises: the scoring module is used for scoring abnormal behaviors of abnormal data in the data to be detected; and the adjusting module is used for calling the abnormal data which executes the abnormal behavior scoring to adjust the safety standard data.

According to still another aspect of the embodiments of the present application, there is provided a non-volatile storage medium, where the non-volatile storage medium includes a stored program, and when the program runs, a device in which the non-volatile storage medium is located is controlled to execute the above processing method of log data.

According to still another aspect of the embodiments of the present application, there is also provided a processor configured to execute a program stored in a memory, where the program executes the above processing method of log data.

In the embodiments of the present application, log data is collected, wherein the source of the log data includes: a master server log, and a client log of at least one client terminal associated with the master server; data to be detected is acquired based on the log data, wherein the data to be detected comprises: a user profile of a logged-in user of the client terminal and behavior data generated when an operation occurs on the client terminal; and the data to be detected is analyzed based on safety standard data to determine whether abnormal data exists. Based on User and Entity Behavior Analysis (UEBA) theory, log collection and big data analysis modeling are carried out on a cloud computing and big data platform, a UEBA-based correlation analysis and tracing system is constructed, and risks are monitored continuously. This achieves the technical effect of quickly discovering and locating abnormal information in an enterprise network, and solves the technical problem that traditional information security technology performs protection and detection with manually set thresholds based on rules and expert experience, and therefore has a security visibility blind spot for malicious attacks hidden in legitimate processes, cannot detect unknown attacks, and produces false alarms.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flowchart of a method for processing log data according to an embodiment of the present application;

FIG. 2 is a schematic illustration of a constructed baseline according to an embodiment of the present application;

FIG. 3 is a schematic comparison of a baseline according to an embodiment of the present application;

FIG. 4 is a UEBA architecture diagram according to an embodiment of the present application;

FIG. 5 is a schematic illustration of a continuous monitoring of abnormal behavior in accordance with an embodiment of the present application;

FIG. 6 is a schematic diagram of adaptive dynamic risk identification using a machine learning algorithm according to an embodiment of the present application;

FIG. 7 is a block diagram of a log data processing apparatus according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:

User and entity behavior analysis (UEBA): a representative new information security technology that takes the user as its point of view and shifts from traditional rule-based analysis to association analysis, behavior modeling, and anomaly analysis.

Logistic regression algorithm: also called logistic regression analysis, a generalized linear regression analysis model commonly used in fields such as data mining, automatic disease diagnosis, and economic forecasting. In essence, abnormal electricity utilization identification is a binary classification problem, so a logistic regression algorithm can be used for classification.

According to an embodiment of the present application, an embodiment of a method for processing log data is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one shown.

Fig. 1 is a flowchart of a method for processing log data according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:

Step S102, collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server;

It should be noted that the log data in step S102 is multi-source log data, and that step S102 collects logs from multiple sources. The collection system mainly implements functions such as data acquisition, data preprocessing, baseline construction, model prediction, anomaly scoring, and disposal and response.

Multi-source log collection mainly covers collecting, classifying, and receiving the relevant terminal logs and host logs.

Step S104, acquiring data to be detected based on the log data, wherein the data to be detected comprises: a user profile of a logged-in user of the client terminal, and behavior data generated when an operation occurs on the client terminal;

Step S106, analyzing the data to be detected based on the safety standard data, and determining whether abnormal data exists.

Through the above steps, log collection and big data analysis modeling are carried out on a cloud computing and big data platform based on User and Entity Behavior Analysis (UEBA) theory, a UEBA-based correlation analysis and tracing system is constructed, and risks are monitored continuously, which achieves the technical effect of quickly discovering and locating abnormal information in an enterprise network.
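
The three steps can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the application: the `LogRecord` type, the function names, the use of per-user action counts as behavior data, and the fixed `tolerance` are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    source: str  # "server" or "client" (hypothetical labels)
    user: str
    action: str


def collect_logs(server_log, client_logs):
    """S102: merge the master server log with the log of every client terminal."""
    records = [LogRecord("server", u, a) for u, a in server_log]
    for client_log in client_logs:
        records.extend(LogRecord("client", u, a) for u, a in client_log)
    return records


def build_detection_data(records):
    """S104: per-user action counts stand in for the behavior data."""
    data = {}
    for r in records:
        counts = data.setdefault(r.user, {})
        counts[r.action] = counts.get(r.action, 0) + 1
    return data


def analyze(detection_data, standard, tolerance):
    """S106: flag counts that leave the tolerated range around the standard."""
    abnormal = []
    for user, counts in detection_data.items():
        for action, n in counts.items():
            if abs(n - standard.get(action, 0)) > tolerance:
                abnormal.append((user, action, n))
    return abnormal


server_log = [("alice", "login"), ("bob", "login")]
client_logs = [[("alice", "file_copy")] * 2, [("bob", "file_copy")] * 9]
records = collect_logs(server_log, client_logs)
abnormal = analyze(build_detection_data(records),
                   standard={"login": 1, "file_copy": 2}, tolerance=3)
print(abnormal)
```

In this toy input only bob's nine file copies fall outside the tolerated range; everything else matches the standard.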

According to an optional embodiment of the present application, after the completion of the step S102, the log data is preprocessed, wherein the preprocessing includes at least one of the following: classification processing, redundancy removal processing, non-normalized data processing and unification processing of non-uniform dimension data.

This step mainly normalizes the non-normalized data and unifies the data of non-uniform dimensions.
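
The four preprocessing operations can be sketched as follows. The record layout (`user`, `type`, `ts`, `size`, `unit`) and the `UNIT_TO_BYTES` table are assumptions made for illustration; the application does not specify a log format.

```python
# Hypothetical unit table used to bring all sizes to one dimension (bytes).
UNIT_TO_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2}


def preprocess(records):
    """Classify, deduplicate, and normalize raw log records given as dicts."""
    seen = set()
    by_type = {}
    for rec in records:
        # Normalization of non-normalized data: trim and lowercase text fields.
        user = rec["user"].strip().lower()
        kind = rec["type"].strip().lower()
        # Unification of non-uniform dimensions: express every size in bytes.
        size = rec["size"] * UNIT_TO_BYTES[rec["unit"]]
        key = (user, kind, rec["ts"], size)
        if key in seen:  # redundancy removal
            continue
        seen.add(key)
        # Classification: group the cleaned records by log type.
        by_type.setdefault(kind, []).append(
            {"user": user, "ts": rec["ts"], "bytes": size})
    return by_type


raw = [
    {"user": " Alice ", "type": "Mail", "ts": 1, "size": 2, "unit": "KB"},
    {"user": "alice", "type": "mail", "ts": 1, "size": 2048, "unit": "B"},
    {"user": "bob", "type": "web", "ts": 2, "size": 1, "unit": "MB"},
]
cleaned = preprocess(raw)
print(cleaned)
```

The first two records describe the same event in different spellings and units, so only one of them survives deduplication.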

According to another alternative embodiment of the present application, step S104 is implemented by: performing feature extraction on the log data based on the type of the extraction parameter, and extracting to obtain a feature vector set, wherein the type of the extraction parameter comprises at least one of the following types: access flow, operation file attribute, transceiving data type and access characteristic; and constructing to-be-detected data based on the feature vector set.

In this step, UEBA is driven by big data. Based on multi-source logs covering traffic, file operations, web access, mailing, and the like, feature vectors are extracted from the behavior logs according to the 5W1H model (Who, What, When, Where, Why, and How), so as to construct baselines of normal user and entity behavior as well as user profiles.
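
A minimal sketch of 5W1H-style feature extraction is given below. The event keys (`who`, `what`, `when`, `where`), the `ACTIONS` categories, and the 09:00 to 18:00 working-hours window are assumptions; the Why and How dimensions are omitted because the text does not say how they are encoded.

```python
from collections import Counter
from datetime import datetime

ACTIONS = ("mail", "file", "web")  # hypothetical "What" categories


def extract_features(events):
    """Return {user: [mail, file, web counts, off-hours ratio, distinct hosts]}."""
    per_user = {}
    for e in events:
        per_user.setdefault(e["who"], []).append(e)
    vectors = {}
    for user, evs in per_user.items():
        what = Counter(e["what"] for e in evs)
        # "When": share of events outside the assumed 09:00-18:00 window.
        off_hours = sum(
            1 for e in evs
            if not 9 <= datetime.fromisoformat(e["when"]).hour < 18)
        # "Where": number of distinct hosts the user acted from.
        hosts = {e["where"] for e in evs}
        vectors[user] = ([what[a] for a in ACTIONS]
                         + [off_hours / len(evs), len(hosts)])
    return vectors


events = [
    {"who": "alice", "what": "mail", "when": "2020-12-25T10:00:00", "where": "h1"},
    {"who": "alice", "what": "file", "when": "2020-12-25T23:30:00", "where": "h2"},
]
vectors = extract_features(events)
print(vectors)
```

Each user ends up with one fixed-length vector, which is the shape a baseline or a detection model would consume.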

In some optional embodiments of the present application, before performing step S106, it is further required to construct security standard data, and the step includes: collecting sample data, wherein the sample data comprises behavior data generated on a main server and at least one client terminal associated with the main server in a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the main server and the client terminal; and training a neural network model by adopting sample data to generate an abnormal behavior detection model, wherein the abnormal behavior detection model is used for representing a base line of the safety standard data.

Fig. 2 is a schematic diagram of establishing a baseline according to an embodiment of the present application. As shown in Fig. 2, the baseline shows the individual behavior of 3 users across three business operation scenarios: mailing, file operation, and web access. A department behavior baseline is obtained by performing longitudinal analysis and calculation on the user behavior.

The multi-dimensional safety baseline consists of an individual behavior baseline, a department behavior baseline, a scenario behavior baseline, and the like. The individual behavior baseline reflects the behavior characteristics of an individual user, the department baseline reflects the group characteristics of the department to which the user belongs, and the scenario baseline reflects the operation behavior characteristics of users and entities. A dynamic baseline of real-time access behavior on sensitive data is generated with learning algorithms such as random forests and SVM; a full-time context is constructed through group baseline analysis, which avoids the limitation of judging single behaviors in isolation; and the safety baseline is updated in real time through distributed real-time computation, yielding a complete dynamic behavior baseline.
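
To make the relationship between individual and department baselines concrete, the sketch below swaps the learned models named above for plain summary statistics (mean and population standard deviation). That substitution, the function names, and the daily-count input are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def individual_baseline(history):
    """Baseline of one user: (mean, std) of that user's daily counts."""
    return mean(history), pstdev(history)


def department_baseline(member_histories):
    """Group baseline: (mean, std) over the members' individual means."""
    means = [mean(h) for h in member_histories.values()]
    return mean(means), pstdev(means)


# Daily file-operation counts for three users of one department (toy data).
histories = {"u1": [1, 2, 3], "u2": [4, 4, 4], "u3": [5, 6, 7]}
u1_mean, u1_std = individual_baseline(histories["u1"])
dept_mean, dept_std = department_baseline(histories)
print(u1_mean, u1_std, dept_mean, dept_std)
```

A user can then be compared both against their own history and against their peers, which is the point of keeping several baselines side by side.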

According to an alternative embodiment of the present application, step S106 is implemented by: inputting the data to be detected into an abnormal behavior detection model, and acquiring a comparison result of the data to be detected and safety standard data; if the comparison result is that the data to be detected and the safety standard data are within the error range, abnormal data do not exist; otherwise, determining that abnormal data exists in the data to be detected.
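
The error-range comparison might look like the following sketch. It assumes each baseline entry is a `(mean, std)` pair and that the error range is `mean ± k * std`; neither the representation nor the factor `k` comes from the application.

```python
def detect(vector, baseline, k=3.0):
    """Return the names of features that fall outside mean ± k * std."""
    return [name for name, value in vector.items()
            if abs(value - baseline[name][0]) > k * baseline[name][1]]


baseline = {"logins": (5.0, 1.0), "files": (20.0, 4.0)}  # toy (mean, std) pairs
observed = {"logins": 6, "files": 40}
outliers = detect(observed, baseline)
has_abnormal_data = bool(outliers)
print(outliers, has_abnormal_data)
```

An empty list corresponds to the "within the error range" branch; any returned feature name corresponds to the "abnormal data exists" branch.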

Fig. 3 is a schematic diagram of baseline comparison according to an embodiment of the present application. As shown in Fig. 3, abnormal behavior detection applies deep learning algorithms such as CNN and RNN to statistical indicators, pattern sequences, time sequences, and behavior logic sequences, and detects abnormal behavior from multiple dimensions such as the number of account logins and logouts, the number of IP calls, and access time intervals. Finally, based on an iterative evaluation mechanism, the various behavior alarms, anomaly detection results, group comparison analyses, and the like are combined with weights and iteratively optimized to obtain an abnormal behavior score, and the evaluated behavior is mapped back to the actual business scenario.
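
The weighted combination of signals can be sketched as a weighted average. The signal names, the weights, and the 0 to 100 scale are hypothetical; the application states only that alarms, anomaly detection, and group comparison are combined with weights.

```python
def anomaly_score(signals, weights):
    """Weighted average of per-detector scores in [0, 1], scaled to 0-100."""
    total = sum(weights[name] for name in signals)
    weighted = sum(weights[name] * value for name, value in signals.items())
    return 100 * weighted / total


signals = {"rule_alert": 1.0, "model_anomaly": 0.5, "peer_deviation": 0.0}
weights = {"rule_alert": 5, "model_anomaly": 3, "peer_deviation": 2}
score = anomaly_score(signals, weights)
print(score)
```

In an iterative evaluation loop the weights would be the quantities that get tuned from one round to the next.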

In an optional embodiment of the present application, after determining that there is abnormal data in the data to be detected, performing abnormal behavior scoring on the abnormal data in the data to be detected; and calling the abnormal data subjected to the abnormal behavior scoring to adjust the safety standard data.

By continuously tracking the behavior of users and entities, continuously assessing risk, constructing a complete timeline, performing a comprehensive risk assessment of users and entities, and predicting abnormal behavior in advance, the number of false alarms can be greatly reduced.
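
One possible reading of the feedback step is sketched below: the baseline moves toward each new observation, and the abnormal behavior score dampens the move so that highly suspicious data barely shifts the standard. This exponential-style update rule, the `rate` parameter, and the score range of [0, 1] are assumptions, not something the application specifies.

```python
def adjust_baseline(baseline_mean, observed, score, rate=0.2):
    """Move the baseline toward the observation; a score near 1 damps the move."""
    step = rate * (1.0 - score)
    return baseline_mean + step * (observed - baseline_mean)


mild = adjust_baseline(10.0, 20.0, score=0.5)    # partially trusted observation
severe = adjust_baseline(10.0, 20.0, score=1.0)  # fully abnormal observation
print(mild, severe)
```

With this rule, gradual legitimate drift is absorbed into the baseline while a clear anomaly leaves it untouched.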

The business scenarios of the log data processing method provided by the embodiments of the application mainly consist of internal and external threat detection for enterprises. Internally, the scenarios mainly include intentional data collection, abnormal and illegal access such as exceeding permissions, abnormal user logins, and abnormal behavior of user source addresses. Externally, abnormal behavior is mainly judged through correlation analysis, covering data leakage, continuous outbound data transmission, abnormal website access, discovery of compromised hosts, and the like. Through linkage with business policies, user behavior is controlled in time, tracing and response handling are carried out, the whole life cycle of sensitive data is displayed centrally, and the distribution of data resources is clear at a glance.

Fig. 4 is a schematic diagram of a UEBA architecture according to an embodiment of the present application. The architecture mainly comprises: multi-source log data, feature extraction, baseline construction, abnormal behavior detection, risk assessment, and business scenarios.

By designing the overall UEBA architecture, user behavior is controlled in time through business policy linkage, and tracing and response handling are performed.

By constructing a UEBA processing flow, risks are monitored continuously.

By constructing a UEBA processing flow, machine learning is used for adaptive dynamic risk identification.

Abnormal behavior detection is mainly based on user abnormal behavior analysis with a logistic regression algorithm, in which the dependent variable is the abnormal behavior attribute and the independent variables are the natural characteristic attributes of the user. Real-time data prediction is performed with a univariate linear regression model, data trend prediction is performed with curve models such as a quadratic parabola, and the features of the N-dimensional feature data samples are preliminarily processed to obtain an effective threshold. A data baseline is preliminarily calculated, for example the univariate linear relation between the number of file operations and the number of uses of sensitive data, and the probability that a specific abnormal behavior characteristic occurs is predicted from the characteristic attributes.
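
The two model families mentioned here can be illustrated in a few lines of standard-library Python. The sketch fits a univariate least-squares line between file operations and sensitive-data uses, and evaluates a logistic model with hand-picked weights; the toy data, the weights, and the function names are all assumptions, and no training of the logistic model is shown.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def abnormal_probability(features, weights, bias):
    """Logistic model: P(abnormal | features) = sigmoid(bias + w . x)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy baseline: number of file operations vs. number of sensitive-data uses.
file_ops = [1, 2, 3, 4]
sensitive_uses = [3, 5, 7, 9]
a, b = fit_line(file_ops, sensitive_uses)

# Residual of a new observation against the fitted baseline line.
residual = 30 - (a + b * 5)

# Hypothetical, untrained weights mapping the residual to a probability.
p = abnormal_probability([residual], weights=[0.5], bias=-2.0)
print(a, b, residual, p)
```

The linear fit plays the role of the preliminary data baseline, and the logistic function turns a deviation from it into a probability of abnormal behavior.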

Fig. 5 is a schematic diagram of continuous monitoring of abnormal behavior according to an embodiment of the application. As shown in Fig. 5, the method is driven by big data and is characterized by long-term effective association, analysis and mining, continuous learning, and continuous iterative optimization. Different data features and detection algorithms are adopted for different business scenarios, with continuous feedback and tuning, so that the model meets the requirements of detecting abnormal behavior across large-scale user behavior, discovers and locates anomalies quickly, and supports timely judgment and response.

Fig. 6 is a schematic diagram of adaptive dynamic risk identification using a machine learning algorithm according to an embodiment of the present application. As shown in Fig. 6, unsupervised and supervised machine learning techniques and artificial intelligence techniques can capture details in behavior data that humans cannot perceive, so that anomalies are monitored without relying heavily on manual analysis, which saves a great deal of time and effort. This also avoids the difficulty and ineffectiveness of manually constructing feature rules and setting thresholds.

The method provided by the embodiments of the application binds users, entities, and abnormal behavior together; discovers internal violations such as data leakage and account abuse through analysis by a rule engine and a machine learning engine; and outputs alarms through multi-dimensional data analysis. Sensitive data is discovered by detecting whether a database contains key fields of sensitive data. Accurate rule-based identification frees the security administrator from a large number of invalid events. Risk profiles of important users can be drawn up to guard against unknown risks.

Fig. 7 is a block diagram of a log data processing apparatus according to an embodiment of the present application, where as shown in fig. 7, the apparatus includes:

an acquisition module 70, configured to acquire log data, where the source of the log data includes: a master server log, and a client log of at least one client terminal associated with the master server;

an obtaining module 72, configured to obtain data to be detected based on the log data, where the data to be detected includes: a user profile of a logged-in user of the client terminal and behavior data generated when an operation occurs on the client terminal;

and the analysis module 74 is configured to analyze the data to be detected based on the safety standard data to determine whether abnormal data exists.

It should be noted that, for a preferred implementation of the embodiment shown in Fig. 7, reference may be made to the description of the embodiment shown in Fig. 1; details are not repeated here.
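
The module structure of Fig. 7 can be pictured as three cooperating objects. In this sketch the class names, the `user,action` log line format, and the use of an allowed-action set as the safety standard data are hypothetical choices made only to show how modules 70, 72, and 74 might be wired together.

```python
class AcquisitionModule:  # corresponds to module 70
    def __init__(self, sources):
        self.sources = sources  # callables, each returning a list of log lines

    def run(self):
        return [line for source in self.sources for line in source()]


class ObtainingModule:  # corresponds to module 72
    def run(self, lines):
        # Hypothetical log line format: "user,action".
        data = {}
        for line in lines:
            user, action = line.split(",")
            data.setdefault(user, []).append(action)
        return data


class AnalysisModule:  # corresponds to module 74
    def __init__(self, allowed_actions):
        self.allowed = allowed_actions  # stand-in for the safety standard data

    def run(self, data):
        flagged = {}
        for user, actions in data.items():
            bad = [a for a in actions if a not in self.allowed]
            if bad:
                flagged[user] = bad
        return flagged


class LogProcessingApparatus:
    def __init__(self, acquisition, obtaining, analysis):
        self.acquisition = acquisition
        self.obtaining = obtaining
        self.analysis = analysis

    def run(self):
        lines = self.acquisition.run()
        return self.analysis.run(self.obtaining.run(lines))


apparatus = LogProcessingApparatus(
    AcquisitionModule([lambda: ["alice,login"],
                       lambda: ["bob,login", "bob,usb_copy"]]),
    ObtainingModule(),
    AnalysisModule({"login"}),
)
flagged = apparatus.run()
print(flagged)
```

Keeping each module behind a single `run` method makes it straightforward to add the optional preprocessing, construction, and scoring modules described next.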

According to an alternative embodiment of the present application, the apparatus further comprises: the preprocessing module is used for preprocessing the log data, wherein the preprocessing comprises at least one of the following steps: classification processing, redundancy removal processing, non-normalized data processing and unification processing of non-uniform dimension data.

According to another alternative embodiment of the present application, the obtaining module 72 includes: the extraction module is used for extracting the characteristics of the log data based on the types of the extraction parameters, and extracting to obtain a characteristic vector set, wherein the types of the extraction parameters comprise at least one of the following types: access flow, operation file attribute, transceiving data type and access characteristic; and the first construction module is used for constructing and obtaining the data to be detected based on the characteristic vector set.

In some optional embodiments of the present application, the apparatus further comprises: a second construction module, configured to construct the safety standard data; wherein the second construction module comprises: a sub-acquisition module, configured to acquire sample data, wherein the sample data comprises behavior data generated on a main server and at least one client terminal associated with the main server in a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the main server and the client terminal; and a generating module, configured to train a neural network model with the sample data and generate an abnormal behavior detection model, wherein the abnormal behavior detection model is used for representing the baseline of the safety standard data.

In other alternative embodiments of the present application, the analysis module 74 includes: the behavior detection module is used for inputting data to be detected into the abnormal behavior detection model and acquiring a comparison result of the data to be detected and the safety standard data; the first determining module is used for determining that abnormal data does not exist if the comparison result shows that the data to be detected and the safety standard data are within the error range; and the second determining module is used for determining that abnormal data exists in the data to be detected if the comparison result shows that the data to be detected and the safety standard data are not in the error range.

According to an alternative embodiment of the present application, the apparatus further comprises: the scoring module is used for scoring abnormal behaviors of abnormal data in the data to be detected; and the adjusting module is used for calling the abnormal data which executes the abnormal behavior scoring to adjust the safety standard data.

The embodiment of the application also provides a nonvolatile storage medium, which comprises a stored program, wherein the device where the nonvolatile storage medium is located is controlled to execute the processing method of the log data when the program runs.

The nonvolatile storage medium is used for storing a program for executing the following functions: collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server; acquiring data to be detected based on the log data, wherein the data to be detected comprises: a user representation of a logged-in user of the client terminal, and behavior data generated when an operation occurs on the client terminal; and analyzing the data to be detected based on the safety standard data to determine whether abnormal data exists.

The embodiment of the application also provides a processor, wherein the processor is used for running the program stored in the memory, and the program is used for executing the processing method of the log data when running.

The processor is used for running a program for executing the following functions: collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server; acquiring data to be detected based on the log data, wherein the data to be detected comprises: a user representation of a logged-in user of the client terminal, and behavior data generated when an operation occurs on the client terminal; and analyzing the data to be detected based on the safety standard data to determine whether abnormal data exists.

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.

The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims

1. A method for processing log data, characterized by comprising the following steps:

collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server;

acquiring data to be detected based on the log data, wherein the data to be detected comprises: a user representation of a logged-in user of the client terminal, and behavior data generated when an operation occurs on the client terminal;

and analyzing the data to be detected based on the safety standard data to determine whether abnormal data exists.

2. The method of claim 1, wherein after collecting log data, the method further comprises:

preprocessing the log data, wherein the preprocessing comprises at least one of: classification processing, redundancy removal processing, processing of non-normalized data, and unification processing of data with non-uniform dimensions.

3. The method of claim 1, wherein obtaining data to be detected based on the log data comprises:

based on the type of the extraction parameter, performing feature extraction on the log data, and extracting to obtain a feature vector set, wherein the type of the extraction parameter comprises at least one of the following types: access flow, operation file attribute, transceiving data type and access characteristic;

and constructing and obtaining the data to be detected based on the characteristic vector set.

4. The method according to any one of claims 1 to 3, wherein before analyzing the data to be detected based on safety standard data to determine whether abnormal data exists, the method further comprises:

constructing the safety standard data, wherein the steps comprise:

collecting sample data, wherein the sample data comprises behavior data generated on the master server and at least one client terminal associated with the master server in a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the master server and the client terminal;

and training a neural network model by adopting the sample data to generate an abnormal behavior detection model, wherein the abnormal behavior detection model is used for representing the baseline of the safety standard data.

5. The method of claim 4, wherein analyzing the data to be detected based on safety standard data to determine whether abnormal data exists comprises:

inputting the data to be detected into the abnormal behavior detection model, and acquiring a comparison result of the data to be detected and the safety standard data;

if the comparison result indicates that the data to be detected and the safety standard data are within an error range, determining that no abnormal data exists;

otherwise, determining that the abnormal data exists in the data to be detected.

6. The method according to claim 5, wherein after determining that the abnormal data exists in the data to be detected, the method further comprises:

carrying out abnormal behavior scoring on abnormal data in the data to be detected;

and adjusting the safety standard data by using the abnormal data that has undergone the abnormal behavior scoring.

7. An apparatus for processing log data, comprising:

an acquisition module, configured to collect log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server;

an obtaining module, configured to obtain data to be detected based on the log data, where the data to be detected includes: a user representation of a logged-in user of the client terminal, and behavior data generated when an operation occurs on the client terminal;

and the analysis module is used for analyzing the data to be detected based on the safety standard data and determining whether abnormal data exists or not.

8. The apparatus of claim 7, further comprising:

a preprocessing module, configured to preprocess the log data, where the preprocessing includes at least one of: classification processing, redundancy removal processing, processing of non-normalized data, and unification processing of data with non-uniform dimensions.

9. The apparatus of claim 7, wherein the obtaining module comprises:

an extraction module, configured to perform feature extraction on the log data based on a type of an extraction parameter, and extract to obtain a feature vector set, where the type of the extraction parameter includes at least one of: access flow, operation file attribute, transceiving data type and access characteristic;

and the first construction module is used for constructing and obtaining the data to be detected based on the characteristic vector set.

10. The apparatus of any one of claims 7 to 9, further comprising:

the second construction module is used for constructing the safety standard data;

wherein the second construction module comprises: a sub-acquisition module, configured to collect sample data, wherein the sample data comprises behavior data generated on the master server and at least one client terminal associated with the master server in a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the master server and the client terminal; and a generating module, configured to train a neural network model with the sample data to generate an abnormal behavior detection model, wherein the abnormal behavior detection model is used for representing the baseline of the safety standard data.

11. The apparatus of claim 10, wherein the analysis module comprises:

the behavior detection module is used for inputting the data to be detected into the abnormal behavior detection model and acquiring a comparison result of the data to be detected and the safety standard data;

the first determining module is used for determining that abnormal data does not exist if the comparison result shows that the data to be detected and the safety standard data are within an error range;

and the second determining module is used for determining that the abnormal data exists in the data to be detected if the comparison result indicates that the data to be detected and the safety standard data are not within the error range.

12. The apparatus of claim 11, further comprising:

the scoring module is used for scoring abnormal behaviors of the abnormal data in the data to be detected;

and the adjusting module is used for adjusting the safety standard data by using the abnormal data that has undergone the abnormal behavior scoring.

13. A non-volatile storage medium, comprising a stored program, wherein a device in which the non-volatile storage medium is located is controlled to execute the processing method of log data according to any one of claims 1 to 6 when the program runs.

14. A processor, characterized in that the processor is configured to run a program stored in a memory, wherein the program is configured to execute the method for processing log data according to any one of claims 1 to 6 when running.