CN112804196A - Log data processing method and device - Google Patents

Log data processing method and device

Info

Publication number
CN112804196A
Authority
CN
China
Prior art keywords
data
log
detected
abnormal
behavior
Legal status
Pending
Application number
CN202011567623.9A
Other languages
Chinese (zh)
Inventor
梁宏宇
喻波
王志海
安鹏
Current Assignee
Beijing Wondersoft Technology Co Ltd
Original Assignee
Beijing Wondersoft Technology Co Ltd
Application filed by Beijing Wondersoft Technology Co Ltd
Priority to CN202011567623.9A
Publication of CN112804196A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L63/1441 Countermeasures against malicious traffic

Abstract

The application discloses a method and a device for processing log data. The method includes: collecting log data, where the sources of the log data include a master server log and a client log of at least one client terminal associated with the master server; obtaining data to be detected based on the log data, where the data to be detected includes a user profile of a logged-in user of the client terminal and behavior data generated when an operation occurs on the client terminal; and analyzing the data to be detected based on safety standard data to determine whether abnormal data exists. The method and device address a technical problem of traditional information security technology, which performs protective detection using thresholds set manually from rules and expert experience: such detection has a security visibility blind spot for malicious attacks hidden in legitimate processes, cannot detect unknown attacks, and produces false alarms.

Description

Log data processing method and device
Technical Field
The present application relates to the field of information security, and in particular, to a method and an apparatus for processing log data.
Background
As enterprise digital transformation deepens, information leakage incidents occur from time to time, threatening the confidentiality, availability, and integrity of enterprise data assets; this has become the main threat every enterprise faces in security management. Using big data and machine learning to improve the visibility of internal threats and external attacks has become an important development trend for security practitioners and a key focus for enterprises. Network attack techniques targeting sensitive enterprise data have been continuously upgraded in recent years, and by attack source they can be divided into external attack behaviors and internal threat behaviors. External attack behaviors hide inside legitimate processes, so they can evade the monitoring, detection, and removal performed by security protection systems and rapidly intrude into the target system; internal threat behaviors can disguise themselves as legitimate users in order to break through the network boundary, steal network credentials, and create internal information security threats.
According to a Cisco statistical survey of enterprises under attack, more than 70% of attack behaviors do not trigger security alarms, less than 40% of the security alarms that are raised correspond to real attacks, and less than 10% of attacks are handled effectively. Traditional information security technology performs protective detection based on rules and expert experience with manually set thresholds. It has a security visibility blind spot for malicious attacks hidden in legitimate processes, and because it cannot detect unknown attacks, those attacks either escape and bypass detection or cause false alarms.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the application provide a log data processing method and device, to at least solve the technical problem that conventional information security technology performs protective detection using thresholds set manually from rules and expert experience, which leaves a security visibility blind spot for malicious attacks hidden in legitimate processes, fails to detect unknown attacks, and causes false alarms.
According to one aspect of the embodiments of the present application, a log data processing method is provided, including: collecting log data, where the sources of the log data include a master server log and a client log of at least one client terminal associated with the master server; obtaining data to be detected based on the log data, where the data to be detected includes a user profile of a logged-in user of the client terminal and behavior data generated when an operation occurs on the client terminal; and analyzing the data to be detected based on safety standard data to determine whether abnormal data exists.
Optionally, after the log data is collected, the method further includes preprocessing the log data, where the preprocessing includes at least one of the following: classification, redundancy removal, normalization of non-normalized data, and unification of data with inconsistent dimensions.
Optionally, obtaining the data to be detected based on the log data includes: performing feature extraction on the log data according to the types of extraction parameters to obtain a feature vector set, where the types of extraction parameters include at least one of the following: access traffic, attributes of operated files, types of sent and received data, and access characteristics; and constructing the data to be detected from the feature vector set.
Optionally, before the data to be detected is analyzed based on the safety standard data to determine whether abnormal data exists, the method further includes constructing the safety standard data, which includes: acquiring sample data, where the sample data includes behavior data generated on the master server and on at least one client terminal associated with the master server within a historical time period, and the behavior data represents the behavior characteristics produced after a user operates on the master server and the client terminal; and training a neural network model with the sample data to generate an abnormal behavior detection model, where the abnormal behavior detection model represents the baseline of the safety standard data.
Optionally, analyzing the data to be detected based on the safety standard data to determine whether abnormal data exists includes: inputting the data to be detected into the abnormal behavior detection model and obtaining a comparison result between the data to be detected and the safety standard data; if the comparison result shows that the data to be detected is within the error range of the safety standard data, determining that no abnormal data exists; otherwise, determining that abnormal data exists in the data to be detected.
Optionally, after it is determined that abnormal data exists in the data to be detected, the method further includes: scoring the abnormal behavior of the abnormal data in the data to be detected; and using the scored abnormal data to adjust the safety standard data.
According to another aspect of the embodiments of the present application, a log data processing apparatus is further provided, including: an acquisition module, configured to collect log data, where the sources of the log data include a master server log and a client log of at least one client terminal associated with the master server; an obtaining module, configured to obtain data to be detected based on the log data, where the data to be detected includes a user profile of a logged-in user of the client terminal and behavior data generated when an operation occurs on the client terminal; and an analysis module, configured to analyze the data to be detected based on safety standard data and determine whether abnormal data exists.
Optionally, the apparatus further includes a preprocessing module, configured to preprocess the log data, where the preprocessing includes at least one of the following: classification, redundancy removal, normalization of non-normalized data, and unification of data with inconsistent dimensions.
Optionally, the obtaining module includes: an extraction module, configured to perform feature extraction on the log data according to the types of extraction parameters to obtain a feature vector set, where the types of extraction parameters include at least one of the following: access traffic, attributes of operated files, types of sent and received data, and access characteristics; and a first construction module, configured to construct the data to be detected from the feature vector set.
Optionally, the apparatus further includes a second construction module, configured to construct the safety standard data, where the second construction module includes: a sub-acquisition module, configured to acquire sample data, where the sample data includes behavior data generated on the master server and on at least one client terminal associated with the master server within a historical time period, and the behavior data represents the behavior characteristics produced after a user operates on the master server and the client terminal; and a generating module, configured to train a neural network model with the sample data to generate an abnormal behavior detection model, where the abnormal behavior detection model represents the baseline of the safety standard data.
Optionally, the analysis module includes: a behavior detection module, configured to input the data to be detected into the abnormal behavior detection model and obtain a comparison result between the data to be detected and the safety standard data; a first determining module, configured to determine that no abnormal data exists if the comparison result shows that the data to be detected is within the error range of the safety standard data; and a second determining module, configured to determine that abnormal data exists in the data to be detected if the comparison result shows that the data to be detected is not within the error range of the safety standard data.
Optionally, the apparatus further includes: a scoring module, configured to score the abnormal behavior of abnormal data in the data to be detected; and an adjusting module, configured to use the scored abnormal data to adjust the safety standard data.
According to still another aspect of the embodiments of the present application, a non-volatile storage medium is provided. The non-volatile storage medium includes a stored program, and when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the above log data processing method.
According to yet another aspect of the embodiments of the present application, a processor is further provided, configured to run a program stored in a memory, where the program, when run, executes the above log data processing method.
In the embodiments of the present application, log data is collected, where the sources of the log data include a master server log and a client log of at least one client terminal associated with the master server; data to be detected is obtained based on the log data, where the data to be detected includes a user profile of a logged-in user of the client terminal and behavior data generated when an operation occurs on the client terminal; and the data to be detected is analyzed based on safety standard data to determine whether abnormal data exists. Based on User and Entity Behavior Analysis (UEBA) theory, log collection and big data analysis modeling are performed on a cloud computing and big data platform, a UEBA-based correlation analysis and tracing system is constructed, and risks are monitored continuously. This achieves the technical effect of rapidly discovering and locating abnormal information in an enterprise network, and solves the technical problem that traditional information security technology performs protective detection using manually set thresholds derived from rules and expert experience, leaving a security visibility blind spot for malicious attacks hidden in legitimate processes and causing false alarms because unknown attacks cannot be detected.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a flowchart of a method for processing log data according to an embodiment of the present application;
Fig. 2 is a schematic illustration of a constructed baseline according to an embodiment of the present application;
Fig. 3 is a schematic comparison of baselines according to an embodiment of the present application;
Fig. 4 is a UEBA architecture diagram according to an embodiment of the present application;
Fig. 5 is a schematic illustration of continuous monitoring of abnormal behavior according to an embodiment of the present application;
Fig. 6 is a schematic diagram of adaptive dynamic risk identification using a machine learning algorithm according to an embodiment of the present application;
Fig. 7 is a block diagram of a log data processing apparatus according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms that appear in the description of the embodiments of the present application are explained as follows:
User and entity behavior analysis (UEBA): a representative new information security technology that takes the user as its focal point and shifts from traditional rule-based analysis to correlation analysis, behavior modeling, and anomaly analysis.
Logistic regression algorithm: also called logistic regression analysis, a generalized linear regression model commonly used in data mining, automatic disease diagnosis, economic forecasting, and similar fields. In essence, abnormal electricity consumption identification is a binary classification problem, so a logistic regression algorithm can be used for the classification.
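For reference, the standard binary logistic regression model can be written as below, with y = 1 taken to mean "abnormal". The 0.5 decision threshold and the cross-entropy training loss are common textbook choices and are not specified by this document.

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}},
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } P(y = 1 \mid \mathbf{x}) \ge 0.5 \\
0 & \text{otherwise}
\end{cases}

\mathcal{L}(\mathbf{w}, b) = -\frac{1}{n}\sum_{i=1}^{n}
  \left[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \right],
\qquad p_i = \sigma(\mathbf{w}^{\top}\mathbf{x}_i + b)
```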
According to an embodiment of the present application, an embodiment of a log data processing method is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one shown here.
Fig. 1 is a flowchart of a method for processing log data according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
step S102, collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server;
It should be noted that the log data in step S102 is multi-source log data, and step S102 acquires it through multi-source log collection. The system as a whole mainly implements data collection, data preprocessing, baseline construction, model prediction, abnormality scoring, and response handling.
Multi-source log collection mainly covers collecting, classifying, and receiving the relevant terminal logs and host logs.
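A minimal sketch of multi-source collection, assuming each log entry has already been parsed into a dict with `user`, `action`, and `timestamp` keys. The `LogRecord` type, the `"master"` source label, and the function names are hypothetical; the sketch only shows merging the master server log with client terminal logs into one time-ordered stream and classifying records as host or terminal logs.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    source: str       # "master" for the master server, otherwise a client terminal id
    user: str
    action: str
    timestamp: float  # seconds since the epoch


def collect_logs(master_log, client_logs):
    """Merge master-server entries and per-client entries into one timeline.

    master_log: list of dicts with keys user/action/timestamp (hypothetical schema).
    client_logs: dict mapping a client terminal id to a list of such dicts.
    """
    records = [LogRecord(source="master", **entry) for entry in master_log]
    for client_id, entries in client_logs.items():
        records.extend(LogRecord(source=client_id, **entry) for entry in entries)
    records.sort(key=lambda r: r.timestamp)  # one time-ordered stream
    return records


def classify_by_origin(records):
    """Split records into host (master server) logs and terminal (client) logs."""
    groups = {"host": [], "terminal": []}
    for record in records:
        groups["host" if record.source == "master" else "terminal"].append(record)
    return groups


master = [{"user": "alice", "action": "login", "timestamp": 10.0}]
clients = {"client-1": [{"user": "bob", "action": "file_op", "timestamp": 5.0}]}
records = collect_logs(master, clients)
groups = classify_by_origin(records)
```

Sorting by timestamp up front keeps later per-user timelines simple, at the cost of buffering all entries before analysis.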
Step S104, obtaining data to be detected based on the log data, where the data to be detected includes: a user profile of a logged-in user of the client terminal, and behavior data generated when an operation occurs on the client terminal;
Step S106, analyzing the data to be detected based on the safety standard data, and determining whether abnormal data exists.
Through the above steps, based on User and Entity Behavior Analysis (UEBA) theory, log collection and big data analysis modeling are performed on a cloud computing and big data platform, a UEBA-based correlation analysis and tracing system is constructed, and risks are monitored continuously, achieving the technical effect of rapidly discovering and locating abnormal information in an enterprise network.
According to an optional embodiment of the present application, after step S102 is completed, the log data is preprocessed, where the preprocessing includes at least one of the following: classification, redundancy removal, normalization of non-normalized data, and unification of data with inconsistent dimensions.
This step mainly normalizes non-normalized data and unifies data with inconsistent dimensions.
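The preprocessing steps can be illustrated with a small sketch, assuming rows are flat dicts of hashable values. Trimming and lower-casing strings stands in for normalizing non-normalized data, and min-max scaling stands in for unifying features with different dimensions; both rules, and all function names, are assumptions rather than the method specified here.

```python
def normalize_text_fields(rows, fields):
    """Normalize non-normalized values: trim and lower-case string fields (assumed rule)."""
    return [
        {k: (v.strip().lower() if k in fields and isinstance(v, str) else v)
         for k, v in row.items()}
        for row in rows
    ]


def deduplicate(rows):
    """Redundancy removal: drop exact duplicate rows while preserving order."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))  # keys are unique, so sorting never compares values
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def min_max_scale(rows, fields):
    """Unify non-uniform dimensions: rescale each numeric field to [0, 1]."""
    scaled = [dict(row) for row in rows]
    for f in fields:
        values = [row[f] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in scaled:
            row[f] = 0.0 if span == 0 else (row[f] - lo) / span
    return scaled


rows = [
    {"user": " Alice ", "bytes": 100, "files": 2},
    {"user": "alice", "bytes": 100, "files": 2},
    {"user": "bob", "bytes": 300, "files": 6},
]
clean = deduplicate(normalize_text_fields(rows, {"user"}))
scaled = min_max_scale(clean, ["bytes", "files"])
```

Normalizing before deduplicating matters: `" Alice "` and `"alice"` only collapse into one row once both are in the same form.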
According to another optional embodiment of the present application, step S104 is implemented as follows: performing feature extraction on the log data according to the types of extraction parameters to obtain a feature vector set, where the types of extraction parameters include at least one of the following: access traffic, attributes of operated files, types of sent and received data, and access characteristics; and constructing the data to be detected from the feature vector set.
In this step, UEBA is driven by big data: based on multi-source logs of traffic, file operations, web access, mail, and so on, feature vectors are extracted from the behavior logs according to the 5W1H criteria (Who, What, When, Where, Why, and How), so as to construct behavior baselines for normal users and entities, as well as user profiles.
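A sketch of 5W1H-style feature extraction, assuming each event is a dict with hypothetical keys `who`, `what`, `when` (epoch seconds), `where` (host), `how` (channel), and `bytes`. The chosen features (per-action counts, off-hours ratio computed in UTC, distinct hosts and channels, total bytes) and the `ACTIONS` categories are illustrative only. "Why" is not directly observable in raw logs, so the sketch does not compute it.

```python
from collections import defaultdict
from datetime import datetime, timezone

ACTIONS = ("traffic", "file_op", "web_access", "mail")  # assumed action categories


def extract_features(events, work_start=9, work_end=18):
    """Build one feature vector per user (Who) from 5W1H-style events."""
    per_user = defaultdict(list)
    for e in events:
        per_user[e["who"]].append(e)

    vectors = {}
    for user, evs in per_user.items():
        counts = [sum(1 for e in evs if e["what"] == a) for a in ACTIONS]  # What
        hours = [datetime.fromtimestamp(e["when"], tz=timezone.utc).hour for e in evs]
        off_hours = sum(1 for h in hours if not (work_start <= h < work_end)) / len(evs)  # When
        hosts = len({e["where"] for e in evs})   # Where
        channels = len({e["how"] for e in evs})  # How
        total_bytes = sum(e.get("bytes", 0) for e in evs)
        vectors[user] = counts + [off_hours, hosts, channels, total_bytes]
    return vectors


events = [
    {"who": "alice", "what": "mail", "when": 36000, "where": "h1", "how": "smtp", "bytes": 100},
    {"who": "alice", "what": "file_op", "when": 0, "where": "h2", "how": "usb", "bytes": 50},
]
vectors = extract_features(events)
```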
In some optional embodiments of the present application, the safety standard data needs to be constructed before step S106 is performed. This includes: collecting sample data, where the sample data includes behavior data generated on the master server and on at least one client terminal associated with the master server within a historical time period, and the behavior data represents the behavior characteristics produced after a user operates on the master server and the client terminal; and training a neural network model with the sample data to generate an abnormal behavior detection model, where the abnormal behavior detection model represents the baseline of the safety standard data.
Fig. 2 is a schematic diagram of establishing a baseline according to an embodiment of the present application. As shown in Fig. 2, the baseline shows the individual behavior of 3 users across three business operation scenarios: mail sending, file operation, and web access. Longitudinal analysis and calculation of the users' behavior yields a department behavior baseline.
The multi-dimensional security baseline consists of an individual behavior baseline, a department behavior baseline, a scenario behavior baseline, and so on. The individual baseline reflects the behavior characteristics of each user, the department baseline reflects the group characteristics of the department the user belongs to, and the scenario baseline reflects the operation behavior characteristics of users and entities. A dynamic baseline of real-time access behavior on sensitive data is generated with learning algorithms such as random forests and SVMs; group baseline analysis builds a full-time-span context, which avoids the limitations of judging single behaviors in isolation; and distributed real-time data computation updates the security baseline in real time, yielding a complete dynamic behavior baseline.
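The document generates baselines with learners such as random forests and SVMs; the sketch below deliberately swaps in a much simpler statistical baseline (per-feature mean and population standard deviation) just to show how individual and department baselines relate, with the department baseline pooling the history of every user in that department. The data layout and function names are assumptions.

```python
from statistics import mean, pstdev


def individual_baseline(history):
    """history: list of feature vectors for one user over past periods.

    Returns (means, stds), one entry per feature.
    """
    columns = list(zip(*history))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def department_baseline(histories_by_user, department_of):
    """Pool the history vectors of all users in the same department."""
    pooled = {}
    for user, history in histories_by_user.items():
        pooled.setdefault(department_of[user], []).extend(history)
    return {dept: individual_baseline(rows) for dept, rows in pooled.items()}


histories = {"alice": [[1, 10], [3, 10]], "bob": [[5, 20]]}
departments = {"alice": "finance", "bob": "finance"}
alice_means, alice_stds = individual_baseline(histories["alice"])
finance_means, finance_stds = department_baseline(histories, departments)["finance"]
```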
According to an optional embodiment of the present application, step S106 is implemented as follows: inputting the data to be detected into the abnormal behavior detection model and obtaining a comparison result between the data to be detected and the safety standard data; if the comparison result shows that the data to be detected is within the error range of the safety standard data, no abnormal data exists; otherwise, it is determined that abnormal data exists in the data to be detected.
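One way to read "within the error range" is a per-feature tolerance of k standard deviations around the baseline mean. The sketch below assumes that rule, with a hypothetical default of k = 3 and a small floor `eps` so zero-variance features still have a (tiny) tolerance; the document itself does not define the error range.

```python
def within_error_range(vector, means, stds, k=3.0, eps=1e-9):
    """Return (is_normal, deviating_feature_indices) for one data point.

    A feature deviates when it lies more than k standard deviations from
    the baseline mean (assumed rule).
    """
    deviating = []
    for i, (x, m, s) in enumerate(zip(vector, means, stds)):
        if abs(x - m) > k * max(s, eps):
            deviating.append(i)
    return (not deviating), deviating


normal = within_error_range([4, 10], [2, 10], [1, 0])
abnormal = within_error_range([6, 10], [2, 10], [1, 0])
```

Returning the deviating indices, not just a boolean, keeps enough detail to explain an alarm later.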
Fig. 3 is a schematic diagram of baseline comparison according to an embodiment of the present application. As shown in Fig. 3, abnormal behavior detection works on statistical indicators, pattern sequences, time sequences, and behavior logic sequences, using deep learning algorithms such as CNNs and RNNs to detect abnormal behavior from multiple dimensions, such as the number of account logins and logouts, the number of IP calls, and access time intervals. Finally, an iterative evaluation mechanism forms a weighted combination of the various behavior alarms, anomaly detections, group comparison analyses, and so on, and continuously optimizes it over iterations to obtain an abnormal behavior score, which is then mapped onto the actual business scenario.
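The weighted combination of detector outputs into an abnormal behavior score might look like the sketch below. The detector names, the weights, the [0, 1] signal range, and the 0-100 output scale are all assumptions; the iterative optimization of the weights described above is not shown.

```python
def abnormal_behavior_score(signals, weights):
    """Combine detector outputs (each in [0, 1]) into a 0-100 score.

    signals and weights are dicts keyed by detector name (hypothetical names).
    Weights are renormalized over the detectors that actually reported.
    """
    total_weight = sum(weights[name] for name in signals)
    if total_weight == 0:
        return 0.0
    combined = sum(weights[name] * signals[name] for name in signals) / total_weight
    return round(100 * combined, 2)


weights = {"rule_alarm": 0.5, "baseline_deviation": 0.3, "group_comparison": 0.2}
signals = {"rule_alarm": 1.0, "baseline_deviation": 0.5, "group_comparison": 0.0}
score = abnormal_behavior_score(signals, weights)
```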
In an optional embodiment of the present application, after it is determined that abnormal data exists in the data to be detected, abnormal behavior scoring is performed on that abnormal data, and the scored abnormal data is used to adjust the safety standard data.
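A sketch of feeding scored anomalies back into the baseline, assuming a policy the document does not state: flagged records whose score falls below a hypothetical `accept_below` threshold are treated as benign drift and absorbed with Welford's online mean/variance update, while high-score records are kept out of the baseline.

```python
class RunningBaseline:
    """Per-feature running mean and population variance (Welford's algorithm)."""

    def __init__(self, n_features):
        self.n = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations

    def update(self, vector):
        self.n += 1
        for i, x in enumerate(vector):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def std(self):
        return [(m2 / self.n) ** 0.5 if self.n else 0.0 for m2 in self.m2]


def adjust_baseline(baseline, scored_anomalies, accept_below=30.0):
    """Absorb low-score flagged records into the baseline; return how many were absorbed."""
    absorbed = 0
    for vector, score in scored_anomalies:
        if score < accept_below:
            baseline.update(vector)
            absorbed += 1
    return absorbed


baseline = RunningBaseline(1)
baseline.update([2.0])
baseline.update([4.0])
std_before = baseline.std()
absorbed = adjust_baseline(baseline, [([6.0], 10.0), ([100.0], 90.0)])
```

Welford's update avoids storing the full history, which suits a baseline that is refreshed continuously.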
By continuously tracking the behavior of users and entities, performing ongoing risk assessment, constructing a complete timeline, conducting a comprehensive risk assessment of users and entities, and anticipating abnormal behavior, the number of false alarms can be greatly reduced.
The business scenarios of the log data processing method provided by the embodiments of the application mainly involve internal and external threat detection for enterprises. Internally, these mainly include intentional data collection, abnormal and unauthorized access such as privilege overreach, abnormal user logins, and abnormal behavior of user source addresses. Externally, abnormal behaviors such as data leakage, continuous outbound data transfer, abnormal website access, and compromised hosts are identified mainly through correlation analysis. Through linkage with business policies, user behavior is controlled promptly and tracing and response handling are carried out, so that the entire life cycle of sensitive data is displayed centrally and the distribution of data resources is clear at a glance.
Fig. 4 is a schematic diagram of a UEBA architecture according to an embodiment of the present application. The functions it mainly implements include: multi-source log data, feature extraction, baseline construction, abnormal behavior detection, risk assessment, and business scenarios.
By designing an overall UEBA architecture, user behavior is controlled promptly through business policy linkage, and tracing and response handling are performed.
A UEBA processing flow is constructed to monitor risk continuously.
The UEBA processing flow also applies machine learning for adaptive dynamic risk identification.
Abnormal behavior detection is mainly based on user abnormal behavior analysis using a logistic regression algorithm, in which the dependent variable is the abnormal behavior attribute and the independent variables are the natural characteristic attributes of the user. Real-time data prediction is performed with a univariate linear regression model, data trend prediction is performed with curve models such as the quadratic parabola, and the features of the N-dimensional feature data samples are preliminarily processed to obtain effective thresholds. A data baseline is preliminarily calculated, for example the univariate linear relationship between the number of file operations and the number of sensitive-data uses, and the probability that specific abnormal behavior features occur is predicted from the feature attributes.
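The two regression ideas in this paragraph can be sketched with the standard library: ordinary least squares for a univariate linear relation (for example, file operations versus sensitive-data uses), and logistic regression trained by batch gradient descent on the mean cross-entropy loss to estimate the probability of abnormal behavior. The learning rate, epoch count, and toy data are assumptions, and the quadratic trend model is omitted.

```python
import math


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b * x (univariate linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss; y[i] is 1 for abnormal."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that feature vector x is abnormal."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


a, slope = fit_simple_linear([1, 2, 3], [3, 5, 7])
w, b = fit_logistic([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]], [0, 0, 0, 1, 1, 1])
```

For example, the residual between a user's observed sensitive-data uses and `a + slope * file_ops` could serve as one input feature to `fit_logistic`; that pairing is an illustration, not part of the described method.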
Fig. 5 is a schematic diagram of continuous monitoring of abnormal behavior according to an embodiment of the application. As shown in Fig. 5, the method is driven by big data and features long-term effective correlation, analysis and mining, continuous learning, and continuous iterative optimization. For different business scenarios it applies different data features and detection algorithms and continuously performs feedback and tuning, so that the model meets the requirements of detecting abnormal behavior across large-scale user behavior, discovers and locates anomalies rapidly, and supports timely judgment and response.
Fig. 6 is a schematic diagram of adaptive dynamic risk identification using a machine learning algorithm according to an embodiment of the present application. As shown in Fig. 6, unsupervised and supervised machine learning techniques and artificial intelligence techniques can capture details in behavior data that humans cannot perceive, so anomalies are monitored without relying heavily on human analysis, saving considerable time and expert effort. This also avoids the difficulty and ineffectiveness of manually constructing feature rules and setting thresholds.
The method provided by the embodiments of the application associates and binds users, entities, and abnormal behaviors; discovers internal violations such as data leakage and account abuse through analysis by a rule engine and a machine learning engine; and outputs alarms through multi-dimensional data analysis. Sensitive data is discovered by detecting whether the database contains key fields of sensitive data. Accurate rule-based identification relieves the security administrator of large numbers of invalid events. Risk profiles can also be built for important users to guard against unknown risks.
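Detecting whether stored records contain key fields of sensitive data could be sketched as a check on field names plus value patterns. The field list and the simplified patterns (an 11-digit mobile number starting with 1, an 18-character resident ID format without checksum validation) are hypothetical examples, not rules taken from this document.

```python
import re

# Hypothetical key-field names and simplified value patterns.
SENSITIVE_FIELD_NAMES = {"id_number", "bank_card", "phone", "salary"}
SENSITIVE_VALUE_PATTERNS = {
    "phone": re.compile(r"\b1\d{10}\b"),           # 11-digit mobile number (simplified)
    "id_number": re.compile(r"\b\d{17}[\dXx]\b"),  # 18-character ID format, no checksum check
}


def find_sensitive_fields(record):
    """Return the sorted sensitive-field labels detected in one record (a dict)."""
    hits = set()
    for name, value in record.items():
        if name.lower() in SENSITIVE_FIELD_NAMES:  # match on the field name itself
            hits.add(name.lower())
        if isinstance(value, str):                 # match on the field's content
            for label, pattern in SENSITIVE_VALUE_PATTERNS.items():
                if pattern.search(value):
                    hits.add(label)
    return sorted(hits)


by_name = find_sensitive_fields({"Phone": "n/a", "note": "call 13800138000"})
by_value = find_sensitive_fields({"comment": "id 11010519491231002X"})
clean = find_sensitive_fields({"title": "report"})
```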
Fig. 7 is a structural block diagram of a log data processing apparatus according to an embodiment of the present application. As shown in Fig. 7, the apparatus includes:
an acquisition module 70, configured to collect log data, where the sources of the log data include a master server log and a client log of at least one client terminal associated with the master server;
an obtaining module 72, configured to obtain data to be detected based on the log data, where the data to be detected includes a user profile of a logged-in user of the client terminal and behavior data generated when an operation occurs on the client terminal;
and an analysis module 74, configured to analyze the data to be detected based on the safety standard data to determine whether abnormal data exists.
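The three modules of Fig. 7 can be wired together as in the sketch below. The class names mirror modules 70, 72, and 74, but the two features (event count and number of distinct sources per user) and the fixed per-feature tolerance used as the safety standard data are placeholder assumptions that keep the example short.

```python
class AcquisitionModule:  # module 70
    def acquire(self, master_log, client_logs):
        """Collect master-server and client-terminal entries, tagging each with its source."""
        records = [dict(r, source="master") for r in master_log]
        for client_id, entries in client_logs.items():
            records.extend(dict(r, source=client_id) for r in entries)
        return records


class ObtainingModule:  # module 72
    def obtain(self, records):
        """Build per-user vectors [event count, distinct sources] (assumed features)."""
        per_user = {}
        for r in records:
            per_user.setdefault(r["user"], []).append(r)
        return {u: [len(rs), len({r["source"] for r in rs})] for u, rs in per_user.items()}


class AnalysisModule:  # module 74
    def __init__(self, baseline, tolerance):
        self.baseline = baseline    # expected value per feature (stand-in safety standard data)
        self.tolerance = tolerance  # allowed absolute deviation per feature

    def analyze(self, vectors):
        """Map each user to True if any feature falls outside its tolerance."""
        return {
            u: any(abs(x - m) > t for x, m, t in zip(v, self.baseline, self.tolerance))
            for u, v in vectors.items()
        }


class LogDataProcessingApparatus:
    def __init__(self, baseline, tolerance):
        self.acquisition = AcquisitionModule()
        self.obtaining = ObtainingModule()
        self.analysis = AnalysisModule(baseline, tolerance)

    def run(self, master_log, client_logs):
        records = self.acquisition.acquire(master_log, client_logs)
        return self.analysis.analyze(self.obtaining.obtain(records))


apparatus = LogDataProcessingApparatus(baseline=[2, 1], tolerance=[1, 0])
result = apparatus.run([{"user": "alice"}], {"c1": [{"user": "alice"}, {"user": "bob"}]})
```

Keeping each module behind its own small method mirrors the block diagram and lets any one stage be swapped (for example, a trained model in place of `AnalysisModule`) without touching the others.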
It should be noted that, reference may be made to the description related to the embodiment shown in fig. 1 for a preferred implementation of the embodiment shown in fig. 7, and details are not repeated here.
According to an alternative embodiment of the present application, the apparatus further comprises: the preprocessing module is used for preprocessing the log data, wherein the preprocessing comprises at least one of the following steps: classification processing, redundancy removal processing, non-normalized data processing and unification processing of non-uniform dimension data.
According to another alternative embodiment of the present application, the obtaining module 72 includes: the extraction module is used for extracting the characteristics of the log data based on the types of the extraction parameters, and extracting to obtain a characteristic vector set, wherein the types of the extraction parameters comprise at least one of the following types: access flow, operation file attribute, transceiving data type and access characteristic; and the first construction module is used for constructing and obtaining the data to be detected based on the characteristic vector set.
In some optional embodiments of the present application, the apparatus further comprises a second construction module, configured to construct the safety standard data. The second construction module comprises: a sub-acquisition module, configured to acquire sample data, where the sample data includes behavior data generated on the master server and at least one client terminal associated with the master server within a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the master server and the client terminal; and a generating module, configured to train a neural network model with the sample data to generate an abnormal behavior detection model, where the abnormal behavior detection model represents the baseline of the safety standard data.
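The patent trains a neural network to represent the baseline. As a lightweight stand-in, the sketch below learns a statistical baseline (per-feature mean and population standard deviation) from historical feature vectors. It is a substitute technique for illustration only, not a neural network and not the applicant's model.

```python
from statistics import fmean, pstdev


def fit_baseline(history):
    """Learn a per-feature (mean, std) baseline from historical feature vectors.

    Stand-in for the abnormal behavior detection model: a statistical profile
    of normal behavior rather than a trained neural network.
    """
    if not history:
        raise ValueError("need at least one historical sample")
    # Transpose the list of vectors into one column per feature.
    columns = list(zip(*history))
    return [(fmean(col), pstdev(col)) for col in columns]
```

Fitting one baseline per user (from that user's own history) would tie the baseline to the user portrait; fitting one over all users gives an organization-wide norm.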
In other alternative embodiments of the present application, the analysis module 74 includes: a behavior detection module, configured to input the data to be detected into the abnormal behavior detection model and obtain a comparison result between the data to be detected and the safety standard data; a first determining module, configured to determine that no abnormal data exists if the comparison result shows that the data to be detected is within the error range of the safety standard data; and a second determining module, configured to determine that abnormal data exists in the data to be detected if the comparison result shows that the data to be detected is not within the error range of the safety standard data.
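The error-range comparison can be sketched as follows, assuming the baseline is a list of `(mean, std)` pairs, one per feature, and that the error range is `k` standard deviations around the mean. Both the representation and the default `k = 3.0` are assumptions; the patent does not quantify the error range.

```python
def compare_with_baseline(features, baseline, k=3.0, min_tolerance=1e-9):
    """Return indices of features outside mean +/- k*std.

    An empty list means the data is within the error range (no abnormal data);
    a non-empty list means abnormal data exists and names the offending features.
    """
    out_of_range = []
    for i, (value, (mean, std)) in enumerate(zip(features, baseline)):
        # Error range for this feature; a floor avoids a zero-width range when std == 0.
        tolerance = max(k * std, min_tolerance)
        if abs(value - mean) > tolerance:
            out_of_range.append(i)
    return out_of_range
```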
According to an alternative embodiment of the present application, the apparatus further comprises: a scoring module, configured to perform abnormal behavior scoring on the abnormal data in the data to be detected; and an adjusting module, configured to use the abnormal data on which abnormal behavior scoring has been performed to adjust the safety standard data.
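One possible reading of scoring and baseline adjustment is sketched below, under assumptions the patent does not state: the score is the largest per-feature z-score against a `(mean, std)` baseline, and only records scoring below a hypothetical `confirm_below` threshold are folded back into the baseline with an exponentially weighted moving average, so that likely threats do not drag the baseline toward malicious behavior.

```python
def score_anomaly(features, baseline, eps=1e-9):
    """Abnormal-behavior score: the largest per-feature deviation in standard deviations."""
    return max(abs(v - m) / max(s, eps) for v, (m, s) in zip(features, baseline))


def adjust_baseline(baseline, scored, confirm_below=5.0, alpha=0.1):
    """Fold low-scored abnormal records into a (mean, std) baseline via an EWMA.

    `scored` is a list of (features, score) pairs. Records scoring at or above the
    hypothetical `confirm_below` threshold are skipped and leave the baseline unchanged.
    """
    means = [m for m, _ in baseline]
    variances = [s * s for _, s in baseline]
    for features, score in scored:
        if score >= confirm_below:
            continue
        for i, v in enumerate(features):
            diff = v - means[i]
            means[i] += alpha * diff
            # Incremental exponentially weighted variance update.
            variances[i] = (1 - alpha) * (variances[i] + alpha * diff * diff)
    return [(m, var ** 0.5) for m, var in zip(means, variances)]
```

The weight `alpha` controls how quickly the baseline drifts toward newly accepted behavior; a small value keeps one borderline record from reshaping it.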
An embodiment of the present application further provides a non-volatile storage medium comprising a stored program, wherein, when the program runs, the device on which the non-volatile storage medium resides is controlled to execute the above log data processing method.
The non-volatile storage medium is used to store a program that performs the following functions: collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server; acquiring data to be detected based on the log data, wherein the data to be detected comprises: a user portrait of a login user of the client terminal, and behavior data generated when an operation occurs on the client terminal; and analyzing the data to be detected based on the safety standard data to determine whether abnormal data exists.
An embodiment of the present application further provides a processor configured to run a program stored in a memory, wherein the program, when running, executes the above log data processing method.
The processor is used to run a program that performs the following functions: collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server; acquiring data to be detected based on the log data, wherein the data to be detected comprises: a user portrait of a login user of the client terminal, and behavior data generated when an operation occurs on the client terminal; and analyzing the data to be detected based on the safety standard data to determine whether abnormal data exists.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium that can store program code.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also fall within the protection scope of the present application.

Claims (14)

1. A method for processing log data, characterized by comprising the following steps:
collecting log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server;
acquiring data to be detected based on the log data, wherein the data to be detected comprises: a user portrait of a login user of the client terminal, and behavior data generated when an operation occurs on the client terminal;
and analyzing the data to be detected based on the safety standard data to determine whether abnormal data exists.
2. The method of claim 1, wherein after collecting log data, the method further comprises:
preprocessing the log data, wherein the preprocessing comprises at least one of: classification processing, redundancy removal processing, non-normalized data processing and unification processing of non-uniform dimension data.
3. The method of claim 1, wherein obtaining data to be detected based on the log data comprises:
performing feature extraction on the log data based on the type of the extraction parameter to obtain a feature vector set, wherein the type of the extraction parameter comprises at least one of: access flow, operation file attribute, transceiving data type and access characteristic;
and constructing the data to be detected based on the feature vector set.
4. The method according to any one of claims 1 to 3, wherein before analyzing the data to be detected based on safety standard data to determine whether abnormal data exists, the method further comprises:
constructing the safety standard data, wherein the steps comprise:
collecting sample data, wherein the sample data comprises behavior data generated on the master server and at least one client terminal associated with the master server within a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the master server and the client terminal;
and training a neural network model with the sample data to generate an abnormal behavior detection model, wherein the abnormal behavior detection model is used to represent the baseline of the safety standard data.
5. The method of claim 4, wherein analyzing the data to be detected based on safety standard data to determine whether abnormal data exists comprises:
inputting the data to be detected into the abnormal behavior detection model, and acquiring a comparison result of the data to be detected and the safety standard data;
if the comparison result indicates that the data to be detected is within an error range of the safety standard data, determining that no abnormal data exists;
otherwise, determining that the abnormal data exists in the data to be detected.
6. The method according to claim 5, wherein after determining that the abnormal data exists in the data to be detected, the method further comprises:
carrying out abnormal behavior scoring on abnormal data in the data to be detected;
and using the abnormal data on which abnormal behavior scoring has been performed to adjust the safety standard data.
7. An apparatus for processing log data, comprising:
an acquisition module, configured to acquire log data, wherein the source of the log data comprises: a master server log, and a client log of at least one client terminal associated with the master server;
an obtaining module, configured to obtain data to be detected based on the log data, wherein the data to be detected comprises: a user portrait of a login user of the client terminal, and behavior data generated when an operation occurs on the client terminal;
and an analysis module, configured to analyze the data to be detected based on the safety standard data and determine whether abnormal data exists.
8. The apparatus of claim 7, further comprising:
a preprocessing module, configured to preprocess the log data, where the preprocessing includes at least one of: classification processing, redundancy removal processing, non-normalized data processing and unification processing of non-uniform dimension data.
9. The apparatus of claim 7, wherein the obtaining module comprises:
an extraction module, configured to perform feature extraction on the log data based on the type of the extraction parameter to obtain a feature vector set, wherein the type of the extraction parameter comprises at least one of: access flow, operation file attribute, transceiving data type and access characteristic;
and a first construction module, configured to construct the data to be detected based on the feature vector set.
10. The apparatus of any one of claims 7 to 9, further comprising:
a second construction module, configured to construct the safety standard data;
wherein the second construction module comprises: a sub-acquisition module, configured to acquire sample data, wherein the sample data comprises behavior data generated on the master server and at least one client terminal associated with the master server within a historical time period, and the behavior data represents behavior characteristics generated after a user operates on the master server and the client terminal; and a generating module, configured to train a neural network model with the sample data to generate an abnormal behavior detection model, wherein the abnormal behavior detection model is used to represent the baseline of the safety standard data.
11. The apparatus of claim 10, wherein the analysis module comprises:
a behavior detection module, configured to input the data to be detected into the abnormal behavior detection model and obtain a comparison result between the data to be detected and the safety standard data;
a first determining module, configured to determine that no abnormal data exists if the comparison result indicates that the data to be detected is within an error range of the safety standard data;
and a second determining module, configured to determine that the abnormal data exists in the data to be detected if the comparison result indicates that the data to be detected is not within the error range of the safety standard data.
12. The apparatus of claim 11, further comprising:
a scoring module, configured to perform abnormal behavior scoring on the abnormal data in the data to be detected;
and an adjusting module, configured to use the abnormal data on which abnormal behavior scoring has been performed to adjust the safety standard data.
13. A non-volatile storage medium, comprising a stored program, wherein a device in which the non-volatile storage medium is located is controlled to execute the processing method of log data according to any one of claims 1 to 6 when the program runs.
14. A processor, characterized in that the processor is configured to run a program stored in a memory, wherein the program is configured to execute the method for processing log data according to any one of claims 1 to 6 when running.
CN202011567623.9A 2020-12-25 2020-12-25 Log data processing method and device Pending CN112804196A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011567623.9A CN112804196A (en) 2020-12-25 2020-12-25 Log data processing method and device

Publications (1)

Publication Number Publication Date
CN112804196A 2021-05-14

Family

ID=75805251

Country Status (1)

Country Link
CN (1) CN112804196A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268354A (en) * 2016-12-30 2018-07-10 腾讯科技(深圳)有限公司 Data safety monitoring method, background server, terminal and system
CN109241461A (en) * 2018-08-10 2019-01-18 新华三信息安全技术有限公司 A kind of user draws a portrait construction method and device
CN110781930A (en) * 2019-10-14 2020-02-11 西安交通大学 User portrait grouping and behavior analysis method and system based on log data of network security equipment
CN110855461A (en) * 2018-08-20 2020-02-28 北京航天长峰科技工业集团有限公司 Log analysis method based on association analysis and rule base
CN112000806A (en) * 2020-08-25 2020-11-27 携程旅游信息技术(上海)有限公司 Abnormal log monitoring and analyzing method, system, equipment and storage medium

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377718A (en) * 2021-05-24 2021-09-10 石化盈科信息技术有限责任公司 Log information processing method and device, computer equipment and storage medium
CN113360354A (en) * 2021-05-27 2021-09-07 广州品粤信息科技有限公司 User operation behavior monitoring method, device, equipment and readable storage medium
CN113297576A (en) * 2021-06-16 2021-08-24 深信服科技股份有限公司 Threat detection method and device, behavior portrait method and device and electronic equipment
CN113434404A (en) * 2021-06-24 2021-09-24 北京同创永益科技发展有限公司 Automatic service verification method and device for verifying reliability of disaster recovery backup system
CN113434404B (en) * 2021-06-24 2024-03-19 北京同创永益科技发展有限公司 Automatic service verification method and device for verifying reliability of disaster recovery system
CN113259398A (en) * 2021-07-07 2021-08-13 杭州大乘智能科技有限公司 Account security detection method based on mail log data
CN114553720A (en) * 2022-02-28 2022-05-27 中国工商银行股份有限公司 User operation abnormity detection method and device
CN114844831B (en) * 2022-03-18 2024-02-27 奇安信科技集团股份有限公司 Editing data routing method, device and equipment for behavior security base line
CN114844831A (en) * 2022-03-18 2022-08-02 奇安信科技集团股份有限公司 Method, device and equipment for routing edit data of behavior safety baseline
CN114866276A (en) * 2022-03-21 2022-08-05 杭州薮猫科技有限公司 Terminal detection method and device for abnormal transmission file, storage medium and equipment
CN115051833A (en) * 2022-05-12 2022-09-13 中国电子科技集团公司电子科学研究院 Intercommunication network abnormity detection method based on terminal process
CN115051833B (en) * 2022-05-12 2023-12-15 中国电子科技集团公司电子科学研究院 Intercommunication network anomaly detection method based on terminal process
CN115118525A (en) * 2022-08-23 2022-09-27 天津天元海科技开发有限公司 Internet of things safety protection system and protection method thereof
CN115146263A (en) * 2022-09-05 2022-10-04 北京微步在线科技有限公司 User account collapse detection method and device, electronic equipment and storage medium
CN115941265A (en) * 2022-11-01 2023-04-07 南京鼎山信息科技有限公司 Big data attack processing method and system applied to cloud service
CN115941265B (en) * 2022-11-01 2023-10-03 南京鼎山信息科技有限公司 Big data attack processing method and system applied to cloud service
CN115906160B (en) * 2022-11-16 2023-07-18 荣科科技股份有限公司 Information processing method and system based on artificial intelligence analysis
KR102627813B1 (en) * 2023-10-27 2024-01-23 김대훈 Anomaly detection method and device based on artifitiral neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination