CN106778259A - Abnormal behaviour discovery method and system based on big data machine learning

Publication number
CN106778259A
Authority
CN
China
Prior art keywords
storehouse
behaviour
data
abnormal behaviour
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611232408.7A
Other languages
Chinese (zh)
Other versions
CN106778259B (en)
Inventor
李学进
王志海
魏力
喻波
何晋昊
蒲鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wondersoft Technology Co Ltd
Original Assignee
Beijing Wondersoft Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wondersoft Technology Co Ltd
Priority to CN201611232408.7A
Publication of CN106778259A
Application granted
Publication of CN106778259B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an abnormal behaviour discovery method and system based on big data machine learning. The method comprises: preprocessing raw security log data; extracting feature data from the preprocessed result; clustering the feature data to determine an abnormal behaviour library and a normal behaviour library; obtaining new behaviour sample data from new security log data, comparing it with the normal behaviour library and the abnormal behaviour library to determine whether it is normal or abnormal behaviour, and updating the normal behaviour library or the abnormal behaviour library with the new behaviour sample data; repeating the previous step and, once the normal behaviour library and the abnormal behaviour library contain enough normal and abnormal behaviour sample data, training a random forest model on the sample data in the two libraries and using the trained random forest model to judge abnormal behaviour. The scheme of the invention solves the problem of having too few labelled samples at the initial stage, improves judgement accuracy, and effectively prevents misjudgement.

Description

Abnormal behaviour discovery method and system based on big data machine learning
Technical field
The present invention relates to the field of data security, and in particular to an abnormal behaviour discovery method and system based on big data machine learning.
Background
Traditional network security and data security technologies, such as the various software and hardware firewalls, generally adopt a "fence-style" protection strategy: many restrictions are manually imposed on the network and on application systems, and every data access must pass through all of the preset rules. This not only degrades the user experience but also increases the operating load of the system. In addition, in existing security software, generating a single built-in rule usually requires multiple stages such as vulnerability discovery, attack simulation, message analysis, feature extraction and rule generation. As attack methods are constantly updated, this rule generation process must be repeated continually, which consumes substantial labour. More importantly, traditional defences cannot cope with big data. On this basis, an abnormal behaviour discovery method based on big data machine learning is provided, which turns passive defence into active detection, relaxes user access while strengthening behaviour monitoring, and replaces manual work with machines.
Fig. 1 shows the prior-art flow for discovering abnormal behaviour of managed users based on big data log analysis, which specifically includes:
(1) The logs to be analysed are stored in a log pool.
(2) The log pool is connected to a preprocessing module through an interface module.
(3) The preprocessing module is connected to an analysis module; statistical analysis is performed manually and rules are established.
(4) Behaviour logs are judged according to the established rules, and logs judged to be abnormal behaviour are stored in a knowledge base.
(5) A visualization module is connected to a service module, and the visualization module displays the abnormal behaviour trajectories found by the log analysis in the user interface.
The prior art has the following shortcomings:
(1) The data source is single: only logs are analysed and processed.
(2) Abnormal behaviour and abnormal users cannot be judged in real time.
(3) It relies entirely on manual statistical analysis, which is costly and prone to misjudging behaviour.
Therefore, the following technical problems need to be solved:
(1) Receiving, storing, processing and mining structured, semi-structured and unstructured data.
(2) Using machine learning modelling in place of manual work, improving judgement accuracy and saving labour costs. In addition, the trained model can be used both for offline batch behaviour judgement and for online near-real-time behaviour judgement.
(3) Identification of abnormal behaviour no longer depends on a powerful security rule base preset in the system; instead, the system continuously improves itself in an adaptive manner.
Summary of the invention
To solve the above technical problems, the invention provides an abnormal behaviour discovery method based on big data machine learning, comprising the following steps:
1) preprocessing raw security log data;
2) extracting feature data from the preprocessed result;
3) clustering the feature data, determining whether each behaviour sample in the raw security log data is an abnormal behaviour sample or a normal behaviour sample, and placing it in an abnormal behaviour library or a normal behaviour library accordingly;
4) obtaining new behaviour sample data from new security log data, and comparing it with the normal behaviour library and the abnormal behaviour library to determine whether it is normal behaviour or abnormal behaviour;
5) updating the normal behaviour library or the abnormal behaviour library with the new behaviour sample data;
6) when the normal behaviour library and the abnormal behaviour library contain enough normal and abnormal behaviour sample data, jumping to step 7), and otherwise jumping to step 4);
7) training a random forest model on the sample data in the normal behaviour library and the abnormal behaviour library, deploying the trained random forest model in a real-time processing module and an offline processing module respectively so as to judge abnormal behaviour for subsequent new behaviour sample data, and jumping to step 5).
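The control flow of steps 4) to 7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `run_discovery`, the `min_samples` cutoff used to decide that the libraries hold "enough" samples, and the two callables `bootstrap_judge` and `train` are all hypothetical.

```python
def run_discovery(stream, normal, abnormal, bootstrap_judge, train, min_samples):
    """Judge each new sample, update the libraries, and switch to a trained
    model once both libraries hold at least `min_samples` samples."""
    model = None
    for sample in stream:
        if model is None:
            # Step 4: compare against the behaviour libraries.
            is_abnormal = bootstrap_judge(sample, normal, abnormal)
        else:
            # Step 7: judgement by the trained model.
            is_abnormal = model(sample)
        # Step 5: update the matching library.
        (abnormal if is_abnormal else normal).append(sample)
        # Step 6: check whether enough samples have accumulated.
        enough = len(normal) >= min_samples and len(abnormal) >= min_samples
        if model is None and enough:
            model = train(normal, abnormal)
    return model
```

In this sketch the model is trained once; retraining on a schedule would be a natural extension but is not shown.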
Preferably, the feature data extracted in step 2) includes: the time at which the user uses the terminal, the operation behaviour category, and the type of file operated on; the extracted feature data is vectorized.
Preferably, step 3) includes clustering the feature data using MLlib, specifically: first determining K cluster centres with the Canopy algorithm and then performing K-Means clustering; after clustering, any class that contains fewer instances than a certain threshold, or far fewer instances than the other classes, is marked as an abnormal class and its instances are marked as abnormal behaviour; the other classes are normal classes and their instances are marked as normal behaviour.
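The Canopy pass that fixes K before K-Means can be sketched in plain Python. The text does not give the distance thresholds, so `t1` (loose) and `t2` (tight) below are assumptions, as is the function name; this variant draws canopy members from the points still in the candidate list.

```python
import math

def canopy(points, t1, t2):
    """Coarse Canopy clustering. Returns a list of (centre, members) pairs;
    the number of pairs can serve as K for a later K-Means run."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        centre = remaining.pop(0)
        # Points within the loose threshold join this canopy.
        members = [centre] + [p for p in remaining if math.dist(p, centre) <= t1]
        canopies.append((centre, members))
        # Points within the tight threshold can no longer become centres.
        remaining = [p for p in remaining if math.dist(p, centre) > t2]
    return canopies
```

The canopy centres can also be passed to K-Means as initial centres, which is one common way of combining the two algorithms.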
Preferably, step 4) includes: randomly selecting part of the sample data in the normal behaviour library for use by the KNN algorithm to discover abnormal behaviour; if the distances between the new behaviour sample data and the randomly selected sample data are all greater than a set threshold, the behaviour of the new behaviour sample data is abnormal behaviour, and otherwise it is normal behaviour; if the abnormal behaviour is determined by manual review to be normal behaviour, it is treated as normal behaviour; the normal behaviour library or the abnormal behaviour library is updated with the sample data corresponding to the normal behaviour or the abnormal behaviour respectively.
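The distance test described above amounts to checking whether the nearest of the randomly drawn normal samples is still farther away than the threshold. A minimal sketch, assuming Euclidean distance and hypothetical parameter names (`threshold`, `sample_size`):

```python
import math
import random

def is_abnormal(new_sample, normal_library, threshold, sample_size, rng=None):
    """Return True if the new sample is farther than `threshold` from every
    randomly drawn reference sample in the normal behaviour library."""
    rng = rng or random.Random()
    k = min(sample_size, len(normal_library))
    reference = rng.sample(normal_library, k)
    # Note: an empty library yields True, i.e. everything looks abnormal.
    return all(math.dist(new_sample, ref) > threshold for ref in reference)
```

Drawing only part of the library keeps the per-sample cost bounded, at the price of some randomness in the verdict for borderline samples.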
Preferably, the real-time processing module provides stream computing capability and judges user behaviour in near real time, and the judgement results are stored in a high-performance database that provides real-time data services to users;
the batch processing module provides batch processing capability for massive data and is used for model training and offline batch judgement; the batch processing module contains multiple scheduled tasks that process the data set in full or incremental mode, and the judgement results are stored in the high-performance database.
To solve the above technical problems, the invention provides an abnormal behaviour discovery system based on big data machine learning, comprising:
a preprocessing module, which preprocesses raw security log data;
a feature data extraction module, which extracts feature data from the preprocessed result;
a clustering module, which clusters the feature data, determines whether each behaviour sample in the raw security log data is an abnormal behaviour sample or a normal behaviour sample, and places it in an abnormal behaviour library or a normal behaviour library accordingly;
a behaviour library generation module, which obtains new behaviour sample data from new security log data and compares it with the normal behaviour library and the abnormal behaviour library to determine whether it is normal behaviour or abnormal behaviour;
an update module, which updates the normal behaviour library or the abnormal behaviour library with the new behaviour sample data;
a behaviour judgement module, which trains a random forest model on the sample data in the normal behaviour library and the abnormal behaviour library and deploys the trained random forest model in a real-time processing module and an offline processing module respectively, so as to judge abnormal behaviour for subsequent new behaviour sample data.
Preferably, the extracted feature data includes: the time at which the user uses the terminal, the operation category, and the type of file operated on; the extracted feature data is vectorized.
Preferably, the clustering module clusters the feature data using MLlib, specifically: first determining K cluster centres with the Canopy algorithm and then performing K-Means clustering; after clustering, any class that contains fewer instances than a certain threshold, or far fewer instances than the other classes, is marked as an abnormal class and its instances are marked as abnormal behaviour; the other classes are normal classes and their instances are marked as normal behaviour.
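The labelling rule applied after K-Means can be sketched from a list of cluster assignments alone. The text leaves "a certain threshold" and "far fewer than the other classes" unspecified, so `min_count` and `ratio` (compared against the mean size of the other clusters) are assumptions chosen for illustration.

```python
from collections import Counter

def abnormal_clusters(assignments, min_count, ratio):
    """Return the ids of clusters that are too small in absolute terms or
    much smaller than the average of the other clusters."""
    sizes = Counter(assignments)
    flagged = set()
    for cid, size in sizes.items():
        others = [s for c, s in sizes.items() if c != cid]
        too_few = size < min_count
        far_fewer = bool(others) and size < ratio * (sum(others) / len(others))
        if too_few or far_fewer:
            flagged.add(cid)
    return flagged

def label_instances(assignments, flagged):
    """Mark every instance in a flagged cluster as abnormal behaviour."""
    return ["abnormal" if c in flagged else "normal" for c in assignments]
```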
Preferably, the behaviour library generation module randomly selects part of the sample data in the normal behaviour library for use by the KNN algorithm to discover abnormal behaviour; if the distances between the new behaviour sample data and the randomly selected sample data are all greater than a set threshold, the behaviour of the new behaviour sample data is abnormal behaviour, and otherwise it is normal behaviour; if the abnormal behaviour is determined by manual review to be normal behaviour, it is treated as normal behaviour; the normal behaviour library or the abnormal behaviour library is updated with the sample data corresponding to the normal behaviour or the abnormal behaviour respectively.
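The manual-review override and the library update can be sketched as one small function. The `reviewer` callable stands in for the human analyst and is an assumption of this sketch; it is consulted only for samples the algorithm has already judged abnormal.

```python
def update_libraries(sample, judged_abnormal, normal_library, abnormal_library, reviewer=None):
    """Store the sample in the matching library and return the final verdict.
    `reviewer(sample)` returns True if the abnormal verdict is confirmed."""
    if judged_abnormal and reviewer is not None and not reviewer(sample):
        judged_abnormal = False  # manual review overturns the abnormal verdict
    (abnormal_library if judged_abnormal else normal_library).append(sample)
    return judged_abnormal
```

Passing `reviewer=None` models the later phase in which enough samples have accumulated and manual review is no longer performed.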
To solve the above technical problems, the invention provides an abnormal behaviour processing system based on big data machine learning, the system comprising: a data service module, a real-time processing module and a batch processing module;
the data service module forms the normal behaviour library and the abnormal behaviour library based on the method described above;
a random forest model is trained on the sample data in the normal behaviour library and the abnormal behaviour library, and the trained random forest model is deployed in the real-time processing module and the offline processing module respectively;
after new sample data is input into the system, it is copied into two identical copies, which are input into the real-time processing module and the offline processing module respectively so that abnormal behaviour judgement is performed on the sample data;
wherein the real-time processing module provides stream computing capability and judges user behaviour in near real time, and the judgement results are stored in a high-performance database that provides real-time data services to users;
the batch processing module provides batch processing capability for massive data and is used for model training and offline batch judgement; the batch processing module contains multiple scheduled tasks that process the data set in full or incremental mode, and the judgement results are stored in the high-performance database.
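The fan-out of each incoming sample to a real-time path and a batch path can be sketched in a single process. Everything named here is hypothetical: the class `BehaviourPlatform`, the dictionary that stands in for the high-performance database, and the `run_batch` method that plays the role of one scheduled task working in incremental mode. A real deployment would place a streaming engine and a batch engine behind these two paths.

```python
import copy

class BehaviourPlatform:
    def __init__(self, model):
        self.model = model        # callable: features -> True if abnormal
        self.batch_buffer = []    # samples waiting for the next batch run
        self.results = {}         # stand-in for the high-performance database

    def ingest(self, sample_id, features):
        """Copy the sample twice: judge one copy now, queue the other."""
        realtime_copy = copy.deepcopy(features)
        batch_copy = copy.deepcopy(features)
        self.results[sample_id] = {"realtime": self.model(realtime_copy)}
        self.batch_buffer.append((sample_id, batch_copy))

    def run_batch(self):
        """Incremental batch run: judge everything queued since the last run."""
        for sample_id, features in self.batch_buffer:
            self.results.setdefault(sample_id, {})["batch"] = self.model(features)
        processed = len(self.batch_buffer)
        self.batch_buffer.clear()
        return processed
```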
The technical solution of the invention achieves the following technical effects:
1. It solves the problem of having too few labelled samples at the initial stage.
2. Machine learning algorithms replace manual work, which saves labour and time costs, improves judgement accuracy, and effectively prevents misjudgement.
3. The operation of the platform is both a process of discovering abnormal behaviour and a process of self-adjustment and updating; identification of abnormal behaviour no longer depends on a powerful security rule base preset in the system, and the system instead continuously improves itself in an adaptive manner.
Brief description of the drawings
Fig. 1 is a flow chart of prior-art user abnormal behaviour discovery.
Fig. 2 is the overall flow chart of the invention.
Fig. 3 is the overall architecture diagram of the system of the invention.
Fig. 4 is a flow chart of a specific implementation of the invention.
Detailed description of the embodiments
Explanation of terms:
Hadoop: a distributed system architecture whose core design consists of HDFS and MapReduce. HDFS provides storage for massive data, and MapReduce provides computation over massive data.
Spark: a general-purpose parallel computing framework similar to Hadoop MapReduce. Unlike MapReduce, the intermediate output of a job can be kept in memory, which makes computation faster and makes Spark better suited to iterative algorithms such as those used in data mining and machine learning.
Lambda architecture: a real-time big data processing architecture proposed by Nathan Marz. It integrates offline computation and real-time computation, incorporates architectural principles such as immutability, read/write separation and complexity isolation, and can integrate big data components of all kinds, such as Hadoop and Spark.
Sqoop: a big data component used to transfer data between a big data platform and traditional relational databases.
MLlib: the machine learning library of Spark.
Canopy: an unsupervised clustering algorithm, mainly used to determine the number of clusters.
KMeans: the K-means algorithm, an unsupervised clustering algorithm.
KNN: the K-nearest-neighbour (K-NearestNeighbor) algorithm, a supervised classification algorithm.
Random Forest: an algorithm that trains on samples and makes predictions using multiple decision trees; it is also a supervised classification algorithm.
Fig. 2 shows the abnormal behaviour discovery flow of the invention.
(1) Preprocessing the raw data
The raw data is cleaned, transformed and extracted.
(2) Feature engineering
After preprocessing, features that can represent the data are obtained from experience and from analysis of the raw data.
(3) Clustering with MLlib to obtain samples
K cluster centres are first determined with the Canopy algorithm, and K-Means clustering is then performed. After clustering, any class that contains too few instances, or far fewer instances than the other classes, is marked as an abnormal class; the instances in it are marked as abnormal behaviour, and the instances in the other classes are marked as normal behaviour.
(4) Manual review and updating of the behaviour libraries
After the instances have been labelled by clustering, the abnormal behaviour instances are reviewed manually. Data manually judged to be abnormal behaviour is stored in the violation behaviour library, and the rest is placed in the normal behaviour library. Because the sample size is small in the early stage, manual review is performed to improve sample quality; once a certain number of samples has accumulated, manual review is no longer performed.
(5) Classification with MLlib, updating the behaviour libraries and training the model
The KNN algorithm is used to classify samples preliminarily and update the behaviour libraries, and the samples in the behaviour libraries are then used to train a RandomForest model. After training, the model is combined with a manually formulated rule base and used both for near-real-time behaviour judgement and for offline batch behaviour judgement.
The clustering in step (3) is unsupervised learning and needs no labelled sample data, whereas classification is supervised learning and needs labelled samples. The output of clustering is used as the input of classification, which improves judgement accuracy.
(6) The results of real-time behaviour judgement and offline batch behaviour judgement are written back to the behaviour libraries, so the behaviour libraries are continuously updated and improved.
Fig. 3 is the system architecture diagram of the invention.
The system draws on the Lambda architecture and is divided into a real-time processing layer, a batch processing layer and a data service layer. After raw data enters the platform it is copied into two copies, which enter the real-time processing layer and the batch processing layer respectively.
The real-time processing layer provides stream computing capability and judges users in near real time; the judgement results are stored in a high-performance database that provides real-time data services to users.
The batch processing layer provides batch processing capability for massive data and is used for model training and offline batch judgement. The batch processing layer contains multiple scheduled tasks that process the data set in full or incremental mode, and the judgement results are also stored in the database.
Fig. 4 shows a specific embodiment of abnormal behaviour discovery according to the invention.
1 Data preprocessing
Security management and control terminal logs are stored in a traditional database and contain fields such as a unique device identifier, a unique user identifier and the operation behaviour. The data is first imported into the data warehouse of the big data platform using Sqoop and is then cleaned and transformed: meaningless fields are removed and missing values are filled.
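The cleaning step (dropping meaningless fields and filling missing values) can be sketched on records held as dictionaries. The field names in the usage below and the per-field default values are assumptions; the text does not say which fields are dropped or how missing values are filled.

```python
def preprocess(records, keep_fields, defaults):
    """Keep only the fields of interest and replace missing or empty values
    with a per-field default."""
    cleaned = []
    for rec in records:
        row = {}
        for field in keep_fields:
            value = rec.get(field)
            if value is None or value == "":
                value = defaults[field]
            row[field] = value
        cleaned.append(row)
    return cleaned
```

For example, calling `preprocess(logs, ["device_id", "user_id", "operation"], {"device_id": "unknown", "user_id": "unknown", "operation": "unknown"})` would discard every other column and fill gaps with the string `"unknown"`.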
2 Feature engineering
(1) For security management and control terminal logs, the following features are extracted from the raw data based on experience and statistical analysis:
1. The time at which the user uses the security management and control terminal: the time period in which the operation occurs (morning, midday or evening).
2. The type of operation: regulatory reporting, outgoing mail, outgoing office transfer, external exchange.
3. The type of file operated on: Office documents, compressed files, pictures.
4. The data traffic of the operation access.
5. The number of different terminals used, the number of IP changes, and the number of logins and logouts.
(2) The features are then vectorized into data that a machine learning model can process.
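One common way to vectorize such features is to one-hot encode the categorical ones and append the numeric ones. The sketch below assumes this encoding; the category lists, the record keys and the choice of one-hot encoding are illustrative assumptions rather than details given in the text.

```python
TIME_PERIODS = ["morning", "midday", "evening"]
OPERATION_TYPES = ["regulatory_reporting", "mail_outgoing", "office_outgoing", "external_exchange"]
FILE_TYPES = ["office", "archive", "image"]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the known categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def vectorize(record):
    """Turn one preprocessed log record into a flat numeric feature vector."""
    return (
        one_hot(record["time_period"], TIME_PERIODS)
        + one_hot(record["operation_type"], OPERATION_TYPES)
        + one_hot(record["file_type"], FILE_TYPES)
        + [
            float(record["traffic_bytes"]),
            float(record["terminal_count"]),
            float(record["ip_changes"]),
            float(record["login_logout_count"]),
        ]
    )
```

Because the later clustering and KNN steps are distance-based, the numeric columns would normally be scaled to a comparable range first; that step is omitted here.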
3 Modelling and judgement
(1) Coarse clustering is performed with the Canopy algorithm to obtain the number of categories into which the data set aggregates.
(2) Higher-precision clustering is performed with the K-Means method. After clustering, any class that contains too few instances, or far fewer instances than the other classes, is marked as an abnormal class; the instances in it are marked as abnormal behaviour, and the instances in the other classes are marked as normal behaviour. The K-Means results are labelled and then used for classification. In the clustering result plot, classes that deviate markedly and contain few instances are marked as abnormal classes.
(3) A small-scale normal behaviour library is generated with the help of manual review.
The specific method is: manually check whether the instances marked as abnormal by clustering actually involve abnormal operations; if so, mark them as abnormal behaviour, and form the abnormal behaviour library from all instances corresponding to abnormal behaviour.
(4) Part of the samples in the normal behaviour library are randomly selected for use by the KNN algorithm to discover abnormal behaviour. If the Euclidean distance between a new behaviour and each sample instance in the library is greater than the set threshold, the behaviour is abnormal behaviour; if an abnormal behaviour is judged by manual review to be normal behaviour, the normal behaviour library is updated with that behaviour. In the KNN classification result plot, all abnormal users are flagged, but some non-abnormal users are also flagged as abnormal.
(5) Once enough normal and abnormal behaviour data is available, it is used as samples to train a random forest model, and the trained model is deployed in the real-time processing module and the offline processing module respectively to judge abnormal behaviour. In the RandomForest classification result plot, the number of mislabelled users is significantly reduced.
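The idea behind step (5), many trees trained on bootstrap samples with random feature subsets and combined by majority vote, can be illustrated with a deliberately small sketch. It is not the model used in the patent: each "tree" here is a depth-one decision stump chosen by Gini impurity, and the function names, `n_trees`, `n_features` and `seed` are assumptions. On Spark, the random forest classifier provided by MLlib would be the natural choice.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, feature_indices):
    """Pick the (feature, threshold) split with the lowest weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    best = (gini(y), None, None, majority, majority)
    for f in feature_indices:
        for t in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]  # (feature, threshold, left_label, right_label)

def predict_stump(stump, x):
    f, t, left, right = stump
    if f is None:  # no useful split was found; fall back to the majority label
        return left
    return left if x[f] <= t else right

def train_forest(X, y, n_trees, n_features, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]           # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)   # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, x):
    """Majority vote over all stumps."""
    return Counter(predict_stump(s, x) for s in forest).most_common(1)[0][0]
```

Using an odd `n_trees` avoids tied votes in the two-class case.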
In the above, "instance" and "sample" have the same meaning: each denotes one security management and control terminal log record.
The invention solves the problem of having too few labelled samples at the initial stage. Machine learning algorithms replace manual work, which saves labour and time costs, improves judgement accuracy, and effectively prevents misjudgement. The operation of the platform is both a process of discovering abnormal behaviour and a process of self-adjustment and updating; identification of abnormal behaviour no longer depends on a powerful security rule base preset in the system, and the system instead continuously improves itself in an adaptive manner.
The above are only preferred embodiments of the invention and are not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. An abnormal behaviour discovery method based on big data machine learning, comprising the following steps:
1) preprocessing raw security log data;
2) extracting feature data from the preprocessed result;
3) clustering the feature data, determining whether each behaviour sample in the raw security log data is an abnormal behaviour sample or a normal behaviour sample, and placing it in an abnormal behaviour library or a normal behaviour library accordingly;
4) obtaining new behaviour sample data from new security log data, and comparing it with the normal behaviour library and the abnormal behaviour library to determine whether it is normal behaviour or abnormal behaviour;
5) updating the normal behaviour library or the abnormal behaviour library with the new behaviour sample data;
6) when the normal behaviour library and the abnormal behaviour library contain enough normal and abnormal behaviour sample data, jumping to step 7), and otherwise jumping to step 4);
7) training a random forest model on the sample data in the normal behaviour library and the abnormal behaviour library, deploying the trained random forest model in a real-time processing module and an offline processing module respectively so as to judge abnormal behaviour for subsequent new behaviour sample data, and jumping to step 5).
2. The method according to claim 1, wherein the feature data extracted in step 2) includes: the time at which the user uses the terminal, the operation behaviour category, and the type of file operated on; and the extracted feature data is vectorized.
3. The method according to claim 1, wherein step 3) includes clustering the feature data using MLlib, specifically: first determining K cluster centres with the Canopy algorithm and then performing K-Means clustering; after clustering, any class that contains fewer instances than a certain threshold, or far fewer instances than the other classes, is marked as an abnormal class and its instances are marked as abnormal behaviour; the other classes are normal classes and their instances are marked as normal behaviour.
4. The method according to claim 1, wherein step 4) includes: randomly selecting part of the sample data in the normal behaviour library for use by the KNN algorithm to discover abnormal behaviour; if the distances between the new behaviour sample data and the randomly selected sample data are all greater than a set threshold, the behaviour of the new behaviour sample data is abnormal behaviour, and otherwise it is normal behaviour; if the abnormal behaviour is determined by manual review to be normal behaviour, it is treated as normal behaviour; and the normal behaviour library or the abnormal behaviour library is updated with the sample data corresponding to the normal behaviour or the abnormal behaviour respectively.
5. The method according to claim 1, wherein the real-time processing module provides stream computing capability and judges user behaviour in near real time, and the judgement results are stored in a high-performance database that provides real-time data services to users;
the batch processing module provides batch processing capability for massive data and is used for model training and offline batch judgement; the batch processing module contains multiple scheduled tasks that process the data set in full or incremental mode, and the judgement results are stored in the high-performance database.
6. An abnormal behavior discovery system based on big data machine learning, comprising:
a preprocessing module, which preprocesses raw security log data;
a feature extraction module, which extracts feature data from the preprocessed result;
a clustering module, which clusters the feature data, determines whether each behavior sample in the raw security log data is an abnormal behavior sample or a normal behavior sample, and places it in the abnormal behavior library or the normal behavior library accordingly;
a behavior library generation module, which obtains new behavior sample data from new security log data and determines whether it is normal behavior or abnormal behavior by comparing it with the normal behavior library and the abnormal behavior library;
an update module, which updates the normal behavior library or the abnormal behavior library with the new behavior sample data;
a behavior determination module, which trains a Random Forest model with the sample data in the normal behavior library and the abnormal behavior library, and deploys the trained Random Forest model in a real-time processing module and an offline processing module respectively, so as to perform abnormal behavior determination on subsequent new behavior sample data.
7. The system according to claim 6, wherein the extracted feature data comprises: the time at which the user uses the terminal, the operation category, and the file type operated on; and the extracted feature data is vectorized.
8. The system according to claim 6, wherein the clustering module clusters the feature data using MLlib, specifically: first determining K cluster centers with the Canopy algorithm and then performing K-Means clustering; after clustering, any class that contains fewer instances than a certain threshold, or far fewer instances than the other classes, is marked as an abnormal class, and the instances in it are marked as abnormal behavior; the other classes are normal classes, and the instances in them are marked as normal behavior.
9. The system according to claim 6, wherein the behavior library generation module randomly selects part of the sample data in the normal behavior library for use by the KNN algorithm to discover abnormal behavior; if the distances between the new behavior sample data and the randomly selected sample data are all greater than a set threshold, the behavior of the new behavior sample data is abnormal behavior, and otherwise it is normal behavior; if the abnormal behavior is judged by manual review to be normal behavior, it is treated as normal behavior; and the normal behavior library or the abnormal behavior library is updated with the sample data corresponding to the normal behavior or the abnormal behavior, respectively.
10. An abnormal behavior processing system based on big data machine learning, the system comprising: a data service module, a real-time processing module and a batch processing module;
the data service module forms a normal behavior library and an abnormal behavior library based on the method of any one of claims 1-5;
a Random Forest model is trained with the sample data in the normal behavior library and the abnormal behavior library, and the trained Random Forest model is deployed in the real-time processing module and the offline processing module respectively;
after new sample data is input into the system, it is copied into two identical copies, which are input into the real-time processing module and the offline processing module respectively, so as to perform abnormal behavior determination on the sample data;
wherein the real-time processing module provides stream computing capability, performs user behavior determination in a near-real-time manner, and stores the determination results in a high-performance database that provides real-time data services to users;
the batch processing module provides batch processing capability for massive data and is used for model training and offline batch determination; the batch processing module comprises multiple scheduled tasks, processes data sets in full or incremental mode, and stores the determination results in the high-performance database.
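The two-path arrangement of claim 10 can be pictured with a small sketch: each incoming sample is copied, one copy is judged immediately on the real-time path, and the other is queued for a scheduled batch job. Everything named here is an assumption made for illustration: `stub_model` is a placeholder rule standing in for the trained Random Forest model, a list stands in for the batch input, and a dictionary stands in for the high-performance database.

```python
import copy

def stub_model(features):
    # Placeholder rule standing in for the trained Random Forest model.
    return "abnormal" if features[0] < 6 else "normal"

def dispatch(sample, realtime_model, batch_queue, result_store):
    """Copy one incoming sample into both processing paths."""
    realtime_copy = copy.deepcopy(sample)
    batch_copy = copy.deepcopy(sample)
    # Real-time path: judge immediately and store the result.
    verdict = realtime_model(realtime_copy["features"])
    result_store[("realtime", sample["id"])] = verdict
    # Batch path: queue the second copy for a later scheduled task.
    batch_queue.append(batch_copy)

def run_batch_job(batch_model, batch_queue, result_store):
    """Scheduled task: judge everything queued so far (incremental mode)."""
    while batch_queue:
        item = batch_queue.pop(0)
        result_store[("batch", item["id"])] = batch_model(item["features"])

store, queue = {}, []
dispatch({"id": "s1", "features": [3, 1, 0]}, stub_model, queue, store)
queued_before_job = len(queue)
run_batch_job(stub_model, queue, store)
```

Keeping the two copies independent means the batch path can later re-judge the same sample with a retrained model without touching the real-time result.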
CN201611232408.7A 2016-12-28 2016-12-28 Abnormal behavior discovery method and system based on big data machine learning Active CN106778259B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611232408.7A CN106778259B (en) 2016-12-28 2016-12-28 Abnormal behavior discovery method and system based on big data machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611232408.7A CN106778259B (en) 2016-12-28 2016-12-28 Abnormal behavior discovery method and system based on big data machine learning

Publications (2)

Publication Number Publication Date
CN106778259A (en) 2017-05-31
CN106778259B (en) 2020-01-10

Family

ID=58921432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611232408.7A Active CN106778259B (en) 2016-12-28 2016-12-28 Abnormal behavior discovery method and system based on big data machine learning

Country Status (1)

Country Link
CN (1) CN106778259B (en)

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107204991A (en) * 2017-07-06 2017-09-26 深信服科技股份有限公司 Server anomaly detection method and system
CN107291911A (en) * 2017-06-26 2017-10-24 北京奇艺世纪科技有限公司 Anomaly detection method and device
CN107341095A (en) * 2017-06-27 2017-11-10 北京优特捷信息技术有限公司 Method and device for intelligent analysis of log data
CN107404473A (en) * 2017-06-06 2017-11-28 西安电子科技大学 Multi-mode Web application protection method based on Mshield machine learning
CN107707541A (en) * 2017-09-28 2018-02-16 小花互联网金融服务(深圳)有限公司 Streaming real-time attack log detection method based on machine learning
CN107968840A (en) * 2017-12-15 2018-04-27 华北电力大学(保定) Real-time processing method and system for monitoring alarm data of large-scale power equipment
CN108011809A (en) * 2017-12-04 2018-05-08 北京明朝万达科技股份有限公司 Anti-data-leakage analysis method and system based on user behavior and document content
CN108040052A (en) * 2017-12-13 2018-05-15 北京明朝万达科技股份有限公司 Network security threat analysis method and system based on Netflow log data
CN108319851A (en) * 2017-12-12 2018-07-24 中国电子科技集团公司电子科学研究院 Abnormal behavior active detection method, equipment and storage medium
CN108416376A (en) * 2018-02-27 2018-08-17 北京东方天得科技有限公司 SVM-based system and method for tracking, monitoring and managing in-transit logistics personnel and vehicles
CN108512841A (en) * 2018-03-23 2018-09-07 四川长虹电器股份有限公司 Intelligent defense system and method based on machine learning
CN108614895A (en) * 2018-05-10 2018-10-02 中国移动通信集团海南有限公司 Abnormal data access behavior identification method and data processing device
CN108718296A (en) * 2018-04-27 2018-10-30 广州西麦科技股份有限公司 Network management and control method, device and computer-readable storage medium based on SDN network
CN108737222A (en) * 2018-06-29 2018-11-02 山东汇贸电子口岸有限公司 Real-time server anomaly monitoring method based on data extraction
CN108769079A (en) * 2018-07-09 2018-11-06 四川大学 Web intrusion detection technique based on machine learning
CN109034140A (en) * 2018-09-13 2018-12-18 哈尔滨工业大学 Industrial control network abnormal signal detection method based on deep learning structure
CN109189819A (en) * 2018-07-12 2019-01-11 华南师范大学 Mobile k neighbor differential query method, system and device
CN109246072A (en) * 2017-07-11 2019-01-18 波音公司 Network security system with adaptive machine learning features
CN109255001A (en) * 2018-08-31 2019-01-22 阿里巴巴集团控股有限公司 Maintenance method and device for interface instance library, and electronic equipment
CN109359098A (en) * 2018-10-31 2019-02-19 云南电网有限责任公司 Dispatching data network behavior monitoring system and method
CN109472293A (en) * 2018-10-12 2019-03-15 国家电网有限公司 Power grid equipment file data error correction method based on machine learning
CN109472484A (en) * 2018-11-01 2019-03-15 凌云光技术集团有限责任公司 Production process abnormity recording method based on flow chart
CN109739846A (en) * 2018-12-27 2019-05-10 国电南瑞科技股份有限公司 Power grid data quality analysis method
CN109871954A (en) * 2018-12-24 2019-06-11 腾讯科技(深圳)有限公司 Training sample generation method, abnormality detection method and apparatus
CN110210512A (en) * 2019-04-19 2019-09-06 北京亿阳信通科技有限公司 Automatic log anomaly detection method and system
CN110430068A (en) * 2018-04-28 2019-11-08 华为技术有限公司 Feature engineering arrangement method and device
WO2019214511A1 (en) * 2018-05-11 2019-11-14 深圳市联软科技股份有限公司 Method for analyzing abnormal file operation behavior through clustering, system and terminal
CN110517469A (en) * 2019-08-08 2019-11-29 武汉兴图新科电子股份有限公司 Intelligent alarm convergence method suitable for audio-video convergence platform
WO2020010461A1 (en) * 2018-07-12 2020-01-16 Cyber Defence Qcd Corporation Systems and methods of cyber-monitoring which utilizes a knowledge database
CN110716868A (en) * 2019-09-16 2020-01-21 腾讯科技(深圳)有限公司 Abnormal program behavior detection method and device
CN110738827A (en) * 2018-07-20 2020-01-31 珠海格力电器股份有限公司 Abnormity early warning method, system, device and storage medium of electric appliance
WO2020034756A1 (en) * 2018-08-14 2020-02-20 阿里巴巴集团控股有限公司 Method and apparatus for predicting target device, and electronic device and storage medium
CN110889441A (en) * 2019-11-19 2020-03-17 海南电网有限责任公司海南输变电检修分公司 Distance and point density based substation equipment data anomaly identification method
CN110889451A (en) * 2019-11-26 2020-03-17 Oppo广东移动通信有限公司 Event auditing method and device, terminal equipment and storage medium
CN111597549A (en) * 2020-04-17 2020-08-28 国网浙江省电力有限公司湖州供电公司 Network security behavior identification method and system based on big data
CN107426199B (en) * 2017-07-05 2020-10-30 浙江鹏信信息科技股份有限公司 Method and system for detecting and analyzing network abnormal behaviors
CN112001533A (en) * 2020-08-06 2020-11-27 众安信息技术服务有限公司 Parameter detection method and device and computer system
CN112383575A (en) * 2021-01-18 2021-02-19 北京晶未科技有限公司 Method, electronic device and electronic equipment for information security
CN112488226A (en) * 2020-12-10 2021-03-12 中国电子科技集团公司第三十研究所 Terminal abnormal behavior identification method based on machine learning algorithm
CN112784862A (en) * 2019-11-07 2021-05-11 中国石油化工股份有限公司 Fault diagnosis and identification method for refining process of atmospheric and vacuum distillation unit
CN112926773A (en) * 2021-02-23 2021-06-08 深圳市北斗智能科技有限公司 Riding safety early warning method and device, electronic equipment and storage medium
CN113722707A (en) * 2021-11-02 2021-11-30 西安热工研究院有限公司 Database abnormal access detection method, system and equipment based on distance measurement
CN114491282A (en) * 2022-03-03 2022-05-13 哈尔滨市蓝标智能科技有限公司 Abnormal user behavior analysis method and system based on cloud computing

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103095711A (en) * 2013-01-18 2013-05-08 重庆邮电大学 Application layer distributed denial of service (DDoS) attack detection method and defensive system aimed at website
CN104735074A (en) * 2015-03-31 2015-06-24 江苏通付盾信息科技有限公司 Malicious URL detection method and implement system thereof
CN104954453A (en) * 2015-06-02 2015-09-30 浙江工业大学 Data mining REST service platform based on cloud computing
CN105224872A (en) * 2015-09-30 2016-01-06 河南科技大学 A kind of user's anomaly detection method based on neural network clustering
CN105553998A (en) * 2015-12-23 2016-05-04 中国电子科技集团公司第三十研究所 Network attack abnormality detection method
CN105677615A (en) * 2016-01-04 2016-06-15 北京邮电大学 Distributed machine learning method based on weka interface

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107404473A (en) * 2017-06-06 2017-11-28 西安电子科技大学 Multi-mode Web application protection method based on Mshield machine learning
CN107291911A (en) * 2017-06-26 2017-10-24 北京奇艺世纪科技有限公司 Anomaly detection method and device
CN107291911B (en) * 2017-06-26 2020-01-21 北京奇艺世纪科技有限公司 Anomaly detection method and device
CN107341095A (en) * 2017-06-27 2017-11-10 北京优特捷信息技术有限公司 Method and device for intelligent analysis of log data
CN107426199B (en) * 2017-07-05 2020-10-30 浙江鹏信信息科技股份有限公司 Method and system for detecting and analyzing network abnormal behaviors
CN107204991A (en) * 2017-07-06 2017-09-26 深信服科技股份有限公司 Server anomaly detection method and system
CN109246072A (en) * 2017-07-11 2019-01-18 波音公司 Network security system with adaptive machine learning features
CN107707541A (en) * 2017-09-28 2018-02-16 小花互联网金融服务(深圳)有限公司 Streaming real-time attack log detection method based on machine learning
CN108011809A (en) * 2017-12-04 2018-05-08 北京明朝万达科技股份有限公司 Anti-data-leakage analysis method and system based on user behavior and document content
CN108319851A (en) * 2017-12-12 2018-07-24 中国电子科技集团公司电子科学研究院 Abnormal behavior active detection method, equipment and storage medium
CN108319851B (en) * 2017-12-12 2022-03-11 中国电子科技集团公司电子科学研究院 Abnormal behavior active detection method, equipment and storage medium
CN108040052A (en) * 2017-12-13 2018-05-15 北京明朝万达科技股份有限公司 Network security threat analysis method and system based on Netflow log data
CN107968840B (en) * 2017-12-15 2020-10-09 华北电力大学(保定) Real-time processing method and system for monitoring alarm data of large-scale power equipment
CN107968840A (en) * 2017-12-15 2018-04-27 华北电力大学(保定) Real-time processing method and system for monitoring alarm data of large-scale power equipment
CN108416376A (en) * 2018-02-27 2018-08-17 北京东方天得科技有限公司 SVM-based system and method for tracking, monitoring and managing in-transit logistics personnel and vehicles
CN108512841B (en) * 2018-03-23 2021-03-16 四川长虹电器股份有限公司 Intelligent defense system and method based on machine learning
CN108512841A (en) * 2018-03-23 2018-09-07 四川长虹电器股份有限公司 Intelligent defense system and method based on machine learning
CN108718296A (en) * 2018-04-27 2018-10-30 广州西麦科技股份有限公司 Network management and control method, device and computer-readable storage medium based on SDN network
CN110430068A (en) * 2018-04-28 2019-11-08 华为技术有限公司 Feature engineering arrangement method and device
CN110430068B (en) * 2018-04-28 2021-04-09 华为技术有限公司 Characteristic engineering arrangement method and device
CN108614895A (en) * 2018-05-10 2018-10-02 中国移动通信集团海南有限公司 Abnormal data access behavior identification method and data processing device
CN108614895B (en) * 2018-05-10 2020-09-29 中国移动通信集团海南有限公司 Abnormal data access behavior identification method and data processing device
WO2019214511A1 (en) * 2018-05-11 2019-11-14 深圳市联软科技股份有限公司 Method for analyzing abnormal file operation behavior through clustering, system and terminal
CN108737222A (en) * 2018-06-29 2018-11-02 山东汇贸电子口岸有限公司 Real-time server anomaly monitoring method based on data extraction
CN108769079A (en) * 2018-07-09 2018-11-06 四川大学 Web intrusion detection technique based on machine learning
CN109189819B (en) * 2018-07-12 2021-08-24 华南师范大学 Mobile k neighbor differential query method, system and device
CN109189819A (en) * 2018-07-12 2019-01-11 华南师范大学 Mobile k neighbor differential query method, system and device
US12003515B2 (en) 2018-07-12 2024-06-04 Cyber Defence Qcd Corporation Systems and method of cyber-monitoring which utilizes a knowledge database
WO2020010461A1 (en) * 2018-07-12 2020-01-16 Cyber Defence Qcd Corporation Systems and methods of cyber-monitoring which utilizes a knowledge database
CN110738827A (en) * 2018-07-20 2020-01-31 珠海格力电器股份有限公司 Abnormity early warning method, system, device and storage medium of electric appliance
WO2020034756A1 (en) * 2018-08-14 2020-02-20 阿里巴巴集团控股有限公司 Method and apparatus for predicting target device, and electronic device and storage medium
CN109255001A (en) * 2018-08-31 2019-01-22 阿里巴巴集团控股有限公司 Maintenance method and device for interface instance library, and electronic equipment
CN109034140A (en) * 2018-09-13 2018-12-18 哈尔滨工业大学 Industrial control network abnormal signal detection method based on deep learning structure
CN109034140B (en) * 2018-09-13 2021-05-04 哈尔滨工业大学 Industrial control network signal abnormity detection method based on deep learning structure
CN109472293A (en) * 2018-10-12 2019-03-15 国家电网有限公司 Power grid equipment file data error correction method based on machine learning
CN109359098A (en) * 2018-10-31 2019-02-19 云南电网有限责任公司 Dispatching data network behavior monitoring system and method
CN109472484A (en) * 2018-11-01 2019-03-15 凌云光技术集团有限责任公司 Production process abnormity recording method based on flow chart
CN109472484B (en) * 2018-11-01 2021-08-03 凌云光技术股份有限公司 Production process abnormity recording method based on flow chart
CN109871954A (en) * 2018-12-24 2019-06-11 腾讯科技(深圳)有限公司 Training sample generation method, abnormality detection method and apparatus
CN109871954B (en) * 2018-12-24 2022-12-02 腾讯科技(深圳)有限公司 Training sample generation method, abnormality detection method and apparatus
CN109739846A (en) * 2018-12-27 2019-05-10 国电南瑞科技股份有限公司 Power grid data quality analysis method
CN110210512B (en) * 2019-04-19 2024-03-26 北京亿阳信通科技有限公司 Automatic log anomaly detection method and system
CN110210512A (en) * 2019-04-19 2019-09-06 北京亿阳信通科技有限公司 Automatic log anomaly detection method and system
CN110517469A (en) * 2019-08-08 2019-11-29 武汉兴图新科电子股份有限公司 Intelligent alarm convergence method suitable for audio-video convergence platform
CN110716868B (en) * 2019-09-16 2022-02-25 腾讯科技(深圳)有限公司 Abnormal program behavior detection method and device
CN110716868A (en) * 2019-09-16 2020-01-21 腾讯科技(深圳)有限公司 Abnormal program behavior detection method and device
CN112784862A (en) * 2019-11-07 2021-05-11 中国石油化工股份有限公司 Fault diagnosis and identification method for refining process of atmospheric and vacuum distillation unit
CN110889441A (en) * 2019-11-19 2020-03-17 海南电网有限责任公司海南输变电检修分公司 Distance and point density based substation equipment data anomaly identification method
CN110889451A (en) * 2019-11-26 2020-03-17 Oppo广东移动通信有限公司 Event auditing method and device, terminal equipment and storage medium
CN110889451B (en) * 2019-11-26 2023-07-07 Oppo广东移动通信有限公司 Event auditing method, device, terminal equipment and storage medium
CN111597549A (en) * 2020-04-17 2020-08-28 国网浙江省电力有限公司湖州供电公司 Network security behavior identification method and system based on big data
CN112001533A (en) * 2020-08-06 2020-11-27 众安信息技术服务有限公司 Parameter detection method and device and computer system
CN112488226B (en) * 2020-12-10 2022-11-01 中国电子科技集团公司第三十研究所 Terminal abnormal behavior identification method based on machine learning algorithm
CN112488226A (en) * 2020-12-10 2021-03-12 中国电子科技集团公司第三十研究所 Terminal abnormal behavior identification method based on machine learning algorithm
CN112383575B (en) * 2021-01-18 2021-05-04 北京晶未科技有限公司 Method, electronic device and electronic equipment for information security
CN112383575A (en) * 2021-01-18 2021-02-19 北京晶未科技有限公司 Method, electronic device and electronic equipment for information security
CN112926773A (en) * 2021-02-23 2021-06-08 深圳市北斗智能科技有限公司 Riding safety early warning method and device, electronic equipment and storage medium
CN113722707A (en) * 2021-11-02 2021-11-30 西安热工研究院有限公司 Database abnormal access detection method, system and equipment based on distance measurement
CN114491282A (en) * 2022-03-03 2022-05-13 哈尔滨市蓝标智能科技有限公司 Abnormal user behavior analysis method and system based on cloud computing
CN114491282B (en) * 2022-03-03 2022-10-04 中软数智信息技术(武汉)有限公司 Abnormal user behavior analysis method and system based on cloud computing

Also Published As

Publication number Publication date
CN106778259B (en) 2020-01-10

Similar Documents

Publication Publication Date Title
CN106778259A (en) Abnormal behavior discovery method and system based on big data machine learning
Zhong et al. Applying big data based deep learning system to intrusion detection
US10187415B2 (en) Cognitive information security using a behavioral recognition system
US12032909B2 (en) Perceptual associative memory for a neuro-linguistic behavior recognition system
CN112365171B (en) Knowledge graph-based risk prediction method, device, equipment and storage medium
US20240071037A1 (en) Mapper component for a neuro-linguistic behavior recognition system
CN107992746A (en) Malicious act method for digging and device
CN110263324A (en) Text handling method, model training method and device
CN107111609A (en) Lexical analyzer for neural language performance identifying system
CN116865994A (en) Network data security prediction method based on big data
Pratama et al. Scalable teacher forcing network for semi-supervised large scale data streams
Schreckenberger et al. Online random feature forests for learning in varying feature spaces
CN113011893B (en) Data processing method, device, computer equipment and storage medium
KR20220007395A (en) Apparatus and Method for Classifying Attack Tactics of Security Event in Industrial Control System
Babu et al. Improved Monarchy Butterfly Optimization Algorithm (IMBO): Intrusion Detection Using Mapreduce Framework Based Optimized ANU-Net.
CN112906722A (en) Data anomaly detection method, device and equipment
US10657434B2 (en) Anomaly score adjustment across anomaly generators
CN110705597B (en) Network early event detection method and system based on event cause and effect extraction
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
Alhumayyani et al. Smartphone-based Recognition of Human Activities using Shallow Machine Learning
CN116522918A (en) Model training method, address classification method, device, equipment and storage medium
Xia et al. A Gradient Boosting Based Classification Technique for Assisted Prediction Algorithm Research
Rana et al. A Pre-processing Model for Feature Extraction Based on K-mean, PSO and ABC
CN116055156A (en) Phishing address detection method and device
Dabass A Novel Neural Network Technique for handling challenges of Cyber security.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant