CN109391624A - Machine learning-based method and device for detecting anomalies in terminal access data - Google Patents
- Publication number
- CN109391624A (application CN201811352235.1A)
- Authority
- CN
- China
- Prior art keywords
- access data
- abnormal
- unit
- detection model
- characteristic
- Legal status
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention provides a machine learning-based method for detecting anomalies in terminal access data, comprising: processing the received current terminal access data according to a predetermined multidimensional feature library to obtain a feature vector for each piece of access data, wherein the current terminal access data contains at least one piece of access data; and using the obtained feature vectors as input information of a machine learning model to determine the detection result of the current terminal access data, wherein the machine learning model is an abnormal access detection model that has been trained in advance and has passed an accuracy test. By detecting abnormal behavior in terminal access data with the established abnormal access detection model, the method ensures the safe and reliable operation of the integrated electricity, water, heat, and gas metering acquisition system.
Description
Technical Field
The invention relates to the technical field of information security, in particular to a method and a device for detecting abnormal terminal access data based on machine learning.
Background
The integrated electricity, water, heat, and gas metering acquisition system faces security risks on many fronts. To secure the transmission of collected data, it is not enough to protect the data with measures such as encryption and network isolation; malicious behavior that may be present in the access data must also be detected at levels such as the data exchange protocol, for example data tampering or the implantation of malicious programs (trojans, viruses, malicious code, etc.).
At present, the security risk control level of the integrated electricity, water, heat, and gas metering acquisition system is low, which makes it difficult for the system to operate reliably at a higher security level.
Disclosure of Invention
The invention provides a machine learning-based method for detecting abnormal terminal access data, which aims to address the insufficient security level of conventional integrated energy metering acquisition systems and the resulting difficulty of operating them reliably.
In a first aspect, the present invention provides a method for detecting abnormal terminal access data based on machine learning, which includes the following steps:
processing the received current terminal access data according to a predetermined multidimensional feature library to obtain a feature vector of each piece of access data, wherein the current terminal access data comprises at least one piece of access data;
and determining the detection result of the current terminal access data by using the obtained feature vectors as input information of a machine learning model, wherein the machine learning model is an abnormal access detection model that has been trained in advance and has passed an accuracy test.
Further, the method further comprises:
establishing an abnormal access detection model, comprising:
analyzing key fields of each piece of access data from the stored terminal access data according to protocol specifications;
extracting a multidimensional feature library of each piece of access data according to the protocol specifications and the differing characteristics of normal and abnormal messages;
and establishing, using the training set, an abnormal access detection model corresponding to the multidimensional feature library by means of a machine learning classification algorithm, with feature vectors as input and abnormal behavior as output.
Further, the method further comprises:
establishing a multidimensional feature library, comprising:
analyzing, respectively, the feature vector of each piece of access data in the collected abnormal terminal access data and in the collected normal terminal access data;
and comparing the length characteristics of each field, the count characteristics of each field, and the type characteristics of different abnormal behaviors in the abnormal and normal terminal access data, to obtain the abnormal characteristic parameters of the abnormal terminal access data and form the multidimensional feature library.
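As an illustration only, the field-by-field comparison of normal and abnormal data could be sketched in Python as follows. The message representation (a dict mapping a hypothetical field name to its raw bytes), the function names, and the `min_rate` cut-off are assumptions rather than details from the patent, and the grouping by abnormal behavior type mentioned above is left out for brevity.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Message = Dict[str, bytes]  # hypothetical: parsed message as field name -> raw bytes


def field_profile(normal: List[Message]) -> Tuple[Dict[str, Set[int]], Set[int]]:
    """Record the field lengths and field counts observed in normal messages."""
    lengths: Dict[str, Set[int]] = {}
    counts: Set[int] = set()
    for msg in normal:
        counts.add(len(msg))
        for name, value in msg.items():
            lengths.setdefault(name, set()).add(len(value))
    return lengths, counts


def build_feature_library(normal: List[Message], abnormal: List[Message],
                          min_rate: float = 0.1) -> Dict[str, float]:
    """Return candidate anomaly features and the share of abnormal messages triggering each."""
    lengths, counts = field_profile(normal)
    hits: Counter = Counter()
    for msg in abnormal:
        if len(msg) not in counts:  # field-count characteristic
            hits["field_count"] += 1
        for name, value in msg.items():
            if name not in lengths:  # field never seen in normal traffic
                hits[f"unknown_field:{name}"] += 1
            elif len(value) not in lengths[name]:  # field-length characteristic
                hits[f"length:{name}"] += 1
    n = max(len(abnormal), 1)
    return {feat: c / n for feat, c in hits.items() if c / n >= min_rate}
```

A field whose length or count deviates from everything seen in normal traffic in at least `min_rate` of the abnormal messages becomes a candidate abnormal characteristic parameter.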
Further, when the training set is collected, multiple pieces of access data with identical feature vectors are merged into one piece of access data.
Further, after the step of establishing the abnormal access detection model, the method further comprises:
obtaining the feature vector of the access data of each terminal under test, detecting each piece of access data under test with the abnormal access detection model, and generating a corresponding detection result;
comparing the generated detection results with the known abnormal behavior of the access data under test to obtain the accuracy with which the abnormal access detection model detects abnormal behavior;
judging whether the accuracy is greater than a preset threshold;
if so, judging that the abnormal access detection model is valid;
and if not, computing the contribution of each characteristic parameter in the multidimensional feature library to the abnormal access detection model, and re-executing the step of establishing the abnormal access detection model according to the computed contributions.
In a second aspect, the present invention provides a device for detecting abnormal terminal access data based on machine learning, including:
a first characteristic parameter obtaining unit, a detection result obtaining unit, a detection result analysis unit, and a first detection result generation unit; wherein,
the first characteristic parameter obtaining unit is configured to obtain the characteristic parameters of the current access data;
the detection result obtaining unit is configured to obtain the detection result of the current access data by using the obtained characteristic parameters as input information of a preset detection model, wherein the detection result indicates the characteristic values of abnormal behavior contained in the current access data; the detection model is a detection model for the characteristic parameters and is updated according to the obtained characteristic parameters when a preset update condition is met;
the detection result analysis unit is configured to analyze the detection result, judge whether the current access data contains abnormal behavior, and trigger the first detection result generation unit;
and the first detection result generation unit is used for generating an abnormal detection result aiming at the current access data.
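A minimal sketch of how these units might be wired together, assuming the detection model is any callable that maps an ordered parameter list to an anomaly score and that a score at or above a hypothetical `threshold` counts as abnormal. Class and method names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DetectionReport:
    record_id: str
    score: float  # hypothetical anomaly score returned by the model


class AnomalyDetectionDevice:
    """One method per unit of the claimed device (names are illustrative)."""

    def __init__(self, feature_names: List[str],
                 model: Callable[[List[float]], float], threshold: float = 0.5):
        self.feature_names = feature_names
        self.model = model
        self.threshold = threshold

    def obtain_parameters(self, record: Dict[str, float]) -> List[float]:
        # first characteristic parameter obtaining unit: fixed feature order
        return [float(record.get(name, 0.0)) for name in self.feature_names]

    def obtain_result(self, params: List[float]) -> float:
        # detection result obtaining unit: the parameters are the model input
        return self.model(params)

    def analyze(self, score: float) -> bool:
        # detection result analysis unit: does the record contain abnormal behavior?
        return score >= self.threshold

    def generate(self, record_id: str, score: float) -> DetectionReport:
        # first detection result generation unit
        return DetectionReport(record_id, score)

    def process(self, record_id: str, record: Dict[str, float]) -> Optional[DetectionReport]:
        score = self.obtain_result(self.obtain_parameters(record))
        if self.analyze(score):  # the analysis unit triggers the generation unit
            return self.generate(record_id, score)
        return None
```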
Further, the device further comprises:
an access data obtaining unit, an abnormal characteristic parameter obtaining unit, an access data merging unit, an abnormal characteristic parameter quantization unit, and a detection model establishing unit; wherein,
the access data obtaining unit is configured to obtain labeled access data from the stored access data as a training data set;
the abnormal characteristic parameter obtaining unit is configured to analyze the abnormal characteristic parameters of each piece of abnormal access data according to the obtained abnormal access data and the obtained normal access data, where the abnormal characteristic parameters are the characteristic parameters of the abnormal access data;
the access data merging unit is configured to merge access data with identical abnormal characteristic parameters into one piece of access data;
the abnormal characteristic parameter quantization unit is used for quantizing the abnormal characteristic parameters of each piece of combined access data;
and the detection model establishing unit is used for establishing a detection model aiming at the characteristic parameters according to a preset machine learning-based classification algorithm.
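The patent does not say how the quantization unit encodes the parameters. One simple possibility, assumed here purely for illustration, is to keep numeric and Boolean parameters as numbers and map categorical ones (for example a hypothetical `direction` field) to integer codes learned from the training data, with 0 reserved for values never seen in training.

```python
from typing import Dict, List, Union

Value = Union[bool, int, float, str]
Record = Dict[str, Value]


def build_vocab(records: List[Record]) -> Dict[str, Dict[str, int]]:
    """Assign an integer code to every categorical value seen in training; 0 means unseen."""
    vocab: Dict[str, Dict[str, int]] = {}
    for rec in records:
        for name, value in rec.items():
            if isinstance(value, str):
                codes = vocab.setdefault(name, {})
                codes.setdefault(value, len(codes) + 1)
    return vocab


def quantize(rec: Record, names: List[str], vocab: Dict[str, Dict[str, int]]) -> List[float]:
    """Turn one merged record into a numeric vector in the fixed order given by `names`."""
    vec: List[float] = []
    for name in names:
        value = rec.get(name, 0)
        if isinstance(value, str):
            vec.append(float(vocab.get(name, {}).get(value, 0)))
        else:
            vec.append(float(value))  # bool -> 0.0 / 1.0, numbers unchanged
    return vec
```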
Further, in the device,
the abnormal characteristic parameter obtaining unit includes:
an abnormal field obtaining subunit, a feature library determining subunit, and a feature vector obtaining subunit; wherein,
the abnormal field obtaining subunit is configured to compare each field of the obtained abnormal access data with the corresponding field of the obtained normal access data to obtain the abnormal fields of the abnormal access data;
the feature library determining subunit is configured to determine a multidimensional feature library of abnormal features according to the abnormal fields, the length characteristics of the fields, the count characteristics of the fields, and the type characteristics of the access data, wherein the multidimensional feature library stores the abnormal characteristic parameters of the abnormal features;
and the feature vector obtaining subunit is configured to obtain the feature vector of each piece of access data according to the abnormal characteristic parameters contained in the feature library.
Further, the device further comprises:
a second feature vector obtaining unit, a second detection result generation unit, an accuracy obtaining unit, an accuracy judging unit, a success judging unit, and a contribution degree counting unit; wherein,
the second feature vector obtaining unit is configured to obtain the feature vector of the access data to be tested; the second detection result generation unit is configured to detect each piece of access data to be tested according to the detection model and the obtained feature vector of the access data to be tested, and to generate a detection result for the access data to be tested;
the accuracy obtaining unit is configured to obtain, from the generated detection results, the detection accuracy of the detection model on the access data to be tested;
the accuracy judging unit is configured to judge whether the accuracy is greater than a preset threshold; if so, the success judging unit is triggered, and if not, the contribution degree counting unit is triggered;
the success judging unit is configured to judge that the detection model has been established successfully; and the contribution degree counting unit is configured to compute the contribution of each abnormal feature in the detection model, re-acquire the abnormal feature vector of each piece of abnormal access data according to the computed contributions, and trigger the access data merging unit to merge access data with identical abnormal characteristic parameters into one piece of access data.
Further, the device further comprises:
a type judging unit and a detection model updating unit; wherein,
the type judging unit is configured to judge whether the type of the current access data is an unknown type and, if so, to trigger the detection model updating unit;
and the detection model updating unit is used for updating the detection model according to the type of the current access data.
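A sketch of the type judging unit and the detection model updating unit, under the assumption (not stated in the patent) that samples of an unknown type are buffered and a hypothetical `retrain` callback runs once `min_samples` of them have arrived; after that the type counts as known.

```python
from typing import Callable, Dict, List, Set

Retrain = Callable[[str, List[list]], None]


class DetectionModelUpdatingUnit:
    """Buffers samples of a new access data type and retrains once enough arrive."""

    def __init__(self, retrain: Retrain, min_samples: int = 3):
        self.retrain = retrain          # hypothetical callback that rebuilds the model
        self.min_samples = min_samples
        self.pending: Dict[str, List[list]] = {}

    def update(self, access_type: str, features: list) -> bool:
        buf = self.pending.setdefault(access_type, [])
        buf.append(features)
        if len(buf) >= self.min_samples:
            self.retrain(access_type, self.pending.pop(access_type))
            return True                 # the model now covers this type
        return False


class TypeJudgingUnit:
    """Flags unknown types and hands their samples to the updating unit."""

    def __init__(self, known_types: Set[str], updater: DetectionModelUpdatingUnit):
        self.known_types = set(known_types)
        self.updater = updater

    def judge(self, access_type: str, features: list) -> bool:
        """Return True if `access_type` is unknown to the current model."""
        if access_type in self.known_types:
            return False
        if self.updater.update(access_type, features):
            self.known_types.add(access_type)
        return True
```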
Compared with the prior art, the machine learning-based terminal access data anomaly detection method provided by the invention builds on the existing boundary security protection of the integrated electricity, water, gas, and heat metering acquisition system. By extracting feature vectors from abnormal terminal access data and building an abnormal access detection model with a machine learning algorithm, this machine learning-based anomaly detection for "four-meters-in-one" terminal access data ensures the safe and reliable operation of the integrated electricity, water, heat, and gas metering acquisition system.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
fig. 1 is a schematic flowchart of a method for detecting abnormal terminal access data based on machine learning according to a preferred embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a machine learning-based device for detecting abnormal access data of a terminal according to a preferred embodiment of the present invention;
fig. 3 is a schematic diagram of an anomaly detection method for terminal access data based on machine learning in the preferred embodiment of the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings; however, the present invention may be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
In the era of big data, the massive volume of terminal access data in the integrated electricity, water, heat, and gas metering acquisition system, together with an endless stream of new types of access data anomalies, requires anomaly detection techniques with greater adaptability and extensibility.
The machine learning approach has unique advantages in this regard. Therefore, the embodiment of the invention applies machine learning to the field of abnormal detection of terminal access data so as to solve the problem of safe and reliable operation of the electricity, water, heat and gas heat energy metering integrated acquisition system.
In order to enhance the security of "four-meters-in-one" terminal access data transmission, the invention provides, on top of the original security protection measures such as data encryption and network isolation, a machine learning-based method and device for detecting anomalies in "four-meters-in-one" terminal access data.
The method is applied between the master station and the terminals to detect abnormal behavior in the access data uploaded by the terminals to the master station, thereby providing intelligent security protection for the master station equipment.
As shown in fig. 1, the machine learning-based method for detecting abnormal access data of "four-meters-in-one" terminals in this embodiment is applicable to the access data of terminals conforming to the 376.1 master station communication protocol, and includes:
step S10: processing the received current terminal access data according to a predetermined multidimensional feature library to obtain a feature vector of each piece of access data, wherein the current terminal access data comprises at least one piece of access data;
step S20: determining the detection result of the current terminal access data by using the obtained feature vectors as input information of a machine learning model, wherein the machine learning model is an abnormal access detection model that has been trained in advance and has passed an accuracy test.
It should be understood that the multidimensional feature library comprises a plurality of feature parameters characterizing abnormal behavior contained in the access data, and the values that a piece of access data takes for these feature parameters form its feature vector. The abnormal access detection model corresponds to the multidimensional feature library.
The machine learning model is an abnormal access detection model established based on machine learning. Specifically, the machine learning model classifies the input feature vectors, thereby determining whether the current access data includes abnormal behavior, and may generate an abnormal detection result for the current access data.
Specifically, establishing the abnormal access detection model based on machine learning comprises:
analyzing key fields of each piece of access data from the stored terminal access data according to protocol specifications; extracting a multidimensional feature library of each piece of access data according to the protocol specifications and the differing characteristics of normal and abnormal messages;
the multidimensional feature library characterizes the degree of anomaly of the access data from multiple dimensions.
When the training set is collected, in order to reduce the total amount of access data, a plurality of pieces of access data with completely identical feature vectors are combined into one piece of access data.
Using the training set, an abnormal access detection model corresponding to the multidimensional feature library is established by a machine learning classification algorithm, with feature vectors as input and abnormal behaviors as output.
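The merging of access data with identical feature vectors before training might look like the following sketch. Keeping a majority label and a weight for each merged vector is an assumption added here so that conflicting labels and the original frequencies are not silently lost; the patent only states that identical vectors are combined into one piece of access data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def merge_identical(X: Sequence[Sequence[float]], y: Sequence[int]
                    ) -> Tuple[List[List[float]], List[int], List[int]]:
    """Collapse rows whose feature vectors are identical.

    Returns (unique_X, merged_y, weights): each distinct vector appears once, in
    first-seen order, with its majority label and the number of rows it replaces.
    """
    groups: Dict[Tuple[float, ...], Counter] = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row)][label] += 1
    unique_X, merged_y, weights = [], [], []
    for vec, labels in groups.items():
        unique_X.append(list(vec))
        merged_y.append(labels.most_common(1)[0][0])  # ties: first label seen
        weights.append(sum(labels.values()))
    return unique_X, merged_y, weights
```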
Specifically, according to the collected abnormal terminal access data and the collected normal terminal access data, respectively analyzing the feature vector of each piece of access data, thereby establishing a multidimensional feature library, comprising:
comparing each field in the abnormal and normal terminal access data to obtain abnormal characteristic parameters of the abnormal terminal access data;
specifically, a plurality of abnormal characteristic parameters are determined according to the length characteristics of each field, the number characteristics of each field and the type characteristics of different abnormal behaviors, and a multi-dimensional characteristic library is formed.
Preferably, the characteristic parameters include: the type of the current access data, protocol specification anomaly features, and service anomaly features.
That is, the multi-dimensional feature library stores a plurality of abnormal feature parameters corresponding to different abnormal behaviors; during subsequent processing, the feature vector of each piece of access data can be obtained according to the abnormal feature parameters contained in the multidimensional feature library.
Further, in the case where a preset update condition is satisfied, the detection model may be updated according to the obtained detection result.
Specifically, after the establishment of the abnormal access detection model, the method further includes:
obtaining the feature vector of the access data of each terminal under test, detecting each piece of access data under test with the abnormal access detection model, and generating a corresponding detection result;
comparing the generated detection results with the known abnormal behavior of the access data under test to obtain the accuracy with which the abnormal access detection model detects abnormal behavior;
judging whether the accuracy is greater than a preset threshold; if so, judging that the abnormal access detection model is valid; if not, computing the contribution of each characteristic parameter in the multidimensional feature library to the abnormal access detection model, and re-executing the step of establishing the abnormal access detection model according to the computed contributions.
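A hedged sketch of the accuracy test and the rebuild loop. The patent does not define how the "contribution degree" is counted; this sketch substitutes permutation importance (the accuracy lost when one feature column is shuffled) and simply drops features whose contribution is not positive before retraining. `train`, `threshold`, and `max_rounds` are assumptions.

```python
import random
from typing import Callable, List, Optional, Tuple

Model = Callable[[List[float]], int]
Trainer = Callable[[List[List[float]], List[int]], Model]


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def select(X: List[List[float]], cols: List[int]) -> List[List[float]]:
    return [[row[j] for j in cols] for row in X]


def permutation_contribution(model: Model, X, y, seed: int = 0) -> List[float]:
    """Contribution of each feature = accuracy lost when that column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - accuracy(model, shuffled, y))
    return drops


def validate_or_refine(train: Trainer, X_tr, y_tr, X_te, y_te,
                       threshold: float = 0.9, max_rounds: int = 3
                       ) -> Tuple[Optional[Model], List[int], float]:
    features = list(range(len(X_tr[0])))
    acc = 0.0
    for _ in range(max_rounds):
        model = train(select(X_tr, features), y_tr)
        acc = accuracy(model, select(X_te, features), y_te)
        if acc > threshold:  # accuracy above the preset threshold: model is valid
            return model, features, acc
        contrib = permutation_contribution(model, select(X_te, features), y_te)
        keep = [f for f, c in zip(features, contrib) if c > 0]
        if not keep or keep == features:  # nothing left to prune
            break
        features = keep  # rebuild the model on the higher-contribution features
    return None, features, acc
```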
In addition, if the abnormal type of the current terminal access data is judged to be an unknown type which is not included in the abnormal access detection model or the multidimensional feature library, the step of establishing the abnormal access detection model or determining the multidimensional feature library can be executed again according to the abnormal type of the current access data.
In a specific implementation, as shown in fig. 3, the anomaly detection method for "four-meters-in-one" terminal access data mainly comprises three steps: feature analysis, machine learning, and anomaly detection. The feature analysis step abstracts the relevant attributes of the raw data into feature vectors that a machine learning model can recognize, and a deduplication process further speeds up data processing and reduces the complexity of model training.
In the machine learning step, the low-redundancy training data with distinct feature weights obtained in the previous step are used to train and select a machine learning model suited to terminal data anomaly detection, together with its structural parameters.
In the anomaly detection step, newly added data are classified online by loading the model file generated in the machine learning step, and a detection result is generated.
It should be understood that the model file generated in the training process is also the established abnormal access detection model.
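To illustrate the model file idea, the sketch below stores a hypothetical linear model as JSON and loads it for online classification of newly added records. The JSON layout, the linear scoring rule, and the names `save_model`, `load_model`, and `detect_online` are stand-ins chosen for brevity; a real deployment would serialize whichever classifier was selected during training.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, float]


def save_model(path: str, features: List[str], weights: List[float], bias: float) -> None:
    """Write a (hypothetical) linear model as the model file produced by training."""
    spec = {"features": features, "weights": weights, "bias": bias}
    Path(path).write_text(json.dumps(spec))


def load_model(path: str) -> Callable[[Record], int]:
    """Load the model file and return a classifier: 1 = abnormal, 0 = normal."""
    spec = json.loads(Path(path).read_text())

    def predict(record: Record) -> int:
        score = spec["bias"] + sum(
            w * float(record.get(f, 0.0)) for f, w in zip(spec["features"], spec["weights"]))
        return int(score > 0)

    return predict


def detect_online(model: Callable[[Record], int],
                  stream: Iterable[Tuple[str, Record]]) -> List[Dict[str, object]]:
    """Classify newly arriving records and build a simple detection report."""
    return [{"id": rid, "abnormal": bool(model(rec))} for rid, rec in stream]
```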
In addition, the method continuously adjusts and updates the feature library and the selection of specific protocol fields according to the model's classification performance on newly added data, thereby improving the ability of the anomaly detection method to recognize, and adapt to, newly emerging anomaly types.
Detailed operations included in the respective steps are described in detail below.
① Parsing terminal access data and extracting features
Extracting features with strong representational power from the raw terminal access data is the key to anomaly detection. Feature extraction abstracts the relevant attributes of raw data into feature vectors that a machine learning model can recognize. By analyzing the data exchange protocols of different meters and applying domain expertise, a multidimensional feature library with strong characterization capability is extracted from the data packets, and these features are fed into the machine learning model for training.
Preferably, the main features are of the following four categories:
(1) Message header format anomalies
These features are extracted for cases where the packet header does not conform to the meter protocol constraints, and mainly include the following four features:
a) the second start character is not 68H;
b) the length of the message header is incorrect (longer or shorter than 6 bytes);
c) the first start character is not 68H;
d) the protocol identifier is neither 01 nor 10.
(2) Message end format anomalies
These features check whether the message end character has the prescribed value, mainly: the end character is not 16H.
(3) Data unit anomalies
These features are extracted from the data unit portion of the packet and mainly include the following six features:
a) the length of the data unit identifier is greater than 0 and less than 4;
b) when AFN is 02, the length of the data unit is neither 0 nor 6;
c) a BCD digit lies outside the range 0-9;
d) in an uplink message, the AFN code is 01H, 04H, or 05H;
e) Fn lies outside the range 0-248;
f) for messages with certain specific Fn values, the length of the data unit is less than the minimum possible length (e.g., transparent forwarding or event reporting messages).
(4) Event content anomalies
These features are extracted from the event record portion of the data, mainly: the difference between the end pointer and the start pointer of the event records (i.e., the number of uploaded events) is negative.
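The twelve rule features listed above can be expressed as Boolean checks on a parsed frame. The sketch assumes a hypothetical `ParsedFrame` produced by some 376.1 parser (field names and defaults are illustrative, not the protocol's), and the per-Fn minimum data unit lengths of feature (3) f) are left as a caller-supplied table because the patent does not give their values.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ParsedFrame:
    """Hypothetical output of a 376.1 frame parser; names and defaults are illustrative."""
    start1: int = 0x68
    start2: int = 0x68
    header_len: int = 6
    protocol_id: int = 0b10        # 2-bit protocol identifier
    end: int = 0x16
    uplink: bool = False           # True: terminal -> master station
    afn: int = 0x0C
    fn: int = 1
    data_id_len: int = 4
    data_unit_len: int = 0
    bcd_bytes: bytes = b""
    event_start_ptr: Optional[int] = None
    event_end_ptr: Optional[int] = None


def extract_features(f: ParsedFrame,
                     min_du_len: Optional[Dict[int, int]] = None) -> Dict[str, int]:
    """Return the twelve rule features as 0/1 flags (1 = anomaly indicator fired)."""
    min_du_len = min_du_len or {}
    bcd_bad = any((b >> 4) > 9 or (b & 0x0F) > 9 for b in f.bcd_bytes)
    events_bad = (f.event_start_ptr is not None and f.event_end_ptr is not None
                  and f.event_end_ptr - f.event_start_ptr < 0)
    flags = {
        # (1) message header format
        "second_start_not_68": f.start2 != 0x68,
        "header_len_bad": f.header_len != 6,
        "first_start_not_68": f.start1 != 0x68,
        "protocol_id_bad": f.protocol_id not in (0b01, 0b10),
        # (2) message end format
        "end_not_16": f.end != 0x16,
        # (3) data unit
        "data_id_truncated": 0 < f.data_id_len < 4,
        "afn02_len_bad": f.afn == 0x02 and f.data_unit_len not in (0, 6),
        "bcd_out_of_range": bcd_bad,
        "uplink_forbidden_afn": f.uplink and f.afn in (0x01, 0x04, 0x05),
        "fn_out_of_range": not 0 <= f.fn <= 248,
        "data_unit_too_short": f.data_unit_len < min_du_len.get(f.fn, 0),
        # (4) event content
        "event_count_negative": events_bad,
    }
    return {name: int(value) for name, value in flags.items()}
```

The returned dictionary always has the same key order, so `list(flags.values())` can serve as the feature vector passed to the classifier.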
② Establishing the abnormal access detection model
When establishing the abnormal access detection model for terminal access data, machine learning classification algorithms commonly used in anomaly detection, such as decision trees and random forests, are selected as candidate models, and each is trained on a training data set built from the feature library; k-fold cross-validation is then used to select the model with the best classification performance as the final abnormal access detection model.
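The candidate comparison with k-fold cross-validation can be sketched without external libraries. Here a depth-1 decision tree (a stump) and a majority-class baseline stand in for the decision tree and random forest named above; with scikit-learn one would instead score `DecisionTreeClassifier()` and `RandomForestClassifier()` with `sklearn.model_selection.cross_val_score`. The contiguous fold layout and all function names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[List[float]], int]
Trainer = Callable[[List[List[float]], List[int]], Model]


def train_majority(X: List[List[float]], y: List[int]) -> Model:
    """Baseline: always predict the most frequent training label."""
    label = max(set(y), key=y.count)
    return lambda row: label


def train_stump(X: List[List[float]], y: List[int]) -> Model:
    """Depth-1 decision tree: best single (feature, threshold, polarity) split."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for flip in (False, True):
                errors = sum((int(row[j] > t) ^ flip) != label for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, flip)
    _, j, t, flip = best
    return lambda row: int(row[j] > t) ^ flip


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds (shuffle the data first if it is ordered)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(train: Trainer, X, y, k: int = 5) -> float:
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        X_tr = [X[i] for i in range(len(X)) if i not in held_out]
        y_tr = [y[i] for i in range(len(X)) if i not in held_out]
        model = train(X_tr, y_tr)
        scores.append(sum(model(X[i]) == y[i] for i in test_idx) / len(test_idx))
    return sum(scores) / len(scores)


def select_model(candidates: Dict[str, Trainer], X, y, k: int = 5
                 ) -> Tuple[str, Model, Dict[str, float]]:
    """Pick the candidate with the best mean k-fold accuracy and refit it on all data."""
    scores = {name: cross_val_accuracy(tr, X, y, k) for name, tr in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best](X, y), scores
```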
③ Anomaly detection on newly added data and incremental model update
Newly added terminal access data are converted into feature vectors using the feature extraction technique of ①; the model file generated during training is loaded to predict whether the new data contain abnormal behavior, and a detection report is generated.
Further, according to the detection results on newly added data, the multidimensional feature library and the selection of protocol attributes in the terminal access data are continuously adjusted; a new detection model is trained periodically, and the detection model is updated by comparing the classification performance of the new and old models, thereby improving the ability of the abnormal access detection model to recognize newly emerging anomaly types and its extensibility.
In another aspect, an anomaly detection device for "four-meters-in-one" terminal access data according to an embodiment of the invention comprises the following components:
an abnormal access detection model training unit, comprising a first characteristic parameter obtaining subunit, a detection result obtaining subunit, a detection result analysis subunit, and a first detection result generation subunit.
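A minimal sketch of the old-versus-new comparison, assuming the retrained candidate and the current model are both scored on held-out recent labeled data and the current model is kept unless the candidate is better by more than a hypothetical `margin`. Scheduling the periodic retraining is left to the caller.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[float]], int]
Trainer = Callable[[List[List[float]], List[int]], Model]


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def periodic_update(current: Model, train: Trainer,
                    X_train: List[List[float]], y_train: List[int],
                    X_eval: List[List[float]], y_eval: List[int],
                    margin: float = 0.0) -> Tuple[Model, float, float]:
    """Retrain on the accumulated data and keep whichever model classifies the
    held-out recent data better (the current model wins ties)."""
    candidate = train(X_train, y_train)
    old_acc = accuracy(current, X_eval, y_eval)
    new_acc = accuracy(candidate, X_eval, y_eval)
    chosen = candidate if new_acc > old_acc + margin else current
    return chosen, old_acc, new_acc
```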
In specific implementation, the first characteristic parameter obtaining subunit obtains the characteristic parameters of the current access data, and combines the terminal access data with the same characteristic parameters into one piece of access data;
the detection result obtaining subunit uses the characteristic parameters as input information of a preset machine learning model to obtain a detection result of the current access data;
the detection result analysis subunit analyzes the obtained detection result of the current access data and judges whether the current access data contains abnormal behaviors;
the first detection result generation subunit responds to the trigger of the detection result analysis subunit and generates an abnormal detection result aiming at the current access data;
a newly added data anomaly detection and model update unit, comprising a second characteristic parameter obtaining subunit, a second detection result generation subunit, an accuracy obtaining subunit, an accuracy judging subunit, a success judging subunit, and a contribution degree counting subunit;
in specific implementation, the second characteristic parameter obtaining subunit obtains the characteristic parameters of the access data to be tested, and combines the terminal access data with the same characteristic parameters into one piece of access data;
the second detection result generation subunit detects each piece of access data to be tested and generates a detection result aiming at the access data to be tested;
the accuracy obtaining subunit obtains the detection accuracy of the access data to be tested according to the generated detection result aiming at the access data to be tested;
the accuracy judgment subunit judges whether the obtained detection accuracy is greater than a preset threshold: if yes, triggering a success judgment subunit; the success judging subunit judges that the established detection model is successful in detecting the access data to be tested; if not, triggering a contribution degree counting subunit; the triggering contribution degree counting subunit counts the contribution degree of each abnormal characteristic parameter in the established detection model; and according to the counted contribution degree, recovering the abnormal characteristic parameters of each piece of abnormal access data, and executing the training step of the abnormal access detection model.
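The accuracy check and contribution-degree loop can be sketched as follows. The patent specifies neither the classifier nor how the contribution degree is computed, so this sketch assumes a 1-nearest-neighbour classifier as a stand-in, a hypothetical contribution degree (gap between class means divided by the feature's overall spread), and a hypothetical pruning rule that keeps the higher-contributing half of the features before the next round. Both classes are assumed to be present in the training data.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# One training/test sample: (characteristic parameter name -> quantized value, label),
# with label 1 = abnormal and 0 = normal.
Sample = Tuple[Dict[str, float], int]


def predict_1nn(train: Sequence[Sample], feats: List[str], x: Dict[str, float]) -> int:
    """Stand-in classifier: label of the nearest training sample (squared Euclidean)."""
    def dist(row: Dict[str, float]) -> float:
        return sum((row[f] - x[f]) ** 2 for f in feats)
    _, label = min(train, key=lambda sample: dist(sample[0]))
    return label


def contribution(train: Sequence[Sample], feats: List[str]) -> Dict[str, float]:
    """Hypothetical contribution degree: class-mean gap divided by the feature's spread."""
    scores = {}
    for f in feats:
        abnormal = [x[f] for x, y in train if y == 1]
        normal = [x[f] for x, y in train if y == 0]
        spread = pstdev(abnormal + normal) or 1.0  # constant feature: avoid dividing by 0
        scores[f] = abs(mean(abnormal) - mean(normal)) / spread
    return scores


def build_and_validate(train: Sequence[Sample], test: Sequence[Sample], feats: List[str],
                       threshold: float = 0.9, max_rounds: int = 5):
    """Return (kept features, accuracy, success flag)."""
    feats = list(feats)
    acc = 0.0
    for _ in range(max_rounds):
        acc = sum(predict_1nn(train, feats, x) == y for x, y in test) / len(test)
        if acc > threshold:
            return feats, acc, True            # success judging branch
        if len(feats) == 1:
            break                              # nothing left to prune
        scores = contribution(train, feats)
        keep = max(1, len(feats) // 2)         # keep the higher-contributing half
        feats = sorted(feats, key=lambda f: scores[f], reverse=True)[:keep]
    return feats, acc, False
```

In a toy case where a high-variance "noise" parameter has the same mean in both classes, the first round fails the threshold, the noise parameter scores zero contribution, and the retrained model on the remaining parameter passes.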
As can be seen from the above, in the machine-learning-based anomaly detection apparatus for terminal access data of this embodiment, the abnormal access detection model is a model over the characteristic parameters and can be updated continuously as new characteristic parameters are obtained. When detecting abnormal behavior in terminal access data, it can therefore keep pace with the rapidly growing volume and variety of terminal data, and it improves the detection rate of such abnormal behavior.
Preferably, as shown in Fig. 2, the machine-learning-based anomaly detection apparatus for "four-in-one" terminal access data includes:
a first characteristic parameter obtaining unit 30, a detection result obtaining unit 40, a detection result analysis unit 50 and a first detection result generating unit 60, wherein:
the first characteristic parameter obtaining unit 30 is configured to obtain the characteristic parameters of current access data;
the detection result obtaining unit 40 is configured to obtain a detection result for the current access data by using the obtained characteristic parameters as input to a preset detection model, where the detection result indicates the characteristic values of any abnormal behavior contained in the current access data; the detection model is a model over the characteristic parameters and is updated according to the obtained characteristic parameters when it meets a preset update condition;
the detection result analysis unit 50 is configured to analyze the detection result, judge whether the current access data contains abnormal behavior, and trigger the first detection result generating unit 60;
the first detection result generating unit 60 is configured to generate an anomaly detection result for the current access data.
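The chain of units 40, 50 and 60 can be illustrated with the small sketch below. It assumes, hypothetically, that the detection model returns one score per abnormal-behavior type and that a behavior counts as present when its score reaches a threshold; the feature vector is taken as already produced by unit 30. The names `ScoreModel`, `AnomalyReport` and `detect` are illustrative, not from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical model interface: feature vector -> score per abnormal-behavior type.
ScoreModel = Callable[[Sequence[float]], Dict[str, float]]


@dataclass
class AnomalyReport:
    record_id: str
    behaviors: List[str]  # abnormal behaviors whose score reached the threshold


def obtain_result(model: ScoreModel, features: Sequence[float]) -> Dict[str, float]:
    """Unit 40: feed the characteristic parameters to the preset detection model."""
    return model(features)


def analyze(scores: Dict[str, float], threshold: float) -> List[str]:
    """Unit 50: decide which abnormal behaviors are present."""
    return sorted(name for name, score in scores.items() if score >= threshold)


def detect(record_id: str, features: Sequence[float], model: ScoreModel,
           threshold: float = 0.5) -> Optional[AnomalyReport]:
    """Chain units 40-60; `features` stands in for the output of unit 30."""
    hits = analyze(obtain_result(model, features), threshold)
    if not hits:
        return None                          # unit 60 is not triggered
    return AnomalyReport(record_id, hits)    # unit 60: anomaly detection result
```

Returning `None` for clean records mirrors the patent's wording that the analysis unit triggers the generation unit only when abnormal behavior is found.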
Further preferably, the apparatus further comprises:
an access data obtaining unit, an abnormal characteristic parameter obtaining unit, an access data merging unit, an abnormal characteristic parameter quantization unit and a detection model establishing unit, wherein:
the access data obtaining unit is configured to obtain labeled access data from the stored access data as a training data set;
the abnormal characteristic parameter obtaining unit is configured to analyze, from the obtained abnormal access data and normal access data, the abnormal characteristic parameters of each piece of abnormal access data, an abnormal characteristic parameter being a characteristic parameter of the abnormal access data;
the access data merging unit is configured to merge access data with exactly identical abnormal characteristic parameters into one piece of access data;
the abnormal characteristic parameter quantization unit is configured to quantize the abnormal characteristic parameters of each piece of merged access data;
the detection model establishing unit is configured to establish a detection model over the characteristic parameters using a preset machine-learning classification algorithm.
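The merging and quantization steps could look like the sketch below. The patent does not say how records with identical parameters but conflicting labels are merged, nor how non-numeric parameters are quantized; this sketch assumes a majority vote (ties go to the label seen first) and per-column integer codes assigned in order of first appearance. The parameter values such as `"AFN=0C"` are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Params = Tuple[Hashable, ...]  # abnormal characteristic parameters of one record


def merge_identical(records: Sequence[Tuple[Params, str]]) -> List[Tuple[Params, str]]:
    """Merge records whose parameters are exactly identical into one record.

    Conflicting labels are resolved by majority vote, ties going to the label
    seen first (an assumption; the patent does not say how conflicts are handled).
    """
    votes: Dict[Params, Counter] = defaultdict(Counter)
    for params, label in records:
        votes[params][label] += 1
    return [(params, counts.most_common(1)[0][0]) for params, counts in votes.items()]


def quantize(rows: Sequence[Params]) -> List[List[float]]:
    """Numeric parameters pass through; any other value gets a per-column integer
    code in order of first appearance."""
    codebooks: List[Dict[Hashable, int]] = [{} for _ in rows[0]] if rows else []
    out = []
    for row in rows:
        vec = []
        for i, value in enumerate(row):
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                vec.append(float(value))
            else:
                vec.append(float(codebooks[i].setdefault(value, len(codebooks[i]))))
        out.append(vec)
    return out
```

The quantized vectors can then be passed to whichever classification algorithm is chosen for the detection model establishing unit.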
Further preferably, in the apparatus, the abnormal characteristic parameter obtaining unit includes:
an abnormal field obtaining subunit, a feature library determining subunit and a feature vector obtaining subunit, wherein:
the abnormal field obtaining subunit is configured to compare each field of the obtained abnormal access data with the corresponding field of the obtained normal access data to obtain the abnormal fields of the abnormal access data;
the feature library determining subunit is configured to determine a multidimensional feature library of abnormal features from the abnormal fields, the length characteristics of the fields, the quantity characteristics of the fields and the type characteristics of the access data, the multidimensional feature library storing the abnormal characteristic parameters of the abnormal features;
the feature vector obtaining subunit is configured to obtain the feature vector of each piece of access data from the abnormal characteristic parameters contained in the feature library.
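A minimal sketch of the three subunits follows. It assumes, for illustration only, that each piece of access data has already been parsed into a mapping from field name to raw bytes, that a field is "abnormal" if it never occurs in normal data or occurs with a length never seen in normal data, and that the message type is the first byte of a hypothetical `AFN` field. The real comparison rule in the patent relies on protocol analysis and expert knowledge and is not spelled out.

```python
from typing import Dict, List, Sequence, Set

Message = Dict[str, bytes]  # hypothetical representation: field name -> raw field bytes


def abnormal_fields(abnormal: Sequence[Message], normal: Sequence[Message]) -> List[str]:
    """Flag fields that never occur in normal data or appear with a length never
    seen in normal data (one possible comparison rule, assumed for illustration)."""
    normal_lengths: Dict[str, Set[int]] = {}
    for msg in normal:
        for name, value in msg.items():
            normal_lengths.setdefault(name, set()).add(len(value))
    flagged: List[str] = []
    for msg in abnormal:
        for name, value in msg.items():
            seen = normal_lengths.get(name)
            if (seen is None or len(value) not in seen) and name not in flagged:
                flagged.append(name)
    return flagged


def build_feature_library(fields: List[str]) -> List[str]:
    """Feature library: one length feature per abnormal field, plus the field
    count (quantity feature) and the message type (type feature)."""
    return [f"len:{name}" for name in fields] + ["field_count", "msg_type"]


def feature_vector(msg: Message, library: List[str], type_field: str = "AFN") -> List[int]:
    """Map one piece of access data onto the feature library."""
    vec = []
    for feat in library:
        if feat.startswith("len:"):
            vec.append(len(msg.get(feat[4:], b"")))   # 0 if the field is absent
        elif feat == "field_count":
            vec.append(len(msg))
        elif feat == "msg_type":
            vec.append((msg.get(type_field) or b"\x00")[0])  # first byte as type code
    return vec
```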
Further preferably, the apparatus further comprises:
a second feature vector obtaining unit, a second detection result generating unit, an accuracy obtaining unit, an accuracy judging unit, a success judging unit and a contribution degree counting unit, wherein:
the second feature vector obtaining unit is configured to obtain the feature vectors of the access data to be tested; the second detection result generating unit is configured to detect each piece of access data to be tested using the detection model and the obtained feature vectors, and to generate a detection result for the access data to be tested;
the accuracy obtaining unit is configured to obtain, from the generated detection results, the detection accuracy of the detection model on the access data to be tested;
the accuracy judging unit is configured to judge whether the accuracy is greater than a preset threshold, triggering the success judging unit if so and the contribution degree counting unit if not;
the success judging unit is configured to judge that the detection model has been established successfully; the contribution degree counting unit is configured to count the contribution degree of each abnormal feature in the detection model, re-acquire the abnormal feature vector of each piece of abnormal access data according to the counted contribution degrees, and trigger the access data merging unit to merge access data with identical abnormal feature parameters into one piece of access data.
Further preferably, the apparatus further comprises:
a type judging unit and a detection model updating unit, wherein:
the type judging unit is configured to judge whether the type of the current access data is unknown and, if so, to trigger the detection model updating unit;
the detection model updating unit is configured to update the detection model according to the type of the current access data.
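One way to realize the type judging and detection model updating units is sketched below. The patent only says that an unknown type triggers an update; the buffering policy here (collect `min_samples` records of unknown types, then call a retraining callback once) is an assumption added so that a single stray record does not force a retrain. The class and parameter names are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Sequence, Tuple

Record = Tuple[Hashable, Sequence[float]]  # (access data type, feature vector)


class TypeAwareUpdater:
    """Sketch of the type judging and detection model updating units.

    Records of an unknown type are buffered; once `min_samples` have
    accumulated, `retrain` is called with them and their types become known.
    The buffering policy is an assumption, not taken from the patent.
    """

    def __init__(self, known_types: Iterable[Hashable],
                 retrain: Callable[[List[Record]], None], min_samples: int = 3):
        self.known_types = set(known_types)
        self.retrain = retrain
        self.min_samples = min_samples
        self.pending: List[Record] = []

    def observe(self, msg_type: Hashable, features: Sequence[float]) -> bool:
        """Return True if this record triggered a detection model update."""
        if msg_type in self.known_types:
            return False                     # known type: no update needed
        self.pending.append((msg_type, features))
        if len(self.pending) < self.min_samples:
            return False                     # wait for more samples of the new type
        self.retrain(self.pending)           # detection model updating unit
        self.known_types.update(t for t, _ in self.pending)
        self.pending = []
        return True
```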
In summary, to address the security risks that terminals may face at the 376.1 protocol layer, the machine-learning-based anomaly detection method and device for "four-in-one" terminal access data combine machine learning techniques with the characteristics of the protocol specification to perform intelligent anomaly detection on terminal access data. A feature library with low redundancy and strong representational power is built by analyzing protocol characteristics and expert domain knowledge; abnormal events in terminal access data are detected with machine learning; and the detection model is updated incrementally according to its detection performance on newly added data. The method and device can efficiently detect abnormal behavior that uses the 376.1 protocol to attack the master station, thereby ensuring safe and efficient data acquisition and transmission.
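Because the features are derived at the 376.1 protocol layer, a structural check of each frame is a natural source of characteristic parameters. The sketch below relies on our reading of the frame layout of the 376.1 protocol (assumed here to be the State Grid Q/GDW 376.1 master station/terminal protocol): start byte 0x68, a two-byte little-endian length field sent twice whose low two bits carry the protocol identifier (binary 10), a second 0x68, a control byte, a five-byte address, link user data beginning with the AFN, an 8-bit arithmetic checksum over the user data, and end byte 0x16. Field widths and identifier values are assumptions to verify against the specification; the anomaly labels and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed 376.1 frame layout (check against the specification):
#   0x68 | L (2 bytes, little-endian) | L repeated | 0x68 | user data | CS | 0x16
# user data = control (1 byte) + address (5 bytes) + link user data (AFN first);
# L = (user-data length << 2) | protocol identifier; CS = sum(user data) mod 256.
PROTOCOL_ID = 0b10
MIN_FRAME_LEN = 6 + 1 + 5 + 1 + 2   # header, control, address, AFN, CS + end byte


@dataclass
class FrameCheck:
    anomalies: List[str] = field(default_factory=list)
    afn: int = -1                     # application function code, -1 if unavailable


def check_frame(frame: bytes) -> FrameCheck:
    """Return protocol-level anomalies usable as characteristic parameters."""
    if len(frame) < MIN_FRAME_LEN:
        return FrameCheck(["too_short"])
    issues: List[str] = []
    if frame[0] != 0x68 or frame[5] != 0x68:
        issues.append("bad_start")
    l1 = int.from_bytes(frame[1:3], "little")
    l2 = int.from_bytes(frame[3:5], "little")
    if l1 != l2:
        issues.append("length_fields_differ")
    if (l1 & 0b11) != PROTOCOL_ID:
        issues.append("bad_protocol_id")
    user_data = frame[6:-2]
    if (l1 >> 2) != len(user_data):
        issues.append("length_mismatch")
    if sum(user_data) % 256 != frame[-2]:
        issues.append("bad_checksum")
    if frame[-1] != 0x16:
        issues.append("bad_end")
    return FrameCheck(issues, afn=user_data[6])


def build_frame(user_data: bytes) -> bytes:
    """Wrap user data in the assumed frame layout (handy for tests)."""
    l_field = ((len(user_data) << 2) | PROTOCOL_ID).to_bytes(2, "little")
    checksum = sum(user_data) % 256
    return b"\x68" + l_field + l_field + b"\x68" + user_data + bytes([checksum, 0x16])
```

Each anomaly label can be one-hot encoded and the AFN used as a type feature when building the multidimensional feature library.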
In a specific implementation, the machine-learning-based device for detecting abnormal terminal access data provided by this embodiment can be deployed on a big data platform. In an actual deployment, the data analysis environment is built mainly from components of the Hadoop ecosystem, which handle preprocessing such as data collection and cleaning, while complex analysis such as machine learning and model building is done with an in-memory computing engine such as Spark. The framework is as follows:
(1) Data acquisition layer
The data acquisition layer acquires and stores the various kinds of terminal data, for example by writing the acquired data into a Kafka queue.
(2) Data transmission layer
Kafka is a distributed message queue that is commonly used for log collection. It is usually integrated into an application system to collect data such as user behavior logs, and the collected messages can be forwarded to other storage systems such as HDFS.
(3) Data storage layer
The Hadoop Distributed File System (HDFS) stores the original terminal access data, user behavior logs and similar data for backup and forensic purposes.
(4) Programming model layer
Spark on YARN is one of Spark's deployment modes. YARN serves as the data operating system of the big data platform and provides resource management and scheduling for upper-layer applications such as Spark. Data preprocessing and feature extraction are mainly performed at this layer.
(5) Data analysis layer
MLlib is Spark's machine learning library and provides implementations of mainstream algorithms such as random forests and support vector machines. Based on Spark MLlib, the device trains the detection model on the labeled data set and tests its performance.
The invention has been described above with reference to a few embodiments. However, as would be apparent to a person skilled in the art, embodiments other than those disclosed above are equally possible within the scope of the invention, as defined by the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [device, component, etc.]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
Claims (10)
1. A machine-learning-based anomaly detection method for terminal access data, characterized by comprising the following steps:
processing the received current terminal access data according to a predetermined multidimensional feature library to obtain a feature vector for each piece of access data, wherein the current terminal access data comprises at least one piece of access data;
determining the detection result of the current terminal access data by using the obtained feature vectors as input to a machine learning model, wherein the machine learning model is an abnormal access detection model that has been trained in advance and has passed an accuracy test.
2. The method of claim 1, further comprising:
establishing an abnormal access detection model by:
parsing the key fields of each piece of access data from the stored terminal access data according to the protocol specification;
extracting a multidimensional feature library for each piece of access data according to the protocol specification and the differing characteristics of normal and abnormal messages;
establishing, on the training set and with a machine-learning classification algorithm, an abnormal access detection model corresponding to the multidimensional feature library, with the feature vectors as input and the abnormal behavior as output.
3. The method of claim 1, further comprising:
establishing the multidimensional feature library by:
analyzing the feature vector of each piece of access data from the collected abnormal and normal terminal access data;
comparing, between the abnormal and normal terminal access data, the length characteristics of each field, the quantity characteristics of each field and the type characteristics of the different abnormal behaviors, to obtain the abnormal characteristic parameters of the abnormal terminal access data and form the multidimensional feature library.
4. The method of claim 1, wherein
when the training set is collected, multiple pieces of access data with identical feature vectors are merged into one piece of access data.
5. The method of claim 1, wherein,
after the step of establishing the abnormal access detection model, the method further comprises:
obtaining the feature vectors of the terminal access data to be tested, detecting each piece of terminal access data to be tested with the abnormal access detection model, and generating a corresponding detection result;
comparing the generated detection results with the actual abnormal behavior of the terminal access data to be tested, to obtain the accuracy with which the abnormal access detection model detects that abnormal behavior;
judging whether the accuracy is greater than a preset threshold;
if so, judging that the abnormal access detection model is valid;
if not, counting the contribution degree of each characteristic parameter in the multidimensional feature library to the abnormal access detection model, and re-executing the step of establishing the abnormal access detection model according to the counted contribution degrees.
6. A machine-learning-based device for detecting abnormal terminal access data, characterized by comprising:
a first characteristic parameter obtaining unit, a detection result obtaining unit, a detection result analysis unit and a first detection result generating unit, wherein:
the first characteristic parameter obtaining unit is configured to obtain the characteristic parameters of current access data;
the detection result obtaining unit is configured to obtain the detection result of the current access data by using the obtained characteristic parameters as input to a preset detection model, wherein the detection result indicates the characteristic values of any abnormal behavior contained in the current access data; the detection model is a model over the characteristic parameters and is updated according to the obtained characteristic parameters when it meets a preset update condition;
the detection result analysis unit is configured to analyze the detection result, judge whether the current access data contains abnormal behavior, and trigger the first detection result generating unit;
the first detection result generating unit is configured to generate an anomaly detection result for the current access data.
7. The apparatus of claim 6, further comprising:
an access data obtaining unit, an abnormal characteristic parameter obtaining unit, an access data merging unit, an abnormal characteristic parameter quantization unit and a detection model establishing unit, wherein:
the access data obtaining unit is configured to obtain labeled access data from the stored access data as a training data set;
the abnormal characteristic parameter obtaining unit is configured to analyze, from the obtained abnormal access data and normal access data, the abnormal characteristic parameters of each piece of abnormal access data, an abnormal characteristic parameter being a characteristic parameter of the abnormal access data;
the access data merging unit is configured to merge access data with exactly identical abnormal characteristic parameters into one piece of access data;
the abnormal characteristic parameter quantization unit is configured to quantize the abnormal characteristic parameters of each piece of merged access data;
the detection model establishing unit is configured to establish a detection model over the characteristic parameters using a preset machine-learning classification algorithm.
8. The apparatus of claim 7, wherein
the abnormal characteristic parameter obtaining unit includes:
an abnormal field obtaining subunit, a feature library determining subunit and a feature vector obtaining subunit, wherein:
the abnormal field obtaining subunit is configured to compare each field of the obtained abnormal access data with the corresponding field of the obtained normal access data to obtain the abnormal fields of the abnormal access data;
the feature library determining subunit is configured to determine a multidimensional feature library of abnormal features from the abnormal fields, the length characteristics of the fields, the quantity characteristics of the fields and the type characteristics of the access data, the multidimensional feature library storing the abnormal characteristic parameters of the abnormal features;
the feature vector obtaining subunit is configured to obtain the feature vector of each piece of access data from the abnormal characteristic parameters contained in the feature library.
9. The apparatus of claim 6, further comprising:
a second feature vector obtaining unit, a second detection result generating unit, an accuracy obtaining unit, an accuracy judging unit, a success judging unit and a contribution degree counting unit, wherein:
the second feature vector obtaining unit is configured to obtain the feature vectors of the access data to be tested; the second detection result generating unit is configured to detect each piece of access data to be tested using the detection model and the obtained feature vectors, and to generate a detection result for the access data to be tested;
the accuracy obtaining unit is configured to obtain, from the generated detection results, the detection accuracy of the detection model on the access data to be tested;
the accuracy judging unit is configured to judge whether the accuracy is greater than a preset threshold, triggering the success judging unit if so and the contribution degree counting unit if not;
the success judging unit is configured to judge that the detection model has been established successfully; the contribution degree counting unit is configured to count the contribution degree of each abnormal feature in the detection model, re-acquire the abnormal feature vector of each piece of abnormal access data according to the counted contribution degrees, and trigger the access data merging unit to merge access data with identical abnormal feature parameters into one piece of access data.
10. The apparatus of claim 6, further comprising:
a type judging unit and a detection model updating unit, wherein:
the type judging unit is configured to judge whether the type of the current access data is unknown and, if so, to trigger the detection model updating unit;
the detection model updating unit is configured to update the detection model according to the type of the current access data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811352235.1A CN109391624A (en) | 2018-11-14 | 2018-11-14 | A kind of terminal access data exception detection method and device based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811352235.1A CN109391624A (en) | 2018-11-14 | 2018-11-14 | A kind of terminal access data exception detection method and device based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109391624A | 2019-02-26 |
Family
ID=65428618
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811352235.1A Pending CN109391624A (en) | 2018-11-14 | 2018-11-14 | A kind of terminal access data exception detection method and device based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109391624A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104935600A (en) * | 2015-06-19 | 2015-09-23 | 中国电子科技集团公司第五十四研究所 | Mobile ad hoc network intrusion detection method and device based on deep learning |
CN105656886A (en) * | 2015-12-29 | 2016-06-08 | 北京邮电大学 | Method and device for detecting website attack behaviors based on machine learning |
US20160226894A1 (en) * | 2015-02-04 | 2016-08-04 | Electronics And Telecommunications Research Institute | System and method for detecting intrusion intelligently based on automatic detection of new attack type and update of attack type model |
2018-11-14: application CN201811352235.1A filed in CN (publication CN109391624A); status: Pending
Non-Patent Citations (1)
Title |
---|
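何珊珊: "Design and Implementation of an Abnormal Traffic Detection System Based on Machine Learning", China Master's Theses Full-text Database, Information Science and Technology, 2018, No. 03 * |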
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175200A (en) * | 2019-05-31 | 2019-08-27 | 国网上海市电力公司 | A kind of abnormal energy analysis method and system based on intelligent algorithm |
CN110177108A (en) * | 2019-06-02 | 2019-08-27 | 四川虹微技术有限公司 | A kind of anomaly detection method, device and verifying system |
CN110457896A (en) * | 2019-07-02 | 2019-11-15 | 北京人人云图信息技术有限公司 | The detection method and detection device of online access |
CN111259985A (en) * | 2020-02-19 | 2020-06-09 | 腾讯科技(深圳)有限公司 | Classification model training method and device based on business safety and storage medium |
CN111259985B (en) * | 2020-02-19 | 2023-06-30 | 腾讯云计算(长沙)有限责任公司 | Classification model training method and device based on business safety and storage medium |
CN114070899A (en) * | 2020-07-27 | 2022-02-18 | 深信服科技股份有限公司 | Message detection method, device and readable storage medium |
CN114070899B (en) * | 2020-07-27 | 2023-05-12 | 深信服科技股份有限公司 | Message detection method, device and readable storage medium |
CN116458119A (en) * | 2020-11-19 | 2023-07-18 | 日本电信电话株式会社 | Estimation device, estimation method, and estimation program |
CN117972757A (en) * | 2024-03-25 | 2024-05-03 | 贵州大学 | Method and system for realizing safety analysis of mine data based on cloud platform |
CN117972757B (en) * | 2024-03-25 | 2024-06-14 | 贵州大学 | Method and system for realizing safety analysis of mine data based on cloud platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109391624A (en) | A kind of terminal access data exception detection method and device based on machine learning | |
CN108989150B (en) | Login abnormity detection method and device | |
JP5792654B2 (en) | Security monitoring system and security monitoring method | |
KR101538709B1 (en) | Anomaly detection system and method for industrial control network | |
CN110909811A (en) | OCSVM (one-class support vector machine)-based power grid abnormal behavior detection and analysis method and system | |
CN111881452B (en) | Safety test system for industrial control equipment and working method thereof | |
CN111092862B (en) | Method and system for detecting communication traffic abnormality of power grid terminal | |
CN103905450B (en) | Intelligent grid embedded device network check and evaluation system and check and evaluation method | |
CN109308411B (en) | Method and system for hierarchically detecting software behavior defects based on artificial intelligence decision tree | |
CN116366374B (en) | Security assessment method, system and medium for power grid network management based on big data | |
CN105786702A (en) | Computer software analysis system | |
CN118264488B (en) | Data security management system based on Internet of things | |
CN117439916A (en) | Network security test evaluation system and method | |
CN116633689B (en) | Data storage risk early warning method and system based on network security analysis | |
CN112787984A (en) | Vehicle-mounted network anomaly detection method and system based on correlation analysis | |
CN117596119A (en) | Equipment data acquisition and monitoring method and system based on SNMP (simple network management protocol) | |
CN117336055A (en) | Network abnormal behavior detection method and device, electronic equipment and storage medium | |
CN115277229A (en) | Network security situation perception method and system | |
CN105825130A (en) | Information security early-warning method and device | |
CN112860549A (en) | Method and device for obtaining test sample | |
CN107085544B (en) | System error positioning method and device | |
CN113285847A (en) | Communication network anomaly detection method and system of intelligent converter station monitoring system | |
CN112073396A (en) | Method and device for detecting transverse movement attack behavior of intranet | |
CN116703207A (en) | Thermal power plant safety monitoring method and system based on artificial intelligence | |
CN113591909B (en) | Abnormality detection method for power system, abnormality detection device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190226 |