CN116723157A - Terminal behavior detection model construction method, device, equipment and storage medium - Google Patents

Terminal behavior detection model construction method, device, equipment and storage medium

Info

Publication number
CN116723157A
Authority
CN
China
Prior art keywords
flow
characteristic
sample
time sequence
time interval
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310624689.4A
Other languages
Chinese (zh)
Inventor
李肯立
李頔
周旭
杨圣洪
蔡宇辉
余思洋
段明星
吴帆
秦云川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Application filed by Hunan University
Priority to CN202310624689.4A
Publication of CN116723157A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application relates to a method, an apparatus, a computer device, a storage medium and a computer program product for constructing a terminal behavior detection model. The method comprises: acquiring historical flow data between a first terminal device and each second terminal device in each preset periodic time interval; performing feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset periodic time interval; constructing the flow characteristic time sequence samples in each preset periodic time interval from the periodic time feature value of that interval and its flow statistical features; and obtaining the sample label of each flow characteristic time sequence sample, and constructing a terminal behavior detection model for each preset periodic time interval from the samples and the sample labels. The method improves both the training effect and the training efficiency of the terminal behavior detection model.

Description

Terminal behavior detection model construction method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for constructing a terminal behavior detection model.
Background
With the development of artificial intelligence technology, AI-based flow analysis has emerged: the flow data exchanged between terminals is analyzed with artificial intelligence techniques to detect whether a terminal exhibits abnormal behavior.
In conventional approaches, the flow data packets exchanged between terminals are first captured and parsed to obtain the internal features of the flow data, and a terminal behavior detection model, which is used to detect whether a terminal exhibits abnormal behavior, is then trained on those internal features.
However, as a machine learning model, the terminal behavior detection model is usually trained on massive amounts of data, so a massive number of flow data packets must be parsed to obtain enough internal features. This greatly increases the training time of the model and lowers its training efficiency. Moreover, the trained model has difficulty accurately detecting whether periodic terminal behaviors are abnormal, so its training effect is poor.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, a computer-readable storage medium, and a computer program product for constructing a terminal behavior detection model, which can improve both the training effect and the training efficiency of the terminal behavior detection model.
In a first aspect, the application provides a method for constructing a terminal behavior detection model. The method comprises the following steps:
acquiring historical flow data between a first terminal device and each second terminal device in each preset periodic time interval;
performing feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset periodic time interval;
constructing the flow characteristic time sequence samples in each preset periodic time interval from the periodic time feature value of each preset periodic time interval and the flow statistical features;
obtaining the sample label of each flow characteristic time sequence sample, and constructing a terminal behavior detection model for each preset periodic time interval from the flow characteristic time sequence samples and the sample labels.
In one embodiment, obtaining the sample label of each flow characteristic time sequence sample comprises:
clustering the flow statistical feature values in each flow characteristic time sequence sample to obtain a clustering result for each sample; and labeling each flow characteristic time sequence sample according to its clustering result to obtain the sample labels.
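The clustering step does not name a specific algorithm. A minimal sketch, assuming plain k-means (Lloyd's algorithm) over the flow statistical feature vectors collected for one preset periodic time interval on different days; the function name `kmeans`, the deterministic first-k initialisation and the iteration cap are illustrative choices, not the application's own method.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _nearest(s: Sequence[float], centers: List[Vector]) -> int:
    return min(range(len(centers)), key=lambda c: _dist(s, centers[c]))


def kmeans(samples: Sequence[Sequence[float]], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain k-means; centers start at the first k samples so results are deterministic."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    centers = [[float(v) for v in s] for s in samples[:k]]
    for _ in range(iters):
        # Assignment step: each sample goes to its nearest center.
        assignment = [_nearest(s, centers) for s in samples]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [samples[i] for i, a in enumerate(assignment) if a == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment consistent with the returned centers.
    return centers, [_nearest(s, centers) for s in samples]
```

For example, four daily samples of the same minute, one of them far from the rest, end up split into a three-member cluster and a singleton cluster.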
In one embodiment, the clustering result includes a cluster center, and labeling each flow characteristic time sequence sample according to its clustering result to obtain the sample label comprises:
determining a target distance threshold for the cluster center according to the feature value interval of the cluster center; calculating the distance between each flow statistical feature value in the flow characteristic time sequence sample and the cluster center; and labeling the flow characteristic time sequence sample according to these distances and the target distance threshold to obtain the sample label.
In one embodiment, the sample labels include a positive sample label and a negative sample label, and labeling the flow characteristic time sequence sample according to the distances and the target distance threshold comprises:
if no distance is greater than the target distance threshold, determining that the flow characteristic time sequence sample contains no abnormal flow statistical feature value, and assigning it a positive sample label; if a distance is greater than the target distance threshold, determining that the sample contains a noise flow statistical feature value, and determining the neighborhood flow characteristic time sequence samples of the sample, where the sample and its neighborhood samples lie in the same preset periodic time interval of the same time period; if the noise flow statistical feature value also occurs in a neighborhood sample, determining that it is an abnormal flow statistical feature value and assigning the sample a negative sample label; if it does not occur in any neighborhood sample, determining that it is not an abnormal flow statistical feature value and assigning the sample a positive sample label.
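The labeling rules above can be sketched as follows. Several details are assumptions: the distance is taken as the absolute difference between each feature value and the matching coordinate of the cluster center; the target distance threshold is passed in directly, because its derivation from the center's feature value interval is not detailed here; "the noise value occurs in a neighborhood sample" is read as the same feature position also exceeding the threshold in at least one neighborhood sample; and the 1/0 label encoding is hypothetical.

```python
from typing import Iterable, List, Sequence

POSITIVE, NEGATIVE = 1, 0  # hypothetical encoding: positive = normal, negative = abnormal


def noise_positions(sample: Sequence[float], center: Sequence[float], threshold: float) -> List[int]:
    """Feature positions whose distance to the cluster center exceeds the threshold."""
    return [i for i, (v, c) in enumerate(zip(sample, center)) if abs(v - c) > threshold]


def label_sample(
    sample: Sequence[float],
    center: Sequence[float],
    threshold: float,
    neighbours: Iterable[Sequence[float]],
) -> int:
    noisy = noise_positions(sample, center, threshold)
    if not noisy:
        # Every distance is within the threshold: no abnormal value.
        return POSITIVE
    for nb in neighbours:
        if set(noise_positions(nb, center, threshold)).intersection(noisy):
            # The deviation recurs in a neighbourhood sample: treat it as abnormal.
            return NEGATIVE
    # The deviation is isolated: treat it as noise rather than an anomaly.
    return POSITIVE
```

An isolated spike therefore keeps a positive label, while a deviation that recurs at the same feature position in a neighborhood sample is labeled negative.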
In one embodiment, performing feature extraction on the historical flow data to obtain the flow statistical features in the preset periodic time interval comprises:
splitting the historical flow data in the preset periodic time interval by the device address of each second terminal device to obtain the split flow data for each device address; determining, from the split flow data, the payload quantity exchanged between the first terminal device and each second terminal device; and taking the feature vector formed by these payload quantities as the flow statistical feature in the preset periodic time interval.
In one embodiment, after the terminal behavior detection model for each preset periodic time interval is constructed from the flow characteristic time sequence samples and the sample labels, the method further comprises:
acquiring real-time flow data between the first terminal device and each second terminal device; constructing a real-time flow characteristic time sequence sample from the real-time flow statistical feature of the real-time flow data and the current time interval of the real-time flow data; locating a target detection model among the terminal behavior detection models based on the preset periodic time interval that contains the current time interval; and detecting, with the target detection model and the real-time flow characteristic time sequence sample, whether the first terminal device exhibits abnormal behavior in the current time interval.
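A minimal sketch of the real-time routing step, assuming the trained models are stored in a dictionary keyed by `(pos, direction)` and that each model is a callable returning `True` for abnormal behavior; both the storage layout and the callable interface are hypothetical. The time feature follows the sin/cos periodic time feature values given for step 206, with `pos` counted from 1 for the first minute of the day.

```python
import math
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Model = Callable[[List[float]], bool]  # hypothetical interface: True means abnormal
MINUTES_PER_DAY = 1440


def interval_position(ts: datetime) -> int:
    """1-based position of the one-minute preset periodic time interval containing ts."""
    return ts.hour * 60 + ts.minute + 1


def detect_realtime(
    models: Dict[Tuple[int, str], Model],
    ts: datetime,
    direction: str,         # "in" (into the first device) or "out"
    features: List[float],  # real-time flow statistical feature vector
) -> bool:
    pos = interval_position(ts)
    # Locate the target detection model for the interval containing ts.
    model = models[(pos, direction)]
    angle = pos / MINUTES_PER_DAY
    time_value = math.sin(angle) if direction == "in" else math.cos(angle)
    # Real-time sample = flow statistics spliced with the periodic time feature value.
    return model(features + [time_value])
```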
In a second aspect, the present application further provides an apparatus for constructing a terminal behavior detection model. The apparatus comprises:
an acquisition module, configured to acquire historical flow data between the first terminal device and each second terminal device in each preset periodic time interval;
a feature extraction module, configured to perform feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset periodic time interval;
a time sequence sample construction module, configured to construct the flow characteristic time sequence samples in each preset periodic time interval from the periodic time feature value of each preset periodic time interval and the flow statistical features;
a model construction module, configured to obtain the sample label of each flow characteristic time sequence sample, and to construct the terminal behavior detection model for each preset periodic time interval from the flow characteristic time sequence samples and the sample labels.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory and a processor; the memory stores a computer program, and the processor performs the following steps when executing the computer program:
acquiring historical flow data between a first terminal device and each second terminal device in each preset periodic time interval; performing feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset periodic time interval; constructing the flow characteristic time sequence samples in each preset periodic time interval from the periodic time feature value of each preset periodic time interval and the flow statistical features; and obtaining the sample label of each flow characteristic time sequence sample, and constructing a terminal behavior detection model for each preset periodic time interval from the flow characteristic time sequence samples and the sample labels.
In a fourth aspect, the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the following steps:
acquiring historical flow data between a first terminal device and each second terminal device in each preset periodic time interval; performing feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset periodic time interval; constructing the flow characteristic time sequence samples in each preset periodic time interval from the periodic time feature value of each preset periodic time interval and the flow statistical features; and obtaining the sample label of each flow characteristic time sequence sample, and constructing a terminal behavior detection model for each preset periodic time interval from the flow characteristic time sequence samples and the sample labels.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the following steps:
acquiring historical flow data between a first terminal device and each second terminal device in each preset periodic time interval; performing feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset periodic time interval; constructing the flow characteristic time sequence samples in each preset periodic time interval from the periodic time feature value of each preset periodic time interval and the flow statistical features; and obtaining the sample label of each flow characteristic time sequence sample, and constructing a terminal behavior detection model for each preset periodic time interval from the flow characteristic time sequence samples and the sample labels.
With the above method, apparatus, computer device, storage medium and computer program product, historical flow data between the first terminal device and each second terminal device is acquired in each preset periodic time interval, and feature extraction is performed on it to obtain the flow statistical features of each interval. The flow statistical features are not internal features of the flow data, so they can be obtained without parsing the flow data. Flow characteristic time sequence samples are then constructed for each preset periodic time interval from the interval's periodic time feature value and its flow statistical features; a sample label is obtained for each sample, and a terminal behavior detection model is constructed for each preset periodic time interval from the samples and labels. Because the model is trained on flow statistical features together with time feature values, a time dimension is added to the training input, which makes the trained model more accurate at detecting periodic terminal behaviors and improves its training effect. And because the flow statistical features can be obtained without parsing the flow data, less time is needed to prepare the training input.
Drawings
FIG. 1 is a flow chart of a method for constructing a terminal behavior detection model in one embodiment;
FIG. 2 is a schematic flow chart of obtaining the sample label of each flow characteristic time sequence sample in one embodiment;
FIG. 3 is a schematic flow chart of labeling each flow characteristic time sequence sample in one embodiment;
FIG. 4 is a schematic flow chart of labeling each flow characteristic time sequence sample in another embodiment;
FIG. 5 is a flow chart of flow analysis between a first terminal device and each second terminal device in an intranet in another embodiment;
FIG. 6 is a block diagram of an apparatus for constructing a terminal behavior detection model in one embodiment;
FIG. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in FIG. 1, a method for constructing a terminal behavior detection model is provided. This embodiment is described using a server as an example; it should be understood that the method may also be applied to a terminal, or to a system comprising a terminal and a server and implemented through interaction between them. In this embodiment, the method comprises the following steps:
Step 202, acquiring historical flow data between a first terminal device and each second terminal device in each preset periodic time interval.
In conventional network security solutions, the network is generally divided into an intranet and an extranet. The extranet is untrusted by default and the intranet is trusted by default, so monitoring of the intranet is generally weak, and a security incident inside the intranet can have very serious consequences.
As an example, the first terminal device and the second terminal devices may all be intranet terminal devices, for example industrial control terminal devices on an intranet. The first terminal device is the target device whose behavior is to be detected, and the second terminal devices are the peer devices that interact with it.
As an example, the preset periodic time intervals are time intervals within a preset time period. If the time period is one day, the day can be divided into 1440 minutes and each one-minute interval of the day set as one preset periodic time interval, giving 1440 preset periodic time intervals in total that repeat every day.
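The one-minute division can be sketched as a mapping between a timestamp and the interval position `pos` (1 to 1440, with the first minute of the day as `pos = 1`, matching the convention used for step 206). The helper names are illustrative only.

```python
from datetime import datetime, timedelta
from typing import Tuple

MINUTES_PER_DAY = 24 * 60  # 1440 one-minute preset periodic time intervals per day


def interval_position(ts: datetime) -> int:
    """1-based position pos of the interval containing ts; 00:00-00:01 is pos 1."""
    return ts.hour * 60 + ts.minute + 1


def interval_window(day: datetime, pos: int) -> Tuple[datetime, datetime]:
    """Start (inclusive) and end (exclusive) of interval pos on the given day."""
    if not 1 <= pos <= MINUTES_PER_DAY:
        raise ValueError("pos must be in 1..1440")
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight + timedelta(minutes=pos - 1)
    return start, start + timedelta(minutes=1)
```

Because the same `pos` recurs every day, samples for one interval can be collected across many days.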
As an example, the historical flow data may be the flow data generated between the first terminal device and each second terminal device within a preset periodic time interval, for example TCP stream data.
As an example, step 202 includes: obtaining the first device address of the first terminal device; and, in each preset periodic time interval, capturing the flow data whose source address is the first device address and the flow data whose destination address is the first device address, to obtain the historical flow data of each preset periodic time interval. The historical flow data may be stored as packet capture files whose file names include the identifier of the preset periodic time interval, which indicates the interval each capture file belongs to.
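A minimal sketch of step 202 on already-decoded packet records, assuming a hypothetical `Packet` record with timestamp, source, destination and payload length; real capture (for example with a packet capture library) is out of scope here. Packets are kept only when the first device is the source or the destination, and are grouped under capture-file names that embed the date and the interval identifier; the naming pattern is hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):  # hypothetical minimal packet record
    ts: datetime
    src: str
    dst: str
    payload_len: int


def interval_position(ts: datetime) -> int:
    """1-based position of the one-minute interval containing ts (00:00-00:01 -> 1)."""
    return ts.hour * 60 + ts.minute + 1


def bucket_by_interval(packets: Iterable[Packet], first_addr: str) -> Dict[str, List[Packet]]:
    """Group packets to or from first_addr by day and preset periodic time interval."""
    buckets: Dict[str, List[Packet]] = {}
    for p in packets:
        # Keep flow data where the first device is the source or the destination.
        if first_addr not in (p.src, p.dst):
            continue
        name = f"capture_{p.ts:%Y%m%d}_{interval_position(p.ts):04d}.pcap"
        buckets.setdefault(name, []).append(p)
    return buckets
```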
Step 204, performing feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset periodic time interval.
A flow statistical feature is a statistical feature of the flow data that can be obtained without parsing the historical flow data, for example the total number of messages, the total number of flows, or the payload volume.
As an example, step 204 includes: obtaining the second device address of each second terminal device; in each piece of historical flow data, aggregating the flow data whose source address is the same second device address to obtain a first flow data set for each preset periodic time interval, the first flow data set comprising a first flow sub-data set for each second device address; in each piece of historical flow data, aggregating the flow data whose destination address is the same second device address to obtain a second flow data set for each preset periodic time interval, the second flow data set comprising a second flow sub-data set for each second device address; obtaining, in each preset periodic time interval, the first statistical feature values of the first flow sub-data sets and the second statistical feature values of the second flow sub-data sets; and constructing the flow statistical features of each preset periodic time interval from its first and second statistical feature values.
As an example, the flow statistical features include a first flow statistical feature for the ingress direction (into the first terminal device) and a second flow statistical feature for the egress direction (out of the first terminal device). Constructing the flow statistical features of each preset periodic time interval from its first and second statistical feature values includes:
arranging the first statistical feature values of the preset periodic time interval into a first feature vector, in the coding order of the second terminal devices, to obtain the first flow statistical feature of the interval; and arranging the second statistical feature values of the interval into a second feature vector, in the same order, to obtain the second flow statistical feature of the interval. If no flow data flows into the first terminal device from a given second terminal device, the corresponding first statistical feature value is 0; if no flow data flows out of the first terminal device to a given second terminal device, the corresponding second statistical feature value is 0. The flow statistical features of each preset periodic time interval are thus constructed by automatic feature engineering without manual intervention, which makes feature engineering more efficient and lays a foundation for improving the training efficiency of the terminal behavior detection model.
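A minimal sketch of the directional feature vectors for one interval, assuming payload bytes summed per peer as the statistic (one of the options named for step 204) and a hypothetical `Packet` record. Peers are listed in a fixed coding order, and a peer with no flow data in a direction gets 0.

```python
from typing import Dict, Iterable, List, NamedTuple, Sequence, Tuple


class Packet(NamedTuple):  # hypothetical minimal packet record
    src: str
    dst: str
    payload_len: int


def flow_statistics(
    packets: Iterable[Packet], first_addr: str, peer_order: Sequence[str]
) -> Tuple[List[int], List[int]]:
    """Return (first, second) flow statistical features for one interval.

    first:  payload bytes each peer sent INTO the first device (ingress)
    second: payload bytes the first device sent OUT to each peer (egress)
    """
    ingress: Dict[str, int] = {}
    egress: Dict[str, int] = {}
    for p in packets:
        if p.dst == first_addr:  # peer is the source address
            ingress[p.src] = ingress.get(p.src, 0) + p.payload_len
        elif p.src == first_addr:  # peer is the destination address
            egress[p.dst] = egress.get(p.dst, 0) + p.payload_len
    # Fixed coding order of second terminal devices; missing direction -> 0.
    first = [ingress.get(addr, 0) for addr in peer_order]
    second = [egress.get(addr, 0) for addr in peer_order]
    return first, second
```

Fixing the peer order keeps every interval's vector the same length and aligned position by position, which the per-position comparisons in the labeling step rely on.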
Step 206, constructing the flow characteristic time sequence samples in each preset periodic time interval from the periodic time feature value of each preset periodic time interval and the flow statistical features.
The periodic time feature value identifies a preset periodic time interval; different preset periodic time intervals correspond to different periodic time feature values. Two periodic time feature values may be set for each preset periodic time interval: the first corresponds to the first flow statistical feature of the interval, i.e. the ingress direction into the first terminal device, and the second corresponds to the second flow statistical feature of the interval, i.e. the egress direction out of the first terminal device.
As an example, step 206 includes: obtaining the first and second periodic time feature values of each preset periodic time interval; concatenating the first flow statistical feature of the interval with its first periodic time feature value to obtain a first training sample, and concatenating the second flow statistical feature of the interval with its second periodic time feature value to obtain a second training sample; and taking the first and second training samples of each preset periodic time interval as the flow characteristic time sequence samples.
As an example, the first and second periodic time feature values of each preset periodic time interval are calculated as follows:
$$P(pos, 2i) = \sin\left(\frac{pos}{1440}\right)$$
$$P(pos, 2i+1) = \cos\left(\frac{pos}{1440}\right)$$
where 2i denotes the ingress direction (flow data flowing into the first terminal device) and 2i+1 denotes the egress direction (flow data flowing out of the first terminal device); pos is the position of the preset periodic time interval within the day, for example the interval (0, 1), i.e. the first minute of the day, has pos = 1; P(pos, 2i) is the first periodic time feature value and P(pos, 2i+1) is the second periodic time feature value; and 1440 is the number of minutes in a day.
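The formulas above and the concatenation of step 206 can be sketched as follows; the function names are illustrative, and the input vectors stand for the first and second flow statistical features of one interval.

```python
import math
from typing import List, Sequence, Tuple

MINUTES_PER_DAY = 1440


def cycle_time_features(pos: int) -> Tuple[float, float]:
    """(P(pos, 2i), P(pos, 2i+1)) = (sin(pos/1440), cos(pos/1440))."""
    if not 1 <= pos <= MINUTES_PER_DAY:
        raise ValueError("pos must be in 1..1440")
    angle = pos / MINUTES_PER_DAY
    return math.sin(angle), math.cos(angle)


def build_training_samples(
    pos: int, first_stats: Sequence[float], second_stats: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Concatenate each direction's flow statistical feature with its time feature value."""
    p_in, p_out = cycle_time_features(pos)
    first_sample = list(first_stats) + [p_in]     # ingress: first training sample
    second_sample = list(second_stats) + [p_out]  # egress: second training sample
    return first_sample, second_sample
```

With the formula as given, the argument pos/1440 runs over (0, 1] radian across the day, so sin is strictly increasing over that range and each interval still receives a distinct pair of values.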
Step 208, obtaining the sample label of each flow characteristic time sequence sample, and constructing the terminal behavior detection model for each preset periodic time interval from the flow characteristic time sequence samples and the sample labels.
The sample label indicates whether the flow characteristic time sequence sample contains an abnormal flow feature value. If it does, the label indicates that the first terminal device exhibits abnormal behavior; if it does not, the label indicates that the first terminal device exhibits no abnormal behavior.
As one example, step 208 includes: labeling each flow characteristic time sequence sample to obtain its sample label; and, for each preset periodic time interval, training a terminal behavior detection model on the flow characteristic time sequence samples and sample labels of that interval.
As an example, the terminal behavior detection model of each preset periodic time interval may be divided into a first terminal behavior detection model and a second terminal behavior detection model. The first model is trained on the first training samples of the interval and their sample labels, and predicts whether terminal behavior in the ingress direction (into the first terminal device) during the interval is abnormal. The second model is trained on the second training samples of the interval and their sample labels, and predicts whether terminal behavior in the egress direction (out of the first terminal device) during the interval is abnormal. Here, terminal behavior may be the communication interaction between terminals.
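The text does not specify which machine learning model is trained here, so the sketch below uses a nearest-centroid classifier purely as a stand-in; any classifier could take its place. It trains one model per `(pos, direction)` key from rows of `(pos, direction, sample, label)`, and the 1 = normal / 0 = abnormal label encoding is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Key = Tuple[int, str]  # (interval position pos, direction "in" or "out")


class NearestCentroid:
    """Placeholder classifier: predicts the label whose class mean is nearest."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x: Sequence[float]) -> int:
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


def train_per_interval(
    dataset: Iterable[Tuple[int, str, List[float], int]]
) -> Dict[Key, NearestCentroid]:
    """Train one model per (pos, direction): the first (in) and second (out) models."""
    grouped: Dict[Key, Tuple[list, list]] = defaultdict(lambda: ([], []))
    for pos, direction, sample, label in dataset:
        X, y = grouped[(pos, direction)]
        X.append(sample)
        y.append(label)
    return {key: NearestCentroid().fit(X, y) for key, (X, y) in grouped.items()}
```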
In the above method for constructing a terminal behavior detection model, historical traffic data between the first terminal device and each second terminal device are obtained for each preset period time interval, and feature extraction is performed on each set of historical traffic data to obtain the flow statistical features of each preset period time interval. The flow statistical features are not internal features of the traffic data, so they can be obtained without parsing the traffic data. Flow characteristic time sequence samples for each preset period time interval are then constructed from the period time feature value of that interval and the corresponding flow statistical features. Finally, the sample labels of the flow characteristic time sequence samples are obtained, and a terminal behavior detection model is constructed for each preset period time interval from the samples and labels. Because the model is trained on both flow statistical features and time feature values, a time dimension is added to the training input, so the trained model detects periodic terminal behaviors more accurately and the training effect improves. Because the flow statistical features require no parsing of the traffic data, less time is needed to prepare training samples, which improves training efficiency.
In one embodiment, as shown in fig. 2, obtaining the sample label corresponding to each flow characteristic time sequence sample includes:
step 302, clustering the flow statistical feature values in each flow feature time sequence sample to obtain a clustering result corresponding to each flow feature time sequence sample.
Wherein the clustering result includes a clustering center.
As an example, step 302 includes: clustering the flow statistical feature values in each flow characteristic time sequence sample to obtain the cluster center corresponding to that sample. The clustering algorithm may be DBSCAN.
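A minimal sketch of step 302, assuming the flow statistical feature values of one sample are scalars (for example, payload amounts per second terminal device). DBSCAN itself does not output a cluster center, so this sketch assumes the center is the mean of the largest cluster. `eps`, `min_pts`, `dbscan_1d`, and `cluster_center` are hypothetical names. A library implementation such as scikit-learn's `DBSCAN` (parameters `eps`, `min_samples`) could replace the hand-written loop.

```python
def dbscan_1d(values, eps, min_pts):
    """Minimal DBSCAN on scalar values; returns one label per value (-1 = noise)."""
    n = len(values)
    labels = [None] * n

    def neighbours(i):
        # The neighbourhood includes the point itself, as in common implementations.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # core point: expand the cluster through it
                queue.extend(nb_j)
    return labels


def cluster_center(values, eps, min_pts):
    """Hypothetical center: mean of the largest DBSCAN cluster (None if all noise)."""
    clusters = {}
    for v, lab in zip(values, dbscan_1d(values, eps, min_pts)):
        if lab != -1:
            clusters.setdefault(lab, []).append(v)
    if not clusters:
        return None
    largest = max(clusters.values(), key=len)
    return sum(largest) / len(largest)
```

With values `[100, 102, 98, 101, 500]`, `eps=5`, and `min_pts=3`, the first four values form one cluster centered at 100.25 and 500 is labeled noise.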
Step 304, labeling each flow characteristic time sequence sample according to its clustering result to obtain each sample label.
The sample label may be a positive sample label or a negative sample label. A negative sample label identifies that an abnormal flow statistical feature value exists in the flow characteristic time sequence sample, meaning that the first terminal device exhibits abnormal behavior; a positive sample label identifies that no abnormal flow statistical feature value exists in the sample, meaning that the first terminal device exhibits no abnormal behavior.
As an example, step 304 includes: calculating the distance between each flow statistical feature value in the flow characteristic time sequence sample and the cluster center of that sample. If any distance is greater than a preset distance threshold, it is determined that an abnormal flow statistical feature value exists in the sample, and the sample is labeled with a negative sample label; if no distance is greater than the preset distance threshold, it is determined that no abnormal flow statistical feature value exists, and the sample is labeled with a positive sample label.
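The fixed-threshold labeling of step 304 reduces to a single check, sketched below. It assumes scalar feature values and an absolute-difference distance; `label_sample` is a hypothetical name. A distance exactly equal to the threshold counts as normal, matching "not greater than" in the text.

```python
def label_sample(values, center, threshold):
    """Hypothetical helper: label one flow characteristic time sequence sample.

    'negative': some value lies farther than threshold from the cluster center
    (an abnormal value exists); 'positive': no such value.
    """
    if any(abs(v - center) > threshold for v in values):
        return "negative"
    return "positive"
```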
In this embodiment, the flow statistical feature values in each flow characteristic time sequence sample are clustered to obtain a clustering result for each sample, and each sample is labeled according to its clustering result. Labels are thus assigned automatically by clustering the feature values within each sample, with no manual annotation, which makes labeling more efficient and in turn improves the training efficiency of the terminal behavior detection model.
In one embodiment, as shown in FIG. 3, the clustering result includes a cluster center; labeling each flow characteristic time sequence sample according to each clustering result to obtain each sample label, wherein the labeling comprises the following steps:
Step 402, determining the target distance threshold for the cluster center according to the feature value interval in which the cluster center lies.
It should be noted that a larger cluster center value indicates that the flow characteristic time sequence sample belongs to a large-traffic scenario, and a smaller value indicates a small-traffic scenario. The tolerance for judging a flow statistical feature value in the sample as abnormal differs between the two. A large-traffic scenario generally carries a higher security risk, so the tolerance should be lower and a smaller distance threshold should be set; a small-traffic scenario generally carries a lower security risk, so the tolerance can be higher and a larger distance threshold should be set.
As an example, step 402 includes: obtaining the feature value of the cluster center and, according to its magnitude, locating the feature value interval that contains it; then looking up the target distance threshold for the cluster center from the mapping between feature value intervals and distance thresholds.
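Step 402's lookup can be sketched with a sorted list of interval boundaries and Python's `bisect` module. The boundaries, thresholds, and byte units below are invented for illustration; only their shape follows the text (larger cluster centers, i.e. large-traffic scenarios, map to smaller thresholds). With `bisect_right`, a center equal to a boundary falls into the upper interval, which is an assumed convention.

```python
import bisect

# Hypothetical interval boundaries on the cluster-center value (bytes) and one
# distance threshold per interval: len(THRESHOLDS) == len(BOUNDARIES) + 1.
BOUNDARIES = [1_000, 10_000, 100_000]
THRESHOLDS = [800, 400, 200, 100]  # larger centers -> smaller thresholds


def target_threshold(center, boundaries=BOUNDARIES, thresholds=THRESHOLDS):
    """Locate the feature value interval containing center and return its threshold."""
    k = bisect.bisect_right(boundaries, center)  # index of the containing interval
    return thresholds[k]
```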
Step 404, calculating the interval distance between each flow statistical feature value in the flow characteristic time sequence sample and the cluster center.
Step 406, labeling the flow characteristic time sequence sample according to the interval distances and the target distance threshold to obtain its sample label.
As an example, steps 404 to 406 include: calculating the interval distance between each flow statistical feature value in the flow characteristic time sequence sample and the cluster center. If an interval distance is greater than the target distance threshold, it is determined that an abnormal flow statistical feature value exists in the sample, and the sample is labeled with a negative sample label; if no interval distance is greater than the target distance threshold, it is determined that no abnormal flow statistical feature value exists, and the sample is labeled with a positive sample label.
In the above embodiment, the target distance threshold for the cluster center is determined by the feature value interval in which the cluster center lies, so the threshold adapts to the traffic scenario of each flow characteristic time sequence sample and is set more accurately. The interval distance between each flow statistical feature value in the sample and the cluster center is then calculated, and the sample is labeled according to these distances and the target distance threshold. This improves labeling accuracy and lays a foundation for a better-trained terminal behavior detection model.
In one embodiment, as shown in FIG. 4, the sample tags include a positive sample tag and a negative sample tag; labeling the flow characteristic time sequence sample according to each interval distance and the target distance threshold value to obtain a sample label, wherein the method comprises the following steps:
Step 502, if no interval distance is greater than the target distance threshold, determining that no abnormal flow statistical feature value exists in the flow characteristic time sequence sample, and labeling the sample with a positive sample label.
As an example, step 502 includes: if no interval distance is greater than the target distance threshold, determining that no abnormal flow statistical feature value exists in the flow characteristic time sequence sample, that is, the first terminal device exhibits no abnormal behavior, and labeling the sample with a positive sample label.
Step 504, if an interval distance is greater than the target distance threshold, determining that a noise flow statistical feature value exists in the flow characteristic time sequence sample.
Step 506, determining the neighborhood flow characteristic time sequence sample of the flow characteristic time sequence sample, where both samples belong to the same preset period time interval of the same time period.
Step 508, if a noise flow statistical feature value also exists in the neighborhood flow characteristic time sequence sample, determining that the noise flow statistical feature value is an abnormal flow statistical feature value, and labeling the flow characteristic time sequence sample with a negative sample label.
The reason for this check is as follows. Suppose that, in a single traffic direction of the first terminal device (either the incoming direction into the first terminal device or the outgoing direction out of it), the flow characteristic time sequence sample contains a flow statistical feature value whose distance from the cluster center exceeds the target distance threshold. Besides abnormal behavior of the first terminal device, this may be caused by network fluctuation, by messages between terminals not being received correctly, or by other factors. Directly concluding that the terminal device behaves abnormally could therefore produce a misjudgment.
As an example, steps 504 to 508 include: if an interval distance is greater than the target distance threshold, determining that a noise flow statistical feature value exists in the flow characteristic time sequence sample, that is, the sample for a single traffic direction in a preset period time interval of the first terminal device contains a suspected abnormal flow statistical feature value, which is called the noise flow statistical feature value. Next, the neighborhood flow characteristic time sequence sample is determined: it belongs to the same preset period time interval of the same time period as the flow characteristic time sequence sample but covers the other traffic direction. For example, if the time period is one day and the preset period time interval is the first minute of the day, the flow characteristic time sequence sample may be the sample built from traffic data flowing into the first terminal device during that minute, and the neighborhood flow characteristic time sequence sample may be the sample built from traffic data flowing out of the first terminal device during the same minute. If a noise flow statistical feature value also exists in the neighborhood flow characteristic time sequence sample, the noise flow statistical feature value is determined to be an abnormal flow statistical feature value, that is, the first terminal device exhibits abnormal behavior, and the flow characteristic time sequence sample is labeled with a negative sample label.
Step 510, if no noise flow statistical feature value exists in the neighborhood flow characteristic time sequence sample, determining that the noise flow statistical feature value is not an abnormal flow statistical feature value, and labeling the flow characteristic time sequence sample with a positive sample label.
As an example, step 510 includes: if no noise flow statistical feature value exists in the neighborhood flow characteristic time sequence sample, the noise flow statistical feature value is determined not to be an abnormal flow statistical feature value, that is, the first terminal device exhibits no abnormal behavior, and the flow characteristic time sequence sample is labeled with a positive sample label.
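Steps 502 to 510 can be combined into one labeling function, sketched below. It assumes that "a noise value exists in the neighborhood sample" means the neighborhood sample has at least one value farther from its own cluster center than its own target distance threshold. The tuple layout `(values, center, threshold)` and the function names `has_noise` and `label_with_neighbourhood` are hypothetical.

```python
def has_noise(values, center, threshold):
    """True if some value lies farther than threshold from the cluster center."""
    return any(abs(v - center) > threshold for v in values)


def label_with_neighbourhood(sample, neighbour):
    """Hypothetical helper implementing steps 502-510.

    sample, neighbour: (values, center, threshold) tuples for the two traffic
    directions (incoming and outgoing) of the same preset period time interval
    in the same time period.
    """
    if not has_noise(*sample):
        return "positive"  # step 502: no suspect value at all
    # Step 504: the sample contains a noise value; confirm with the other direction.
    if has_noise(*neighbour):
        return "negative"  # step 508: both directions deviate, treat as abnormal
    return "positive"      # step 510: one-sided deviation treated as noise only
```

Requiring both directions to deviate before labeling a sample negative is what filters out one-sided effects such as network fluctuation or a lost message.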
In the above embodiment, a flow characteristic time sequence sample is labeled positive when no interval distance exceeds the target distance threshold. When some interval distance does exceed it, the sample is treated as containing a noise flow statistical feature value, and the neighborhood flow characteristic time sequence sample (the same preset period time interval of the same time period) is checked. If the neighborhood sample also contains a noise flow statistical feature value, the value is deemed abnormal and the sample is labeled negative; otherwise, the value is deemed not abnormal and the sample is labeled positive. This eliminates, to a certain extent, misjudgments of abnormal terminal behavior caused by network fluctuation or by messages between terminals not being received correctly, makes the labels of the flow characteristic time sequence samples more accurate, and lays a foundation for a better-trained terminal behavior detection model.
In one embodiment, feature extraction is performed on each historical flow data to obtain flow statistics features in each preset period time interval, including:
segmenting the historical traffic data of a preset period time interval by the device addresses of the second terminal devices to obtain the traffic segmentation data for each device address; determining the payload amount between the first terminal device and each second terminal device from the traffic segmentation data; and taking the feature vector formed by these payload amounts as the flow statistical feature of the preset period time interval.
The historical traffic data include the first traffic data in the incoming direction (flowing into the first terminal device) and the second traffic data in the outgoing direction (flowing out of the first terminal device) within the preset period time interval.
As an example, within the first traffic data of the preset period time interval, data sharing the same second terminal device address are aggregated to obtain first aggregated traffic data for each second terminal device address; likewise, within the second traffic data, data sharing the same second terminal device address are aggregated to obtain second aggregated traffic data for each address. The first and second aggregated traffic data serve as the traffic segmentation data for each device address. The feature vector formed by the payload amounts of the first aggregated traffic data is taken as the first flow statistical feature of the preset period time interval, the feature vector formed by the payload amounts of the second aggregated traffic data is taken as the second flow statistical feature, and together they constitute the flow statistical feature of the interval. Segmenting the historical traffic data by traffic direction and by second terminal device address, and assembling the payload amounts of the segmentation data into feature vectors, yields the flow statistical features without a complex neural network. This speeds up feature extraction and lays a foundation for more efficient training of the terminal behavior detection model.
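The segmentation and payload aggregation described above might look like the following sketch. It assumes packets have already been reduced to hypothetical `(src, dst, payload_len)` records, with payload lengths taken from packet headers so that no payload content is parsed, and that peers are ordered by a fixed address list so vectors from different intervals line up. `extract_features` and `to_vector` are illustrative names.

```python
from collections import defaultdict


def extract_features(packets, first_addr):
    """Hypothetical helper: aggregate payload per peer address and direction.

    packets: iterable of (src, dst, payload_len) records from one interval.
    Returns (inbound, outbound): dicts mapping peer address -> total payload.
    """
    inbound, outbound = defaultdict(int), defaultdict(int)
    for src, dst, payload_len in packets:
        if dst == first_addr:
            inbound[src] += payload_len   # first traffic data: into the first device
        elif src == first_addr:
            outbound[dst] += payload_len  # second traffic data: out of the first device
        # Packets not involving first_addr are ignored.
    return dict(inbound), dict(outbound)


def to_vector(per_peer, peers):
    """Order payload totals by a fixed peer list; absent peers count as 0."""
    return [per_peer.get(p, 0) for p in peers]
```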
In one embodiment, after constructing the terminal behavior detection model corresponding to each preset period time interval according to each flow characteristic time sequence sample and each sample label, the method further includes:
acquiring real-time traffic data between the first terminal device and each second terminal device; constructing a real-time flow characteristic time sequence sample from the real-time flow statistical features of the real-time traffic data and the current time interval of that data; locating the target detection model among the terminal behavior detection models according to the preset period time interval that contains the current time interval; and, using the target detection model and the real-time flow characteristic time sequence sample, detecting whether the first terminal device exhibits abnormal behavior in the current time interval.
As an example, real-time traffic data between the first terminal device and each second terminal device are acquired, and the current time interval and traffic direction of the data are determined, where the traffic direction is either the incoming direction (into the first terminal device) or the outgoing direction (out of the first terminal device). The flow statistical feature values of the real-time traffic data in each direction are obtained; such a value may be a payload amount. The period time feature value of the current time interval is determined for each traffic direction, and the feature vector formed by the flow statistical feature values of a direction is spliced with the corresponding period time feature value into a real-time flow characteristic time sequence sample. The target detection model is located among the terminal behavior detection models according to the preset period time interval containing the current time interval and the traffic direction, and the real-time sample is input to it to detect whether the first terminal device exhibits abnormal behavior in the current time interval. For example, the target detection model may directly output an abnormal behavior probability: if the probability is greater than a preset probability threshold, the first terminal device is determined to exhibit abnormal behavior in the current time interval; otherwise, it is determined to exhibit no abnormal behavior.
In the above embodiment, a terminal behavior detection model is deployed for each traffic direction in each preset period time interval, so abnormal behavior detection can be performed on real-time traffic data of each direction in each interval. Adding a time dimension to the detection process allows abnormality in periodic terminal behaviors to be detected accurately, which improves the accuracy of terminal behavior detection.
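The real-time detection path can be sketched as below. Several details are assumptions: the period time feature value is taken to be the interval index itself, models are stored in a dictionary keyed by `(interval_id, direction)`, each model is any callable returning the abnormal behavior probability, and the default threshold of 0.5 is arbitrary. The names `build_realtime_sample` and `detect` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[List[float]], float]  # returns P(abnormal) for one sample


def build_realtime_sample(features: Sequence[float], interval_id: int) -> List[float]:
    """Splice a direction's feature vector with the period time feature value
    (assumed here to be the interval index)."""
    return list(features) + [float(interval_id)]


def detect(models: Dict[Tuple[int, str], Model], interval_id: int, direction: str,
           features: Sequence[float], prob_threshold: float = 0.5) -> bool:
    """True if the target model flags abnormal behavior in the current interval."""
    model = models[(interval_id, direction)]  # locate the target detection model
    p_abnormal = model(build_realtime_sample(features, interval_id))
    return p_abnormal > prob_threshold        # strictly greater, as in the text


# Usage with toy stand-in models for interval 5:
models = {
    (5, "in"): lambda s: 0.9 if s[0] > 1000 else 0.1,
    (5, "out"): lambda s: 0.2,
}
print(detect(models, 5, "in", [2000.0, 30.0]))  # True
print(detect(models, 5, "out", [10.0]))         # False
```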
As an example, referring to fig. 5, fig. 5 is a flow chart of the traffic analysis performed on the first terminal device and each second terminal device in an intranet in one embodiment. The first terminal device and the second terminal devices are all intranet terminal devices, and the core switch is the network switch connecting them. In the figure, traffic segmentation means segmenting the historical traffic data of a preset period time interval by the device addresses of the second terminal devices to obtain the traffic segmentation data for each address; feature extraction means extracting the flow statistical features; time sequence sample construction means constructing the flow characteristic time sequence samples; label labeling means assigning sample labels to the flow characteristic time sequence samples; and model construction means constructing a terminal behavior detection model for each traffic direction in each preset period time interval.
In one embodiment, the first device address of the first terminal device is obtained, and in each preset period time interval, the traffic data whose source address is the first device address and the traffic data whose destination address is the first device address are captured, giving the historical traffic data of each preset period time interval. The historical traffic data may be stored as packet capture files, and the file name of each packet capture file may include the identifier of its preset period time interval so that the interval it belongs to can be identified.
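Mapping a capture time to its preset period time interval, and recording that interval in the packet capture file name, might be done as follows. The one-day period and one-minute interval mirror the example given earlier in the description; the file-name pattern `<address>_interval<NNNN>.pcap` and all function names are hypothetical. Timestamps are assumed to be timezone-aware UTC, so day boundaries fall on multiples of 86,400 seconds of Unix time.

```python
from datetime import datetime, timezone

PERIOD_SECONDS = 86_400  # time period: one day (as in the document's example)
INTERVAL_SECONDS = 60    # preset period time interval: one minute (illustrative)


def interval_index(ts: datetime) -> int:
    """Index of the preset period time interval containing ts (UTC-aware)."""
    # Seconds elapsed since the start of the current UTC day, bucketed by minute.
    return int(ts.timestamp()) % PERIOD_SECONDS // INTERVAL_SECONDS


def capture_file_name(first_addr: str, ts: datetime) -> str:
    """Hypothetical naming scheme embedding the interval identifier."""
    return f"{first_addr}_interval{interval_index(ts):04d}.pcap"


def interval_from_file_name(name: str) -> int:
    """Recover the interval identifier from a name built by capture_file_name."""
    return int(name.rsplit("_interval", 1)[1].split(".", 1)[0])


ts = datetime(2023, 5, 30, 1, 2, 3, tzinfo=timezone.utc)
print(capture_file_name("10.0.0.1", ts))  # 10.0.0.1_interval0062.pcap
```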
After the historical traffic data of each preset period time interval are obtained, the first traffic data of each interval are aggregated by second terminal device address into first aggregated traffic data, and the second traffic data are aggregated by the same key into second aggregated traffic data; together these form the traffic segmentation data for each device address. The feature vector of payload amounts from the first aggregated traffic data is the first flow statistical feature of the interval, the feature vector from the second aggregated traffic data is the second flow statistical feature, and the two together form the flow statistical feature of the interval. Feature extraction therefore requires no complex neural network, which improves its efficiency and lays a foundation for more efficient training of the terminal behavior detection model.
After the flow statistical features of each preset period time interval are obtained through feature extraction and the flow characteristic time sequence samples are constructed, the flow statistical feature values in each flow characteristic time sequence sample are clustered to obtain its cluster center. The feature value of the cluster center is obtained, the feature value interval containing it is located according to its magnitude, and the target distance threshold for the cluster center is looked up from the mapping between feature value intervals and distance thresholds. The interval distance between each flow statistical feature value in the sample and the cluster center is then calculated. If no interval distance is greater than the target distance threshold, no abnormal flow statistical feature value exists in the sample, and the sample is labeled with a positive sample label. If an interval distance is greater than the target distance threshold, a noise flow statistical feature value exists in the sample, that is, the sample for a single traffic direction in a preset period time interval of the first terminal device contains a suspected abnormal flow statistical feature value. The neighborhood flow characteristic time sequence sample, which belongs to the same preset period time interval of the same time period, is then determined. If a noise flow statistical feature value also exists in the neighborhood sample, the noise value is determined to be an abnormal flow statistical feature value and the sample is labeled with a negative sample label; if not, the noise value is determined not to be abnormal and the sample is labeled with a positive sample label.
In this way, the target distance threshold adapts to the traffic scenario of each flow characteristic time sequence sample and is therefore set more accurately, and labeling based on it is more accurate. In this embodiment, when a noise flow statistical feature value exists in a flow characteristic time sequence sample, the neighborhood flow characteristic time sequence sample is checked as well; only if it also contains a noise flow statistical feature value is the value determined to be abnormal and the sample labeled negative. This eliminates, to a certain extent, misjudgments of abnormal terminal behavior caused by network fluctuation or by messages between terminals not being received correctly, makes the sample labels more accurate, and lays a foundation for a better-trained terminal behavior detection model.
A terminal behavior detection model is then constructed for each preset period time interval from the flow characteristic time sequence samples and sample labels. Because the model is trained on both flow statistical features and time feature values, a time dimension is added to the training input, so the trained model detects periodic terminal behaviors more accurately and the training effect improves. Because the flow statistical features can be obtained without parsing the traffic data, less time is spent preparing training samples, which improves training efficiency.
Further, after terminal behavior detection models have been trained for each traffic direction in each preset period time interval, real-time traffic data between the first terminal device and each second terminal device are acquired, and their current time interval and traffic direction (incoming, into the first terminal device, or outgoing, out of it) are determined. The flow statistical feature values of the real-time traffic data in each direction are obtained; such a value may be a payload amount. The period time feature value of the current time interval is determined for each direction, and the feature vector of a direction's flow statistical feature values is spliced with the corresponding period time feature value into a real-time flow characteristic time sequence sample. The target detection model is located according to the preset period time interval containing the current time interval and the traffic direction, and the real-time sample is input to it to detect whether the first terminal device exhibits abnormal behavior in the current time interval. Deploying a model for each traffic direction in each preset period time interval adds a time dimension to abnormal behavior detection, so abnormality in periodic terminal behaviors can be detected accurately and the accuracy of terminal behavior detection improves.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, an embodiment of the application further provides a terminal behavior detection model construction apparatus for implementing the terminal behavior detection model construction method described above. The solution implemented by the apparatus is similar to that described for the method, so for the specific limitations in the apparatus embodiments below, reference may be made to the limitations of the terminal behavior detection model construction method above; they are not repeated here.
In one embodiment, as shown in fig. 6, there is provided a terminal behavior detection model construction apparatus, including: an acquisition module 602, a feature extraction module 604, a time series sample construction module 606, and a model construction module 608, wherein:
The acquisition module 602 is configured to acquire historical flow data between the first terminal device and each second terminal device in each preset period time interval.
The feature extraction module 604 is configured to perform feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset period time interval.
The time sequence sample construction module 606 is configured to construct a flow characteristic time sequence sample in each preset period time interval according to the period time characteristic value corresponding to each preset period time interval and each flow statistical characteristic.
The model construction module 608 is configured to obtain the sample label corresponding to each flow characteristic time sequence sample, and to construct the terminal behavior detection model corresponding to each preset period time interval from the flow characteristic time sequence samples and the sample labels.
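The work of modules 606 and 608 can be sketched roughly as follows. This Python sketch is illustrative only: it represents a flow characteristic time sequence sample as a list of per-step feature vectors, each with the period time characteristic value appended, and it uses a nearest-centroid classifier as a stand-in detector because the embodiment does not name a model type. `build_sample`, `NearestCentroidModel`, and `train_per_interval` are hypothetical names.

```python
import math
from collections import defaultdict


def build_sample(feature_vectors, period_value):
    """Append the period time characteristic value to every step of the sequence."""
    return [list(vec) + [period_value] for vec in feature_vectors]


def flatten(sample):
    """Concatenate all steps of a sample into one vector."""
    return [x for step in sample for x in step]


class NearestCentroidModel:
    """Stand-in detector: label a sample by its closest class centroid."""

    def fit(self, samples, labels):
        groups = defaultdict(list)
        for s, y in zip(samples, labels):
            groups[y].append(flatten(s))
        # Column-wise mean of the flattened samples of each label.
        self.centroids = {
            y: [sum(col) / len(col) for col in zip(*rows)] for y, rows in groups.items()
        }
        return self

    def predict(self, sample):
        x = flatten(sample)
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


def train_per_interval(dataset):
    """dataset: {interval_id: [(sample, label), ...]} -> {interval_id: model}."""
    return {
        interval: NearestCentroidModel().fit([s for s, _ in pairs], [y for _, y in pairs])
        for interval, pairs in dataset.items()
    }


dataset = {
    9: [
        (build_sample([[100, 20], [110, 25]], 9), "positive"),
        (build_sample([[900, 20], [950, 30]], 9), "negative"),
    ],
}
models = train_per_interval(dataset)
print(models[9].predict(build_sample([[105, 22], [108, 21]], 9)))  # positive
```

Training one model per interval keeps each detector specialised to the traffic pattern typical of that part of the period; any classifier with the same fit/predict shape could be substituted.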
In one embodiment, the model building module 608 is further configured to:
clustering the flow statistical characteristic values in each flow characteristic time sequence sample to obtain a clustering result corresponding to each flow characteristic time sequence sample; and labeling each flow characteristic time sequence sample according to each clustering result to obtain each sample label.
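The embodiment does not specify a clustering algorithm. As one possible stand-in, the following Python sketch clusters the feature vectors of a single flow characteristic time sequence sample with plain k-means; the deterministic farthest-first initialisation and the parameter `k` are assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first initialisation.

    points: flow statistical feature vectors taken from one sample.
    Returns (centers, assignment) where assignment[i] is the cluster of points[i].
    """
    centers = [list(points[0])]
    while len(centers) < k:
        # Next initial center: the point farthest from all chosen centers.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        assignment = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


centers, assignment = kmeans([[1, 1], [1.2, 0.8], [10, 10], [10.5, 9.5]], k=2)
print(assignment)  # [0, 0, 1, 1]
```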
In one embodiment, the clustering result includes a cluster center; the model building module 608 is further configured to:
determining a target distance threshold corresponding to the clustering center according to the characteristic value interval of the clustering center; calculating the interval distance between each flow statistic characteristic value in the flow characteristic time sequence sample and the clustering center; and labeling the flow characteristic time sequence samples according to the interval distance and the target distance threshold value to obtain the sample labels.
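The text does not say how the "characteristic value interval" of a cluster center maps to a target distance threshold. One possible reading, sketched below in Python, is a lookup table in which the interval containing the center's norm selects the threshold; the table values and function names are hypothetical.

```python
import bisect
import math

# Hypothetical lookup table: a cluster center whose norm falls in the i-th
# interval (upper bounds inclusive) gets THRESHOLDS[i]; the last entry covers
# centers above every bound.
INTERVAL_UPPER_BOUNDS = [100.0, 1_000.0, 10_000.0]
THRESHOLDS = [20.0, 150.0, 1_200.0, 5_000.0]


def target_threshold(center):
    """Pick the target distance threshold from the interval the center falls in."""
    magnitude = math.hypot(*center)  # Euclidean norm of the cluster center
    return THRESHOLDS[bisect.bisect_left(INTERVAL_UPPER_BOUNDS, magnitude)]


def interval_distances(values, center):
    """Distance from every flow statistical feature vector to the cluster center."""
    return [math.dist(v, center) for v in values]


center = [30.0, 40.0]
values = [[33.0, 44.0], [60.0, 80.0]]
threshold = target_threshold(center)  # norm 50.0 falls in the first interval
flags = [d > threshold for d in interval_distances(values, center)]
print(threshold, flags)  # 20.0 [False, True]
```

Scaling the threshold with the center's magnitude lets a sample with large payloads tolerate proportionally larger deviations than a quiet one.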
In one embodiment, the sample tags include a positive sample tag and a negative sample tag; the model building module 608 is further configured to:
if the interval distance is not greater than the target distance threshold, determining that no abnormal flow statistical characteristic value exists in the flow characteristic time sequence sample, and labeling the flow characteristic time sequence sample with a positive sample label; if the interval distance is greater than the target distance threshold, determining that a noise flow statistical characteristic value exists in the flow characteristic time sequence sample; determining a neighborhood flow characteristic time sequence sample corresponding to the flow characteristic time sequence sample, where the flow characteristic time sequence sample and the neighborhood flow characteristic time sequence sample are in the same preset period time interval in the same time period; if the noise flow statistical characteristic value also exists in the neighborhood flow characteristic time sequence sample, determining that it is an abnormal flow statistical characteristic value, and labeling the flow characteristic time sequence sample with a negative sample label; and if the noise flow statistical characteristic value does not exist in the neighborhood flow characteristic time sequence sample, determining that it is not an abnormal flow statistical characteristic value, and labeling the flow characteristic time sequence sample with a positive sample label.
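The labeling rule above can be sketched in Python as follows, assuming the cluster center and target distance threshold have already been determined. The embodiment does not specify how to decide that a noise value "exists" in a neighborhood sample; here a hypothetical tolerance `match_tol` is assumed, and a match in any one neighborhood sample is treated as sufficient.

```python
import math


def label_sample(values, center, threshold, neighbor_samples, match_tol):
    """Label one flow characteristic time sequence sample "positive" or "negative".

    values: feature vectors of the sample; neighbor_samples: feature-vector lists
    of its neighborhood samples (same preset period time interval, same time period).
    """
    noise = [v for v in values if math.dist(v, center) > threshold]
    if not noise:
        return "positive"  # no value exceeds the target distance threshold
    for n in noise:
        for neighbor in neighbor_samples:
            # Assumed matching rule: a neighbor value within match_tol of n.
            if any(math.dist(n, w) <= match_tol for w in neighbor):
                return "negative"  # noise value recurs: treated as abnormal
    return "positive"  # isolated noise value: not treated as abnormal


print(label_sample([[98.0], [150.0]], [100.0], 10.0, [[[101.0], [149.0]]], 5.0))  # negative
```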
In one embodiment, the feature extraction module is further configured to:
dividing the historical flow data in the preset period time interval according to the device address of each second terminal device to obtain flow segmentation data corresponding to each device address; determining the payload quantity between the first terminal device and each second terminal device from the flow segmentation data; and taking the feature vector formed by the payload quantities as the flow statistical feature in the preset period time interval.
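A minimal Python sketch of this feature extraction step follows. It assumes that each record already belongs to the traffic of the first terminal device, that records carry a peer address and a payload size, and that the "payload quantity" is the total payload bytes per peer; `FlowRecord` and `extract_payload_features` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """One historical flow record (hypothetical schema)."""
    peer_address: str   # device address of the second terminal device
    payload_bytes: int  # payload carried by this record


def extract_payload_features(records, peer_addresses):
    """Split records by peer address and total the payload per peer.

    Returns a feature vector ordered like `peer_addresses`; peers with no
    traffic in the interval contribute 0, and records for peers not listed
    in `peer_addresses` do not appear in the vector.
    """
    per_peer = defaultdict(int)
    for rec in records:
        per_peer[rec.peer_address] += rec.payload_bytes
    return [per_peer[addr] for addr in peer_addresses]


records = [
    FlowRecord("10.0.0.2", 120),
    FlowRecord("10.0.0.3", 40),
    FlowRecord("10.0.0.2", 60),
]
print(extract_payload_features(records, ["10.0.0.2", "10.0.0.3", "10.0.0.4"]))  # [180, 40, 0]
```

Fixing the order of `peer_addresses` keeps every interval's feature vector aligned, so vectors from different intervals can be compared position by position.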
In one embodiment, the terminal behavior detection model construction device further includes:
the real-time flow data acquisition module is used for acquiring real-time flow data between the first terminal device and each second terminal device;
the real-time time sequence sample construction module is used for constructing a real-time flow characteristic time sequence sample according to the real-time flow statistical features corresponding to the real-time flow data and the current time interval corresponding to the real-time flow data;
the model positioning module is used for locating a target detection model among the terminal behavior detection models based on the preset period time interval in which the current time interval falls;
and the terminal behavior detection module is used for detecting, based on the target detection model and the real-time flow characteristic time sequence sample, whether the first terminal device exhibits abnormal behavior in the current time interval.
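A minimal Python sketch of the model positioning and detection modules is shown below. It assumes that the preset period time intervals partition one day, that models are stored in a dictionary keyed by (interval index, flow direction), and that each model is a callable returning "positive" or "negative". `PRESET_INTERVALS`, `locate_interval`, and `detect` are hypothetical.

```python
from datetime import datetime

# Hypothetical preset period time intervals within one day, in minutes: [start, end).
PRESET_INTERVALS = [(0, 480), (480, 1080), (1080, 1440)]


def locate_interval(now):
    """Index of the preset period time interval containing `now`."""
    minute_of_day = now.hour * 60 + now.minute
    for idx, (start, end) in enumerate(PRESET_INTERVALS):
        if start <= minute_of_day < end:
            return idx
    raise ValueError("time is not covered by any preset interval")


def detect(models, now, direction, sample):
    """Return True if the located target detection model flags `sample` as abnormal."""
    target = models[(locate_interval(now), direction)]
    return target(sample) == "negative"


# Toy stand-in model: flag any per-peer payload above 1000 bytes.
models = {(1, "in"): lambda s: "negative" if max(s) > 1000 else "positive"}
print(detect(models, datetime(2023, 5, 30, 9, 0), "in", [1500, 20]))  # True
```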
Each module in the terminal behavior detection model construction apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing terminal behavior detection model construction data. The network interface of the computer device is used for communicating with an external terminal over a network connection. The computer program, when executed by the processor, implements the terminal behavior detection model construction method.
Those skilled in the art will appreciate that the structure shown in FIG. 7 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program stored on a non-volatile computer-readable storage medium; when executed, the program may perform the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database; non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or quantum-computing-based data processing logic units.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of them that contains no contradiction should be considered within the scope of this specification.
The above embodiments illustrate only several implementations of the application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the application. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the application, and these all fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims (10)

1. A method for constructing a terminal behavior detection model, characterized by comprising the following steps:
acquiring historical flow data between a first terminal device and each second terminal device in each preset period time interval;
performing feature extraction on each piece of historical flow data to obtain flow statistical features in each preset period time interval;
constructing flow characteristic time sequence samples in each preset period time interval according to the period time characteristic value corresponding to each preset period time interval and each flow statistical feature;
obtaining sample labels corresponding to the flow characteristic time sequence samples, and constructing a terminal behavior detection model corresponding to each preset period time interval according to the flow characteristic time sequence samples and the sample labels.
2. The method of claim 1, wherein obtaining the sample label corresponding to each flow characteristic time sequence sample comprises:
clustering the flow statistical characteristic values in each flow characteristic time sequence sample to obtain a clustering result corresponding to each flow characteristic time sequence sample;
and labeling each flow characteristic time sequence sample according to each clustering result to obtain each sample label.
3. The method of claim 2, wherein the clustering result comprises a cluster center; labeling each flow characteristic time sequence sample according to each clustering result to obtain each sample label, wherein the labeling comprises the following steps:
determining a target distance threshold corresponding to the clustering center according to the characteristic value interval of the clustering center;
calculating the interval distance between each flow statistic characteristic value in the flow characteristic time sequence sample and the clustering center;
and labeling the flow characteristic time sequence sample according to each interval distance and the target distance threshold to obtain the sample label.
4. The method of claim 3, wherein the sample tags comprise a positive sample tag and a negative sample tag; labeling the flow characteristic time sequence sample according to each interval distance and the target distance threshold value to obtain the sample label, wherein the labeling comprises the following steps:
if the interval distance is not greater than the target distance threshold, determining that no abnormal flow statistical characteristic value exists in the flow characteristic time sequence sample, and labeling the flow characteristic time sequence sample with a positive sample label;
if the interval distance is greater than the target distance threshold, determining that a noise flow statistical characteristic value exists in the flow characteristic time sequence sample;
determining a neighborhood flow characteristic time sequence sample corresponding to the flow characteristic time sequence sample, wherein the flow characteristic time sequence sample and the neighborhood flow characteristic time sequence sample are in the same preset period time interval in the same time period;
if the noise flow statistical characteristic value exists in the neighborhood flow characteristic time sequence sample, determining that the noise flow statistical characteristic value is an abnormal flow statistical characteristic value, and labeling the flow characteristic time sequence sample with a negative sample label;
if the noise flow statistical characteristic value does not exist in the neighborhood flow characteristic time sequence sample, determining that the noise flow statistical characteristic value is not an abnormal flow statistical characteristic value, and labeling the flow characteristic time sequence sample with a positive sample label.
5. The method of claim 1, wherein performing feature extraction on each piece of historical flow data to obtain the flow statistical features in each preset period time interval comprises:
dividing the historical flow data in the preset period time interval according to the device address of each second terminal device to obtain flow segmentation data corresponding to each device address;
determining the payload quantity between the first terminal device and each second terminal device according to the flow segmentation data;
and taking the feature vector formed by the payload quantities as the flow statistical feature in the preset period time interval.
6. The method according to claim 1, wherein after constructing the terminal behavior detection model corresponding to each preset period time interval according to the flow characteristic time sequence samples and the sample labels, the method further comprises:
acquiring real-time flow data between the first terminal device and each second terminal device;
constructing a real-time flow characteristic time sequence sample according to the real-time flow statistical features corresponding to the real-time flow data and the current time interval corresponding to the real-time flow data;
locating a target detection model among the terminal behavior detection models based on the preset period time interval in which the current time interval falls;
and detecting, based on the target detection model, whether the first terminal device exhibits abnormal behavior in the current time interval according to the real-time flow characteristic time sequence sample.
7. A terminal behavior detection model construction apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring historical flow data between the first terminal device and each second terminal device in each preset period time interval;
the feature extraction module is used for performing feature extraction on each piece of historical flow data to obtain flow statistical features in each preset period time interval;
the time sequence sample construction module is used for constructing flow characteristic time sequence samples in each preset period time interval according to the period time characteristic value corresponding to each preset period time interval and the flow statistical features;
and the model construction module is used for acquiring sample labels corresponding to the flow characteristic time sequence samples, and constructing a terminal behavior detection model corresponding to each preset period time interval according to the flow characteristic time sequence samples and the sample labels.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202310624689.4A 2023-05-30 2023-05-30 Terminal behavior detection model construction method, device, equipment and storage medium Pending CN116723157A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310624689.4A CN116723157A (en) 2023-05-30 2023-05-30 Terminal behavior detection model construction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310624689.4A CN116723157A (en) 2023-05-30 2023-05-30 Terminal behavior detection model construction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116723157A true CN116723157A (en) 2023-09-08

Family

ID=87870797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310624689.4A Pending CN116723157A (en) 2023-05-30 2023-05-30 Terminal behavior detection model construction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116723157A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117828371A (en) * 2024-03-01 2024-04-05 山东永恒电子科技有限公司 Intelligent analysis method for business information of comprehensive operation and maintenance platform
CN117828371B (en) * 2024-03-01 2024-05-24 山东永恒电子科技有限公司 Intelligent analysis method for business information of comprehensive operation and maintenance platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination