CN115296930B - Periodic behavior detection method, system and terminal - Google Patents


Info

Publication number
CN115296930B
Authority
CN
China
Prior art keywords
behavior
sequence
periodic
screening
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211194497.6A
Other languages
Chinese (zh)
Other versions
CN115296930A (en)
Inventor
仝永杰
路冰
卢延科
孙琦
唐上
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongfu Safety Technology Co Ltd
Original Assignee
Zhongfu Safety Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongfu Safety Technology Co Ltd
Priority to CN202211194497.6A
Publication of CN115296930A
Application granted
Publication of CN115296930B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/028 Capturing of monitoring data by filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a periodic behavior detection method, system and terminal, relating to the technical field of data clustering algorithms. The method comprises: organizing the traffic data of a network, removing duplicates and arranging the data in ascending order of time; searching for periodic sequences and period values, and screening them so that only behavior sequences containing periodicity remain; and, for each behavior pattern, calculating the average of the mean square errors of the first two classes (clusters) of the corresponding behavior sequence, sorting the behavior patterns by this average, and outputting the sorted behavior patterns and period values. Screening the data removes the sequences that contain no periodic behavior from a large volume of real traffic, which improves overall efficiency. Sorting the behavior patterns by mean square error ensures detection efficiency.

Description

Periodic behavior detection method, system and terminal
Technical Field
The invention relates to the technical field of data clustering algorithms, in particular to a periodic behavior detection method, a periodic behavior detection system and a terminal based on DBSCAN.
Background
In the analysis of user behavior, individual periodic behavior (such as periodic access, downloading or operation) is a common phenomenon and an important component of user behavior. Identifying an individual's periodic behavior makes it possible to mine the regularities in that user's behavior; for sensitive events that show periodicity or semi-periodicity, supervision needs to be strengthened to reduce risk.
The prior art for detecting periodic behavior mainly includes the autocorrelation method and the fast Fourier transform method. Autocorrelation-based methods judge the periodicity of a timestamp sequence mainly by calculating an autocorrelation function or autocorrelation coefficient and combining it with a specific statistical procedure; methods based on the fast Fourier transform convert the time-domain signal into the frequency domain and then analyze the periodicity of the sequence and the likely period value there.
Among existing methods, autocorrelation-based methods require a different appropriate threshold for each periodic sequence, so the threshold is difficult to select; their false alarm rate is high under certain conditions, and their performance on sequences with small period values is generally poor. Methods based on the fast Fourier transform generally have difficulty suppressing the effect of noise and are usually suitable only for sequences whose period values are short to medium.
Disclosure of Invention
The invention provides a DBSCAN-based periodic behavior detection method which, by screening the data, eliminates the sequences that contain no periodic behavior from a large volume of real traffic and improves overall efficiency.
The method comprises the following steps:
step one, organizing the traffic data of a network, removing duplicates and arranging the data in ascending order of time;
step two, searching for periodic sequences and period values, and screening them so that only behavior sequences containing periodicity remain;
and step three, calculating, for each behavior pattern, the average of the mean square errors of the first two classes (clusters) of the corresponding behavior sequence, sorting the behavior patterns by this average, and outputting the sorted behavior patterns and period values.
It should be further noted that the traffic data in step one comprises five-tuples of source ip, destination ip, protocol, destination port and time.
It should be further noted that, in step one, the traffic data undergoes two layers of screening.
It should be further noted that the first layer of screening comprises:
(1) screening out traffic whose preset fields take specific values, or traffic that has already been put on record;
(2) treating traffic data with the same source ip, destination ip, protocol and destination port in the five-tuple as one behavior pattern, and screening out behavior patterns whose number of records is less than or equal to 3 or greater than 30000;
and after this screening, organizing the traffic data again into a timestamp sequence for each behavior pattern.
It should be further noted that the second layer screening method includes:
setting a time axis and placing a time scale at fixed intervals, so that the time axis is divided into a number of intervals of equal length, numbered sequentially from 0;
assigning every time point in the timestamp sequence to its corresponding time interval;
recording, in order, the sequence numbers of the intervals that contain time points, and denoting the list of all these sequence numbers as L;
calculating the intervals between consecutive elements of L, each obtained by subtracting the former element from the latter, to obtain a new interval sequence;
counting the number of occurrences of each interval value in the new interval sequence and sorting the interval values by count;
denoting the sum of the counts of the first two items as S, and calculating the ratio of S to the total number of interval samples;
and if the ratio is greater than the set threshold th, saving the behavior pattern and the corresponding ratio.
It should be further noted that the second step further includes:
step 11, converting the time sequence into an interval sequence T, whose elements are the differences between consecutive timestamps [formula image];
step 12, clustering T by using the DBSCAN method;
step 13, for each cluster, calculating the mean p and the standard deviation of the current cluster; if the proportion of elements falling within the interval specified in the original [formula image], relative to the total number of elements in the cluster, exceeds a set threshold, discarding the current cluster; otherwise, retaining it;
step 14, sorting the remaining clusters according to the number of elements in each cluster;
step 15, calculating the chi-square threshold of the current cluster [formula image]; if the number of elements in the current cluster satisfies the condition specified in the original [formula image], calculating the mean and the mean square error of the current cluster and recording them [formula image];
step 16, sorting the clusters according to the criterion specified in the original [formula image];
step 17, if the total number of elements in the first two clusters is greater than a preset threshold and the total number of classes is less than a preset number, retaining the recorded results [formula image], wherein the mean value is taken as a period of the input sequence;
step 18, outputting the period of the behavior pattern [formula image].
It should be further noted that step 12 further includes:
if T cannot be clustered, the behavior pattern is considered to have no periodicity, and the whole process ends;
if K clusters are formed after clustering, calculating the noise rate nsr;
if nsr is less than or equal to the set threshold, removing the noise data and performing step 13;
and if nsr is greater than the set threshold, the interval sequence T is considered to have no periodicity, and the whole process ends.
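Steps 11 and 12, together with the noise-rate check, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `dbscan_1d` is a small pure-Python DBSCAN for one-dimensional data written for this example, the parameters `eps`, `min_samples` and `nsr_threshold` are hypothetical, and the noise rate is assumed to be the number of noise points divided by the total number of intervals (the source does not give a legible formula).

```python
def to_intervals(timestamps):
    """Step 11: differences between consecutive timestamps."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def dbscan_1d(values, eps, min_samples):
    """Plain DBSCAN on 1-D values; returns one label per value, -1 = noise.
    min_samples counts the point itself."""
    n = len(values)
    neighbours = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n          # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1       # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])
    return labels

def cluster_intervals(timestamps, eps, min_samples, nsr_threshold):
    """Return the clusters of T, or None if the sequence is judged aperiodic."""
    T = to_intervals(timestamps)
    labels = dbscan_1d(T, eps, min_samples)
    k = max(labels, default=-1) + 1
    if k == 0:
        return None                      # T cannot be clustered
    nsr = labels.count(-1) / len(T)      # assumed definition of the noise rate
    if nsr > nsr_threshold:
        return None
    # Noise is dropped; the clusters are passed on to step 13.
    return [[t for t, lab in zip(T, labels) if lab == c] for c in range(k)]
```

For example, timestamps spaced roughly 60 seconds apart with one long gap yield a single cluster of intervals near 60 and one noise point, so the sequence is kept only when `nsr_threshold` is at least 1/7.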
It should be further noted that, in step three, the average value is calculated by the formula given in the original [formula image].
the invention also provides a periodic behavior detection system, which comprises: a data screening layer, a period detection layer and a sequencing layer;
the data screening layer is used for organizing the traffic data of the network, removing duplicates and arranging the data in ascending order of time;
the period detection layer is used for searching for periodic sequences and period values, and screening them so that only behavior sequences containing periodicity remain;
and the sequencing layer is used for calculating, for each behavior pattern, the average of the mean square errors of the first two classes (clusters) of the corresponding behavior sequence, sorting the behavior patterns by this average, and outputting the sorted behavior patterns and period values.
The invention also provides a terminal which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and the steps of the periodic behavior detection method are realized when the processor executes the program.
According to the technical scheme, the invention has the following advantages:
By screening the traffic data, the periodic behavior detection method of the invention rejects the large number of sequences in the traffic data that contain no periodic behavior, which improves overall efficiency. It detects whether a sequence is periodic, and its most probable period, by means of DBSCAN, the chi-square test, multi-class thresholds and similar methods, and finally sorts the behavior patterns by mean square error. The method thus remains efficient while detecting period values, and it has excellent detection capability for semi-periodic and multi-periodic behaviors (for example, access with one period during the day and another period at night). It also copes with deficiencies in periodically accessed data, namely that the traffic detector cannot detect and record all traffic, as well as noise, period fluctuation and other data quality problems.
The periodic behavior detection method of the invention overcomes the defects of existing autocorrelation-based methods, which require a different appropriate threshold for each periodic sequence, have a high false alarm rate under certain conditions and perform poorly on sequences with small period values. It also avoids the difficulty that methods based on the fast Fourier transform have in suppressing the influence of noise, and is applicable to sequences whose period values are medium or short. Sequences that contain no periodic behavior are eliminated from a large volume of real traffic, which improves overall detection efficiency.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for periodic behavior detection;
FIG. 2 is a sequence diagram after sorting;
FIG. 3 is a plot of a behavior pattern sequence over time;
FIG. 4 is a sequence diagram of the sequenced behaviors;
FIG. 5 is a schematic diagram of a periodic behavior detection system.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1, provided for the periodic behavior detection method of the present invention, is only a schematic illustration of the basic idea of the invention. The drawings show only the modules related to the invention rather than the number and functions of the modules in an actual implementation; in an actual implementation the functions, number and purposes of the modules may be changed arbitrarily, and their layout may be more complicated.
The periodic behavior detection method can acquire and process the relevant data based on artificial intelligence technology. Artificial Intelligence (AI) uses a digital computer, or a machine controlled by a digital computer, to simulate and extend human intelligence and to sense the environment; here a data clustering algorithm is used to obtain the optimal behavior patterns and the corresponding period values.
Fig. 1 shows a flow chart of a preferred embodiment of the periodic behavior detection method of the present invention. The method is applied to one or more terminals, which are devices capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and whose hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The terminal may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), an Internet Protocol Television (IPTV), and other devices.
The terminal may also include network equipment and/or user equipment. Wherein the network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network servers.
The Network where the terminal is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
The periodic behavior detection method of the present invention, in which clustering is performed based on the DBSCAN method, is described in detail below with reference to figs. 1 to 5. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm. It defines a cluster as a maximal set of density-connected points, so that regions of sufficiently high density are grouped into clusters and clusters of arbitrary shape can be found in noisy data; applying it to network traffic data that contains a certain amount of noise improves the overall efficiency of periodic behavior detection.
In a first aspect of the embodiments of the present invention, a method for detecting a periodic behavior is provided, as shown in fig. 1, the method includes:
S101, organizing the traffic data of the network, removing duplicates and arranging the data in ascending order of time;
in this embodiment, the traffic data of the network is organized into five-tuples comprising source ip (src_ip), destination ip (dst_ip), protocol (app_proto), destination port (d_port) and time (datetime), deduplicated, and arranged in ascending order of time, as shown in Table 1.
TABLE 1 Five-tuples after organizing
[Table 1 is provided as an image in the original]
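The organizing step S101 can be sketched in a few lines. This is an illustration only: the sample records, the timestamp format and the helper name `tidy` are assumptions made for the example, and the field order follows the five-tuple named in the text (src_ip, dst_ip, app_proto, d_port, datetime).

```python
from datetime import datetime

# Hypothetical sample records: (src_ip, dst_ip, app_proto, d_port, datetime)
records = [
    ("10.0.0.1", "10.0.0.9", "http", 80, "2022-09-01 08:00:10"),
    ("10.0.0.1", "10.0.0.9", "http", 80, "2022-09-01 08:00:00"),
    ("10.0.0.1", "10.0.0.9", "http", 80, "2022-09-01 08:00:10"),  # exact duplicate
    ("10.0.0.2", "10.0.0.9", "dns", 53, "2022-09-01 07:59:59"),
]

def tidy(records, fmt="%Y-%m-%d %H:%M:%S"):
    """Remove duplicate five-tuples and sort the rest in ascending order of time."""
    unique = set(records)  # identical five-tuples collapse to one entry
    return sorted(unique, key=lambda r: datetime.strptime(r[4], fmt))

tidied = tidy(records)
```

Parsing the time field before sorting avoids relying on the string format happening to sort chronologically.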
In the embodiment of the present invention, since the amount of data is enormous, two-layer screening is performed on the data before detection in order to improve efficiency.
Wherein the first layer of screening comprises:
(1) According to the actual business situation, screening out traffic whose fields take preset values, or traffic that has already been put on record.
(2) Treating traffic data whose four fields source ip (src_ip), destination ip (dst_ip), protocol (app_proto) and destination port (d_port) in the five-tuple are identical as one behavior pattern, and screening out behavior patterns whose number of records is less than or equal to 3 or greater than 30000. The traffic data is then organized again into a timestamp sequence for each behavior pattern, for example by converting each date into a timestamp, as shown in fig. 2.
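Item (2) of the first-layer screening can be sketched as a group-and-filter step. The sketch below is an assumption-laden illustration: the function name `first_layer`, the timestamp format and the choice to interpret times as UTC are not from the source, and item (1) is omitted because it depends on business-specific field values.

```python
from collections import defaultdict
from datetime import datetime, timezone

MIN_COUNT = 3       # patterns with <= 3 records are screened out
MAX_COUNT = 30000   # patterns with > 30000 records are screened out

def first_layer(records, fmt="%Y-%m-%d %H:%M:%S"):
    """Group five-tuples by (src_ip, dst_ip, app_proto, d_port) and keep the
    patterns whose record count lies in (MIN_COUNT, MAX_COUNT]."""
    groups = defaultdict(list)
    for src_ip, dst_ip, app_proto, d_port, dt in records:
        # Assumption: the time field is interpreted as UTC.
        ts = datetime.strptime(dt, fmt).replace(tzinfo=timezone.utc).timestamp()
        groups[(src_ip, dst_ip, app_proto, d_port)].append(int(ts))
    return {
        pattern: sorted(stamps)
        for pattern, stamps in groups.items()
        if MIN_COUNT < len(stamps) <= MAX_COUNT
    }
```

The result maps each surviving behavior pattern to its ascending timestamp sequence, which is the input expected by the second layer of screening.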
In an embodiment of the invention, the second layer of screening comprises:
a time axis is set and a time scale is placed at fixed intervals, the interval size being determined by the data and the required detection granularity; the time axis is thereby divided into a number of intervals of equal length, numbered sequentially from 0. Every time point in the timestamp sequence is assigned to its corresponding time interval; the timestamp sequence may be plotted as shown in fig. 3.
The sequence numbers of the intervals that contain time points are recorded in order; an interval containing several time points is recorded only once. The list of all these sequence numbers is denoted L. The intervals between consecutive elements of L are calculated by subtracting the former element from the latter, which gives a new interval sequence.
The number of occurrences of each interval value in the new interval sequence is counted, and the interval values are sorted by count, for example from largest to smallest. Intervals greater than or equal to 3 are retained first, and then intervals greater than or equal to 2.
Further, the sum of the counts of the first two items is denoted S, and the ratio of S to the total number of interval samples is calculated.
If the ratio is greater than the set threshold th, the behavior pattern and the corresponding ratio are saved; otherwise, the behavior pattern is screened out.
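The second-layer screening described above can be sketched as follows. The sketch assumes that the time axis starts at the first timestamp of the sequence and that `bin_width` and `th` are chosen by the operator; the function names are hypothetical, and the retention rule for intervals of at least 3 and at least 2 is not modelled because the source does not specify it precisely.

```python
from collections import Counter

def second_layer_ratio(timestamps, bin_width):
    """Share of the two most frequent bin-index intervals among all intervals."""
    if not timestamps:
        return 0.0
    start = min(timestamps)  # assumption: the time axis starts at the first point
    # List L: index of every occupied bin, each bin recorded once.
    L = sorted({int((t - start) // bin_width) for t in timestamps})
    intervals = [b - a for a, b in zip(L, L[1:])]
    if not intervals:
        return 0.0
    top_two = Counter(intervals).most_common(2)
    S = sum(count for _, count in top_two)
    return S / len(intervals)

def passes_second_layer(timestamps, bin_width, th):
    """Keep the behavior pattern only if the ratio exceeds the threshold th."""
    return second_layer_ratio(timestamps, bin_width) > th
```

A strictly periodic sequence concentrates almost all intervals in one or two values, so its ratio is close to 1, whereas irregular traffic spreads its intervals over many values and falls below the threshold.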
After the above processes, each remaining behavior pattern corresponds to a ratio, the behavior patterns recorded with the time stamps are sorted from high to low according to the size of the ratio, and as shown in fig. 4, the sequence of the behavior patterns is saved for the next process.
S102, searching for periodic sequences and period values, and screening them so that only periodic behavior sequences remain;
in an embodiment of the present invention, a possible implementation of step S102 is given below as a non-limiting illustration.
In this embodiment, the sequences that may contain a period, and their corresponding period values, are found. After the screening layer, the sequences that remain contain periodic behavior with high probability, and each behavior sequence is processed, in the order output by the data screening layer, according to the following steps:
step 11, converting the time sequence into an interval sequence T, whose elements are the differences between consecutive timestamps [formula image];
Step 12, clustering T by using the DBSCAN method;
specifically, T is clustered using the DBSCAN method. If T cannot be clustered, the interval sequence T is considered to have no periodicity, and the whole process ends;
if K clusters are formed after clustering, the noise rate nsr is calculated from the number of successfully clustered elements and the total number of elements; if nsr is less than or equal to the set threshold, the noise data is removed and step 13 is carried out; if nsr is greater than the set threshold, the sequence is considered to have no periodicity, and the whole process ends.
Step 13, for each cluster, calculating the mean p and the standard deviation of the current cluster; if the proportion of elements falling within the interval specified in the original [formula image], relative to the total number of elements in the cluster, exceeds a set threshold, discarding the current cluster; otherwise, retaining it;
step 14, sorting the remaining clusters according to the number of elements in each cluster;
step 15, calculating the chi-square threshold of the current cluster [formula image]; if the number of elements in the current cluster satisfies the condition specified in the original [formula image], calculating the mean and the mean square error of the current cluster and recording them [formula image]; otherwise, the current cluster is discarded.
Step 16, sorting the clusters according to the criterion specified in the original [formula image]; the sorting can be performed from low to high.
Step 17, if the total number of elements in the first two clusters is greater than a preset threshold and the total number of classes is less than a preset number, retaining the recorded results [formula image], wherein the mean value is taken as a possible period of the input sequence; otherwise, the sequence is considered to have no periodicity.
Step 18, outputting the period of the behavior pattern [formula image].
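A simplified sketch of the cluster summary and period selection in steps 15 to 17 is given below. It rests on several assumptions that should be checked against the original formulas, which are only available as images: the mean square error is taken to be the population variance of the cluster, clusters are ranked by mean square error from low to high, and the interval check of step 13 and the chi-square threshold of step 15 are left out entirely. The names `summarise`, `candidate_periods`, `min_elements` and `max_clusters` are hypothetical.

```python
def summarise(cluster):
    """Return (mean, mean square error, size) of one cluster of intervals."""
    n = len(cluster)
    mean = sum(cluster) / n
    mse = sum((x - mean) ** 2 for x in cluster) / n  # assumed: population variance
    return mean, mse, n

def candidate_periods(clusters, min_elements, max_clusters):
    """Return [(period, mse), ...] for the first two clusters, or None when the
    sequence is judged to have no periodicity."""
    # Assumed ranking criterion: mean square error, low to high.
    ranked = sorted((summarise(c) for c in clusters), key=lambda s: s[1])
    top = ranked[:2]
    covered = sum(n for _, _, n in top)
    if covered > min_elements and len(clusters) < max_clusters:
        return [(mean, mse) for mean, mse, _ in top]
    return None
```

With two clusters of intervals around 60 and 3600 seconds, the function returns both means as candidate periods, which matches the multi-period case mentioned in the description.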
S103, calculating, for each behavior pattern, the average of the mean square errors of the first two classes (clusters) of the corresponding behavior sequence, sorting the behavior patterns by this average, and outputting the sorted behavior patterns and period values.
Wherein the average value is calculated by the formula given in the original [formula image].
The behavior patterns are sorted according to the average value, and the sorted behavior patterns and period values are taken as the output.
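The ranking in S103 can be sketched as below. Since the formula is only available as an image, the sketch assumes the score is the arithmetic mean of the mean square errors of the first two clusters, that a pattern with a single cluster is scored on that cluster alone, and that patterns are output in ascending order of score (most regular first). The function names and the shape of `results` are hypothetical.

```python
def pattern_score(top_two):
    """Average of the mean square errors of the first (up to) two clusters.
    top_two is a list of (period, mse) pairs."""
    return sum(mse for _, mse in top_two) / len(top_two)

def rank_patterns(results):
    """results: {pattern: [(period, mse), ...]}
    Returns [(score, pattern, [periods]), ...] sorted by score, ascending."""
    scored = [
        (pattern_score(top_two), pattern, [period for period, _ in top_two])
        for pattern, top_two in results.items()
    ]
    return sorted(scored, key=lambda item: item[0])
```

Each output entry carries the behavior pattern together with its period values, so the list can be handed directly to whoever reviews the most regular behaviors first.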
By screening the traffic data, the periodic behavior detection method eliminates the large number of sequences in the traffic data that contain no periodic behavior, which improves overall efficiency. It detects whether a sequence is periodic, and its most probable period, by means of DBSCAN, the chi-square test, multi-class thresholds and similar methods, and finally sorts the behavior patterns by mean square error. The method thus remains efficient while detecting period values, and it has excellent detection capability for semi-periodic and multi-periodic behaviors (for example, access with one period during the day and another period at night). It also copes with deficiencies in periodically accessed data, namely that the traffic detector cannot detect and record all traffic, as well as noise, period fluctuation and other data quality problems.
Based on the above periodic behavior detection method, the present invention further provides a periodic behavior detection system, as shown in fig. 5, the system includes: a data screening layer, a period detection layer and a sequencing layer;
the data screening layer is used for organizing the traffic data of the network, removing duplicates and arranging the data in ascending order of time;
the period detection layer is used for searching for periodic sequences and period values, and screening them so that only behavior sequences containing periodicity remain;
and the sequencing layer is used for calculating, for each behavior pattern, the average of the mean square errors of the first two classes (clusters) of the corresponding behavior sequence, sorting the behavior patterns by this average, and outputting the sorted behavior patterns and period values.
Wherein the average value is calculated by the formula given in the original [formula image].
The behavior patterns are sorted according to the average value, and the sorted behavior patterns and period values are taken as the output.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functionality in order to illustrate clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
In the periodic behavior detection system provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The periodic behavior detection system and method provided by the present invention, which incorporate the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein, can be implemented in electronic hardware, computer software, or combinations of both, where the components and steps of the various examples have been described in a functional general sense in the foregoing description for the purpose of clearly illustrating the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
As will be appreciated by one skilled in the art, aspects of the periodic behavior detection system and method provided by the present invention may be embodied as a system, method, or program product. Accordingly, various aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, any of which may generally be referred to herein as a "circuit," "module," or "system."
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A method for periodic behavior detection, the method comprising:
step one, collating the flow data of a network, removing duplicates, and sorting the data in ascending order of time; the flow data comprises a five-tuple of source ip, destination ip, protocol, destination port and time;
wherein the flow data is screened in two layers;
the first screening layer comprises:
(1) screening out flows whose preset fields take specific values, and flows that are already on record;
(2) treating flow data with the same source ip, destination ip, protocol and destination port in the five-tuple as one behavior pattern, and screening out the flows of any behavior pattern whose flow count is less than or equal to 3 or greater than 30000;
after this screening, collating the flow data again into a timestamp sequence for each behavior pattern;
the second screening layer comprises:
setting a time axis, placing a time scale mark at fixed intervals so that the time axis is divided into a number of intervals of equal length, and numbering the intervals consecutively from 0;
assigning every time point in the timestamp sequence to its corresponding time interval;
recording, in order, the sequence number of the interval into which each time point falls, and denoting the list of all these sequence numbers as [formula image 001];
computing the gaps between consecutive entries of [formula image 001], each gap being the later entry minus the earlier one, to obtain a new interval sequence denoted [formula image 002];
counting the number of occurrences of each gap value in [formula image 002], and sorting the gap values by that count;
denoting the sum of the first two counts as S, and calculating the ratio of S to the total number of interval samples, [formula image 003];
if the ratio is greater than a set threshold th, saving the behavior pattern and the corresponding ratio;
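As an illustration only, the two screening layers of step one might be sketched in Python as follows. The function names, the `bin_width` and `th` defaults, and the choice to start the time axis at the first timestamp are assumptions made for this sketch and are not taken from the claims. Sub-step (1) of the first layer is omitted because the preset fields and the record of known flows are not specified here, and the list of interval sequence numbers is assumed to hold one entry per time point.

```python
from collections import Counter, defaultdict


def first_layer(flows, min_count=3, max_count=30000):
    """Deduplicate flows, sort them by time, and group them by behavior pattern.

    Each flow is a tuple (src_ip, dst_ip, protocol, dst_port, time).
    Patterns with <= min_count or > max_count flows are screened out.
    """
    patterns = defaultdict(list)
    for src, dst, proto, port, ts in sorted(set(flows), key=lambda f: f[4]):
        patterns[(src, dst, proto, port)].append(ts)
    return {k: v for k, v in patterns.items() if min_count < len(v) <= max_count}


def top_two_gap_ratio(timestamps, bin_width):
    """Share of all gaps taken up by the two most frequent gap values."""
    t0 = timestamps[0]  # assumption: the time axis starts at the first time point
    # Sequence number (counted from 0) of the equal-width interval of each time point.
    bins = [int((t - t0) // bin_width) for t in timestamps]
    # Later sequence number minus the earlier one.
    gaps = [b - a for a, b in zip(bins, bins[1:])]
    if not gaps:
        return 0.0
    s = sum(count for _, count in Counter(gaps).most_common(2))
    return s / len(gaps)


def second_layer(patterns, bin_width=1.0, th=0.8):
    """Keep the patterns whose top-two gap ratio exceeds the threshold th."""
    kept = {}
    for key, timestamps in patterns.items():
        ratio = top_two_gap_ratio(timestamps, bin_width)
        if ratio > th:
            kept[key] = ratio
    return kept
```

A strictly regular pattern yields a ratio of 1.0, while a pattern whose gaps are all different yields a ratio close to 0, so the threshold th controls how much irregularity is tolerated before period detection is attempted.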
step two, searching for periodic sequences and period values, and screening them so that only behavior sequences containing periodicity remain;
after the screening layers, sequences containing periodic behavior remain, and each behavior sequence, as collated by the data screening layer, is processed as follows:
step 11, converting the time sequence into an interval sequence T, wherein [formula image 004];
step 12, clustering T using the DBSCAN method;
if no cluster can be formed, the behavior pattern is considered to have no periodicity, and the whole process ends;
if K clusters [formula image 005] are formed after clustering, calculating the noise rate nsr;
if nsr is less than or equal to a set threshold, removing the noise data and proceeding to step 13;
if nsr is greater than the set threshold, the interval sequence T is considered to have no periodicity, and the whole process ends;
step 13, for each cluster [formula image 006], calculating the mean p and the standard deviation [formula image 007] of the current cluster;
if the number of elements falling within [formula image 008], as a proportion of the total number of elements in [formula image 009], exceeds a set threshold, discarding the current cluster; otherwise, retaining it;
step 14, sorting the remaining [formula image 010] by the number of elements in each cluster;
step 15, calculating the chi-square threshold [formula image 011] of the current cluster;
if the number of elements in the current cluster satisfies [formula image 012], calculating the mean [formula image 013] and the mean square deviation [formula image 014] of the current cluster, recorded as [formula image 015];
step 16, sorting the clusters according to [formula image 016];
step 17, if the total number of elements in the first two clusters is greater than a preset threshold and the total number of categories is less than a preset number, retaining them;
[formula image 017]
wherein the mean [formula image 018] is taken as the period of the input sequence;
step 18, outputting the period [formula image 019] of the behavior pattern;
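For orientation, steps 11, 12 and 14 might be sketched in Python as below. This is a simplified sketch under stated assumptions, not the claimed procedure: the interval sequence is assumed to be the difference between consecutive timestamps, `dbscan_1d` is a minimal hand-written DBSCAN for scalar values, and the parameters `eps`, `min_samples` and `max_noise_rate` are hypothetical. Steps 13 and 15 to 17 depend on formulas that are not reproduced in this text and are left out, so the sketch simply reports the mean and standard deviation of the two largest clusters.

```python
from collections import defaultdict
from statistics import mean, pstdev


def dbscan_1d(values, eps, min_samples):
    """Minimal DBSCAN for scalar values; returns one label per value, -1 = noise."""
    n = len(values)
    # A point counts as its own neighbor.
    neighbors = [[j for j in range(n) if abs(values[i] - values[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels


def detect_period(timestamps, eps=0.5, min_samples=3, max_noise_rate=0.3):
    """Return [(mean, std), ...] for up to two largest clusters, or None."""
    # Step 11 (assumed form): later timestamp minus the earlier one.
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Step 12: cluster the intervals and check the noise rate.
    labels = dbscan_1d(intervals, eps, min_samples)
    clusters = defaultdict(list)
    for value, label in zip(intervals, labels):
        if label != -1:
            clusters[label].append(value)
    if not clusters:
        return None  # nothing could be clustered: no periodicity
    nsr = labels.count(-1) / len(labels)
    if nsr > max_noise_rate:
        return None  # too much noise: no periodicity
    # Step 14: order the clusters by their number of elements, largest first.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [(mean(c), pstdev(c)) for c in ranked[:2]]
```

A sequence of timestamps spaced 60 seconds apart with a single long pause produces one cluster with mean 60 and the pause is treated as noise, which is the behavior the noise-rate check is meant to tolerate.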
step three, calculating, for the behavior sequence corresponding to each behavior pattern, the average of the mean square deviations of its first two clusters, sorting the behavior patterns by this average, and outputting the sorted behavior patterns and period values;
the average is calculated by the following formula:
[formula image 020].
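The ranking in step three might look like the following Python sketch. The formula image for the average is not reproduced in this text, so the sketch assumes a plain arithmetic mean of the deviations of the first two clusters; the ascending sort order (most regular pattern first), the handling of patterns with a single cluster, and the name `rank_patterns` are likewise assumptions.

```python
def rank_patterns(detections):
    """Rank behavior patterns by the average deviation of their first two clusters.

    detections maps a behavior pattern to a list of (period, deviation) pairs,
    largest cluster first. Returns (pattern, period, average_deviation) tuples.
    """
    scored = []
    for pattern, clusters in detections.items():
        top = clusters[:2]  # a pattern with one cluster is averaged over that one
        avg_dev = sum(dev for _, dev in top) / len(top)
        period = top[0][0]  # assumption: report the period of the largest cluster
        scored.append((avg_dev, pattern, period))
    # Assumption: a smaller average deviation means a more regular pattern.
    scored.sort(key=lambda item: item[0])
    return [(pattern, period, avg_dev) for avg_dev, pattern, period in scored]
```

With this ordering, the patterns whose intervals vary least appear at the top of the output, which is where an analyst would look first for machine-generated periodic traffic.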
2. A periodic behavior detection system, characterized in that the system employs the periodic behavior detection method as claimed in claim 1;
the system comprises: a data screening layer, a period detection layer and a sequencing layer;
the data screening layer is used for collating the flow data of the network, removing duplicates, and sorting the data in ascending order of time;
the period detection layer is used for searching for periodic sequences and period values, and screening them so that only behavior sequences containing periodicity remain;
and the sequencing layer is used for calculating, for the behavior sequence corresponding to each behavior pattern, the average of the mean square deviations of its first two clusters, sorting the behavior patterns by this average, and outputting the sorted behavior patterns and period values.
3. A terminal comprising a memory, a processor and a computer program stored on said memory and executable on said processor, wherein said processor when executing the program performs the steps of the periodic behavior detection method of claim 1.
CN202211194497.6A 2022-09-29 2022-09-29 Periodic behavior detection method, system and terminal Active CN115296930B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211194497.6A CN115296930B (en) 2022-09-29 2022-09-29 Periodic behavior detection method, system and terminal

Publications (2)

Publication Number Publication Date
CN115296930A CN115296930A (en) 2022-11-04
CN115296930B true CN115296930B (en) 2023-02-17

Family

ID=83834709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211194497.6A Active CN115296930B (en) 2022-09-29 2022-09-29 Periodic behavior detection method, system and terminal

Country Status (1)

Country Link
CN (1) CN115296930B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101827092A (en) * 2010-03-30 2010-09-08 北京理工大学 Detection method for periodic subsequence in network data stream
CN105262729A (en) * 2015-09-11 2016-01-20 携程计算机技术(上海)有限公司 Trojan horse detection method and system
WO2017034512A1 (en) * 2015-08-21 2017-03-02 Hewlett Packard Enterprise Development Lp Interactive analytics on time series
CN112491877A (en) * 2020-11-26 2021-03-12 中孚安全技术有限公司 User behavior sequence anomaly detection method, terminal and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2916260A1 (en) * 2014-03-06 2015-09-09 Tata Consultancy Services Limited Time series analytics
US10198339B2 (en) * 2016-05-16 2019-02-05 Oracle International Corporation Correlation-based analytic for time-series data
KR102464390B1 (en) * 2016-10-24 2022-11-04 삼성에스디에스 주식회사 Method and apparatus for detecting anomaly based on behavior analysis
US11132342B2 (en) * 2019-12-02 2021-09-28 Alibaba Group Holding Limited Periodicity detection and period length estimation in time series

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
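"Research on the Characteristics and Applications of Periodic Network Traffic" (《周期性网络流量的特性与应用研究》); Zhang Zhengping (张正平); CNKI Outstanding Master's Theses Full-text Database; 2020-02-15; full text *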

Also Published As

Publication number Publication date
CN115296930A (en) 2022-11-04

Similar Documents

Publication Publication Date Title
WO2021184727A1 (en) Data abnormality detection method and apparatus, electronic device and storage medium
CN111178380B (en) Data classification method and device and electronic equipment
CN103401698B (en) For the monitoring system that server health is reported to the police in server set group operatione
CN108809745A (en) A kind of user's anomaly detection method, apparatus and system
CN107222511B (en) Malicious software detection method and device, computer device and readable storage medium
CN112241494A (en) Key information pushing method and device based on user behavior data
CN105429792B (en) User behavior flow acquisition methods and device, user behavior analysis method and system
CN112134862A (en) Coarse-fine granularity mixed network anomaly detection method and device based on machine learning
CN113726783A (en) Abnormal IP address identification method and device, electronic equipment and readable storage medium
CN112800115A (en) Data processing method and data processing device
CN109819128A (en) A kind of quality detecting method and device of telephonograph
CN111064719B (en) Method and device for detecting abnormal downloading behavior of file
CN114780606B (en) Big data mining method and system
CN115174233A (en) Network security analysis method, device, system and medium based on big data
CN111628974A (en) Differential privacy protection method and device, electronic equipment and storage medium
CN110414591A (en) A kind of data processing method and equipment
CN115296930B (en) Periodic behavior detection method, system and terminal
CN112232290B (en) Data clustering method, server, system and computer readable storage medium
CN117294497A (en) Network traffic abnormality detection method and device, electronic equipment and storage medium
CN113746780B (en) Abnormal host detection method, device, medium and equipment based on host image
CN113240013A (en) Model training method, device and equipment based on sample screening and storage medium
CN113850294A (en) Abnormal encrypted traffic identification method and system
US11436320B2 (en) Adaptive computer security
CN105930430B (en) Real-time fraud detection method and device based on non-accumulative attribute
CN112087450A (en) Abnormal IP identification method, system and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant