CN111314161B - Traffic identification method and device - Google Patents

Info

Publication number
CN111314161B
Authority
CN
China
Prior art keywords
space
time trajectory
identification
flow
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911059598.0A
Other languages
Chinese (zh)
Other versions
CN111314161A (en
Inventor
肖圣龙
武金
刁士涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Three Cloud Computing Co ltd
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201911059598.0A priority Critical patent/CN111314161B/en
Publication of CN111314161A publication Critical patent/CN111314161A/en
Application granted granted Critical
Publication of CN111314161B publication Critical patent/CN111314161B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/06 Generation of reports
    • H04L43/067 Generation of reports using time frame reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Abstract

The application discloses a traffic identification method and device. The method comprises: generating a space-time trajectory graph from the traffic log data of the traffic to be identified and a space-time trajectory coordinate system, where the abscissa of the coordinate system is the interface access time and the ordinate is the interface index identifier; and inputting the space-time trajectory graph into a traffic identification model to obtain an identification result. Because the coordinate system is built from features such as the traffic log and the interface access time, and the resulting graph exposes more salient identification features at the image level, the method improves traffic identification capability and the accuracy of the identification result, lowers the identification error rate, and improves user experience. In addition, the traffic identification model does not depend on service data, so it offers higher identification efficiency, stronger generalization capability, a longer life cycle and broader applicability.

Description

Traffic identification method and device
Technical Field
The application relates to the field of identification algorithms, in particular to a traffic identification method and device.
Background
A website platform may encounter persistent abnormal usage by users during operation, such as data crawling and traffic attacks. A method and device that identify abnormal traffic accurately and quickly are therefore of great significance for raising the level of monitoring and early warning and for improving the user experience of the website platform. Although the prior art offers various identification schemes, each has shortcomings and none can guarantee the identification effect. For example, the frequency accumulation method defends well against brute-force attacks, but it is easily exploited by attackers and is prone to missed detections. The abnormal parameter identification method cannot keep up with the continuous escalation and evolution of attack and defense, is easy to crack, and may degrade the experience of ordinary users. Trajectory recognition based on external interaction devices such as a mouse or keyboard works well, but it is complex to deploy and difficult to apply to apps on handheld terminals. The slider trajectory anomaly identification method is mature and effective, but it can extract only distance and displacement information in time and space during identification and does not reflect image pixel information, so it too has certain shortcomings.
Disclosure of Invention
In view of the above, the present application is proposed to provide a traffic identification method and apparatus that overcomes or at least partially solves the above mentioned problems.
According to an aspect of the present application, there is provided a traffic identification method, including:
generating a space-time trajectory graph according to the flow log data of the flow to be identified and a space-time trajectory coordinate system; the abscissa of the space-time trajectory coordinate system is interface access time, and the ordinate of the space-time trajectory coordinate system is interface index identification;
and inputting the space-time trajectory graph into a flow identification model for identification to obtain an output identification result.
Optionally, the method further comprises:
extracting a normal data set and an abnormal data set from the sample flow log data;
generating a space-time trajectory graph for each piece of flow log data in the normal data set and the abnormal data set according to the space-time trajectory coordinate system respectively to obtain a sample space-time trajectory graph set;
and training according to the sample space-time trajectory graph set to obtain the flow identification model.
Optionally, the extracting the normal data set and the abnormal data set from the sample flow log data includes:
and according to the content of the flow request, taking the flow log data containing the preset normal behavior as normal data, and taking the flow log data containing the preset abnormal behavior as abnormal data.
Optionally, the obtaining the flow identification model by training according to the sample spatiotemporal trajectory graph set includes:
training based on a preset convolutional neural network basic model and the sample space-time trajectory graph set to obtain the flow identification model; the preset convolutional neural network basic model comprises three convolutional layers, three pooling layers and three full-connection layers.
Optionally, the obtaining the flow identification model by training according to the sample spatiotemporal trajectory graph set includes:
shuffling the space-time trajectory graphs in the sample space-time trajectory graph set and sampling them without replacement, wherein a first proportion of the space-time trajectory graphs is used as a training set, a second proportion is used as a validation set, and a third proportion is used as a test set; the sum of the first proportion, the second proportion and the third proportion is 100%.
Optionally, the method further comprises:
generating an interface word vector according to the normal data set;
and determining the interface index identification of each interface according to the interface word vector.
Optionally, the method of any preceding claim, wherein the size of the spatiotemporal trajectory graph is predetermined.
According to another aspect of the present application, there is provided an abnormal traffic identification apparatus, including:
the space-time trajectory graph generating unit is used for generating a space-time trajectory graph according to the flow log data of the flow to be identified and a space-time trajectory coordinate system; the abscissa of the space-time trajectory coordinate system is interface access time, and the ordinate of the space-time trajectory coordinate system is interface index identification;
and the identification unit is used for inputting the space-time trajectory diagram into the flow identification model for identification to obtain an output identification result.
Optionally, the spatiotemporal trajectory graph generating unit is further configured to extract a normal data set and an abnormal data set from the sample flow log data;
generating a space-time trajectory graph for each piece of flow log data in the normal data set and the abnormal data set according to the space-time trajectory coordinate system respectively to obtain a sample space-time trajectory graph set;
and training according to the sample space-time trajectory graph set to obtain the flow identification model.
Optionally, the spatiotemporal trajectory graph generating unit is configured to, according to content of the traffic request, use traffic log data including a preset normal behavior as normal data, and use traffic log data including a preset abnormal behavior as abnormal data.
Optionally, the spatiotemporal trajectory graph generation unit is configured to obtain the traffic identification model based on a preset convolutional neural network basic model and training of the sample spatiotemporal trajectory graph set; the preset convolutional neural network basic model comprises three convolutional layers, three pooling layers and three full-connection layers.
Optionally, the spatiotemporal trajectory graph generating unit is configured to shuffle the space-time trajectory graphs in the sample space-time trajectory graph set and sample them without replacement, using a first proportion of the space-time trajectory graphs as a training set, a second proportion as a validation set, and a third proportion as a test set; the sum of the first proportion, the second proportion and the third proportion is 100%.
Optionally, the spatiotemporal trajectory graph generating unit is further configured to generate an interface word vector according to the normal data set;
and determining the interface index identification of each interface according to the interface word vector.
Optionally, the apparatus according to any of the above, wherein the size of the spatiotemporal trajectory graph is predetermined.
In accordance with yet another aspect of the present application, there is provided an electronic device including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method as any one of the above.
According to a further aspect of the application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement a method as in any above.
According to the technical scheme, the space-time trajectory graph is generated according to the flow log data of the flow to be identified and the space-time trajectory coordinate system; the abscissa of the space-time trajectory coordinate system is interface access time, and the ordinate of the space-time trajectory coordinate system is interface index identification; and inputting the space-time trajectory graph into a flow identification model for identification to obtain an output identification result. The method has the advantages that a space-time trajectory coordinate system is constructed through characteristics such as flow logs, interface access time and the like, a space-time trajectory graph is generated, and finally, more remarkable identification characteristics are obtained from an image level according to the space-time trajectory graph, so that flow identification capacity and accuracy of an identification result are improved, the identification error rate is reduced, user experience is improved, and meanwhile a good identification effect is achieved. And the traffic identification model does not depend on service data, and has higher identification efficiency, stronger generalization capability, longer life cycle and universality and better identification effect.
The foregoing is only an overview of the technical solutions of the present application. So that the technical means of the application can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages are more readily apparent, specific embodiments of the application are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flow chart of a traffic identification method according to an embodiment of the present application;
FIG. 2 illustrates a schematic structural diagram of a traffic identification device according to an embodiment of the present application;
FIG. 3a illustrates a spatiotemporal trajectory graph of normal behavioral traffic requests according to one embodiment of the present application;
FIG. 3b illustrates a spatiotemporal trajectory graph of an abnormal behavior traffic request according to one embodiment of the present application;
FIG. 4 illustrates a deep learning network architecture diagram of a traffic recognition model according to one embodiment of the present application;
FIG. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 6 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a flow chart of a traffic identification method according to an embodiment of the present application. As shown in fig. 1, the method includes:
step S110, generating a space-time trajectory graph according to flow log data of the flow to be identified and a space-time trajectory coordinate system; the abscissa of the space-time trajectory coordinate system is interface access time, and the ordinate of the space-time trajectory coordinate system is interface index identification.
After a user accesses the website platform, corresponding records are left in the traffic log data. The traffic log data may include indicators such as the number of users of the website, the access time of each user, and the number of web pages viewed by each user. By analyzing each indicator in the website traffic log data in detail, the behavior characteristics of the user can be found and then described in the form of an image. As shown in fig. 3a, a planar space-time trajectory coordinate system is established according to the interface access time characteristics and the interface characteristics. The horizontal axis X of the coordinate system is the interface access time, in seconds: the time at which the user first accesses an interface is taken as the start time, and the time difference between each subsequent access and the start time is taken as the value on the X axis, in the form $X = (0,\ time_2 - time_1,\ time_3 - time_2,\ \ldots,\ time_N - time_{N-1})$. The vertical axis Y of the coordinate system is the interface index identifier, in the form $Y = \{index_{urlVector}(url_1),\ index_{urlVector}(url_2),\ \ldots,\ index_{urlVector}(url_N)\}$. The traffic log data of the traffic to be identified is mapped into the space-time trajectory coordinate system, generating space-time trajectory points in one-to-one correspondence that together form the space-time trajectory graph. The horizontal and vertical axes of the trajectory graph correspond to the length and width of the image, respectively. The gray value of a space-time trajectory point corresponds to the number of accesses to the same interface within a one-second unit: the larger the number, the whiter the point.
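The mapping from log records to a grayscale trajectory image can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `build_trajectory_image`, the `(timestamp, url)` record layout, the `interface_index` dictionary and the scaling of counts to 0..255 by the peak count are all hypothetical. The x coordinate follows the prose reading (whole seconds elapsed since the first access), whereas the formula as printed lists successive differences.

```python
from collections import Counter


def build_trajectory_image(records, interface_index, width=500, height=500):
    """Rasterize one traffic source's log records into a grayscale grid.

    records: list of (timestamp_seconds, url) tuples (assumed layout).
    interface_index: dict mapping url -> non-negative row index (assumed).
    Returns a height x width list of lists with values in 0..255.
    """
    image = [[0] * width for _ in range(height)]
    if not records:
        return image

    start = min(t for t, _ in records)  # first access defines x = 0
    counts = Counter()
    for t, url in records:
        x = int(t - start)              # one-second bins on the X axis
        y = interface_index.get(url)    # interface index on the Y axis
        if y is None or not (0 <= y < height) or x >= width:
            continue                    # outside the fixed-size image
        counts[(x, y)] += 1

    # Assumption: brightness is scaled relative to the busiest cell, so
    # more accesses to one interface in one second give a whiter point.
    peak = max(counts.values(), default=1)
    for (x, y), n in counts.items():
        image[y][x] = round(255 * n / peak)
    return image
```

A fixed `width` and `height` keep every generated image the same size, which matches the later remark that the size of the trajectory graph is preset.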
Under normal behavior, many types of interfaces are accessed and there is no obvious regularity, so the space-time trajectory points rarely form a distinct trajectory. The opposite holds for abnormal behavior: as shown in fig. 3b, a space-time trajectory graph with more distinct features is formed. A space-time trajectory coordinate system is therefore established from the interface access time characteristics and the interface index identifiers in the traffic log data and converted into a visual space-time trajectory image. The abstract features hidden in the text information are thereby turned into clear and intuitive image features, more salient identification features are obtained at the image level, and a foundation is laid for improving traffic identification capability and the accuracy of the identification result.
And step S120, inputting the space-time trajectory diagram into the flow identification model for identification to obtain an output identification result.
The traffic identification model is trained extensively in advance using deep learning. The space-time trajectory graph is then input into the trained traffic identification model as required, and the model determines whether the traffic is abnormal. Because abnormal traffic identification is carried out by a model trained in this way, the method achieves high working efficiency and identification accuracy.
As can be seen from the method shown in fig. 1, a space-time trajectory coordinate system is constructed from features such as the traffic log and the interface access time, a space-time trajectory graph is generated, and more salient identification features are obtained from the graph at the image level. This improves traffic identification capability and the accuracy of the identification result, lowers the identification error rate, and improves user experience. In addition, the traffic identification model does not depend on service data, so it offers higher identification efficiency, stronger generalization capability, a longer life cycle and broader applicability.
In an embodiment of the present invention, the method further includes: extracting a normal data set and an abnormal data set from the sample flow log data; generating a space-time trajectory graph for each piece of flow log data in the normal data set and the abnormal data set according to a space-time trajectory coordinate system respectively to obtain a sample space-time trajectory graph set; and training according to the sample space-time trajectory graph set to obtain a flow identification model.
The traffic log data is rich in information and includes traffic logs of both normal and abnormal behavior. Therefore, to improve the effect and efficiency of model training, the sample traffic log data can be classified in advance, dividing the log records of normal behavior and abnormal behavior into a normal data set and an abnormal data set. A space-time trajectory graph is then generated in the space-time trajectory coordinate system for each record of the normal data set and of the abnormal data set, yielding the space-time trajectory graph set of the normal data set and the space-time trajectory graph set of the abnormal data set in the sample. When the traffic identification model is trained in advance, the corresponding space-time trajectory graphs can be used as training data in a targeted manner. In this way, unordered user traffic log data is classified and converted into corresponding space-time trajectory graph sets, which can be selected according to specific requirements to train the traffic identification model, improving the flexibility and efficiency of model training and achieving a better training effect.
In an embodiment of the present invention, the extracting the normal data set and the abnormal data set from the sample traffic log data in the method includes: and according to the content of the flow request, taking the flow log data containing the preset normal behavior as normal data, and taking the flow log data containing the preset abnormal behavior as abnormal data.
Whether a user's behavior is abnormal is determined from the content of the user's traffic requests. For example, behaviors such as multiple requests to the same web page within a short time, a large number of abnormal downloads, malicious scanning, or requests matching the attack features of a virus can be used as preset conditions; these abnormal behavior features can be customized as preset conditions for deciding whether data belongs to the abnormal behavior data. The sample traffic log data can thus be divided into normal data and abnormal data according to the preset conditions.
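One of the preset conditions named above, many requests to the same page within a short time, can be sketched as a sliding-window rule. The function names, the `(timestamp, url)` record layout and the thresholds `max_hits` and `window` are illustrative assumptions; the text leaves the concrete conditions to be customized.

```python
from collections import defaultdict


def is_abnormal(records, max_hits=20, window=10.0):
    """Return True if any URL receives more than max_hits requests
    within any span of `window` seconds (hypothetical rule)."""
    times_by_url = defaultdict(list)
    for t, url in records:
        times_by_url[url].append(t)

    for times in times_by_url.values():
        times.sort()
        left = 0
        for right, t in enumerate(times):
            # Shrink the window until it spans at most `window` seconds.
            while t - times[left] > window:
                left += 1
            if right - left + 1 > max_hits:
                return True
    return False


def split_by_rule(sessions, **rule):
    """Partition per-user record lists into (normal, abnormal) sets."""
    normal, abnormal = [], []
    for records in sessions:
        target = abnormal if is_abnormal(records, **rule) else normal
        target.append(records)
    return normal, abnormal
```

Other preset conditions, such as abnormal download volume or scanning patterns, would be further predicates combined with this one in the same way.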
In an embodiment of the present invention, the obtaining of the flow identification model according to the training of the sample spatio-temporal trajectory atlas set includes: training based on a preset convolutional neural network basic model and a sample space-time trajectory graph set to obtain a flow identification model; the preset convolutional neural network basic model comprises three convolutional layers, three pooling layers and three full-connection layers.
A convolutional neural network is a kind of artificial neural network and belongs to the class of feedforward neural networks. Its artificial neurons respond to surrounding units within a local receptive field, and it is mainly used to recognize two-dimensional images in a way that is invariant to displacement, scaling and other forms of distortion, which makes it suitable for processing large images. Compared with a general neural network, a convolutional network has many advantages in image processing: the topology of the input image matches the network structure well; feature extraction and pattern classification are carried out simultaneously and are both learned during training; and weight sharing reduces the number of training parameters, making the network structure simpler and more adaptable. For these reasons convolutional networks are widely used. Training the traffic identification model with a convolutional neural network and the sample space-time trajectory graph set reduces the complexity of the network model and the number of weights, allows the image to be used directly as the network input, and avoids the complex feature extraction and data reconstruction of traditional identification algorithms. Fig. 4 shows a deep learning network architecture diagram of a traffic identification model according to an embodiment of the present application, which includes three convolutional layers, three pooling layers and three fully connected layers. The convolutional layers extract features, the pooling layers select features, and the fully connected layers perform classification.
The input space-time trajectory graph, 500 × 500 pixels in size, first enters the input layer and is convolved with a 5 × 5 kernel to obtain C1: 32 feature maps of 500 × 500 pixels. It then enters a pooling layer with a 5 × 5 window to obtain S2: 32 feature maps of 100 × 100 pixels. Next it enters a convolutional layer with a 5 × 5 kernel to obtain C3: 64 feature maps of 100 × 100 pixels, followed by a pooling layer with a 5 × 5 window to obtain S4: 64 feature maps of 20 × 20 pixels. It then enters a convolutional layer with a 3 × 3 kernel to obtain C5: 32 feature maps of 20 × 20 pixels, followed by a pooling layer with a 3 × 3 window to obtain S6: 32 feature maps of 6 × 6 pixels. Finally, the result enters fully connected layer F7, which has 1152 units (32 × 6 × 6), then fully connected layer F8, which has 128 units, and finally output layer F9, which produces 2 classes.
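The layer sizes quoted above can be checked with a short shape calculation. The sketch assumes what the stated sizes imply but the text does not say outright: a single-channel input, convolutions padded so that height and width are preserved, and non-overlapping pooling whose stride equals its window, with any remainder dropped. The names `LAYERS`, `trace_shapes` and `flattened_size` are hypothetical.

```python
# (name, kind, kernel or window size, output channels for conv layers)
LAYERS = [
    ("C1", "conv", 5, 32),
    ("S2", "pool", 5, None),
    ("C3", "conv", 5, 64),
    ("S4", "pool", 5, None),
    ("C5", "conv", 3, 32),
    ("S6", "pool", 3, None),
]


def trace_shapes(height=500, width=500, channels=1):
    """Return [(layer_name, (channels, height, width)), ...]."""
    shape = (channels, height, width)
    trace = []
    for name, kind, size, out_channels in LAYERS:
        c, h, w = shape
        if kind == "conv":
            # Assumed size-preserving padding: only the channel count
            # changes, whatever the kernel size.
            shape = (out_channels, h, w)
        else:
            # Assumed stride == window; floor division drops the remainder.
            shape = (c, h // size, w // size)
        trace.append((name, shape))
    return trace


def flattened_size(shape):
    """Number of values fed to the first fully connected layer."""
    c, h, w = shape
    return c * h * w


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(name, shape)
    print("F7 input size:", flattened_size(trace_shapes()[-1][1]))
```

Under these assumptions the last pooling stage leaves 32 maps of 6 × 6 (20 divided by 3, rounded down), and 32 × 6 × 6 = 1152 matches the size given for F7.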
In an embodiment of the present invention, the obtaining of the flow identification model according to the training of the sample spatio-temporal trajectory atlas set includes: shuffling the space-time trajectory graphs in the sample space-time trajectory graph set and sampling them without replacement, wherein a first proportion of the space-time trajectory graphs is used as a training set, a second proportion is used as a validation set, and a third proportion is used as a test set; the sum of the first proportion, the second proportion and the third proportion is 100%.
The normal data set and the abnormal data set in the sample are mixed and shuffled, and the resulting sample space-time trajectory graph set is then sampled without replacement. To further improve the training effect of the traffic identification model, the shuffled set can be divided by purpose into different proportions; for example, 80% of the graphs are used as the training set, 10% as the validation set and 10% as the test set, the three summing to 100%. Dividing the sample set in fixed proportions in this way allows the traffic identification model to be trained more flexibly and in a more targeted manner, achieving better training efficiency and effect.
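A shuffle followed by slicing is one simple way to sample without replacement in the proportions described. The sketch below uses the 80/10/10 example from the text; the function name `split_samples` and the fixed `seed` are assumptions added for reproducibility.

```python
import random


def split_samples(samples, train=0.8, val=0.1, seed=0):
    """Shuffle the samples, then slice into train/validation/test.

    Every sample lands in exactly one subset, so the three subsets are
    drawn without replacement and together cover 100% of the data.
    """
    shuffled = list(samples)               # leave the caller's list intact
    random.Random(seed).shuffle(shuffled)  # seeded, in-place shuffle

    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # whatever remains
    return train_set, val_set, test_set
```

Taking the test set as the remainder guarantees the proportions sum to 100% even when the sample count does not divide evenly.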
In an embodiment of the present invention, the method further includes: generating an interface word vector according to the normal data set; and determining the interface index identification of each interface according to the interface word vector.
When abnormal users generate traffic, they may make continuous requests to specific interfaces, for example repeated, continuous requests to interfaces such as page pictures and the login interface. To capture the interface data more completely, an interface word vector can be obtained from the normal data set, and the interface index identifier of each interface is then determined from that word vector. In this way, the data on the vertical axis of the space-time trajectory coordinate system can be fully determined.
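The text does not specify how the interface word vector is built or how an index is read off it. As a stand-in, the sketch below derives the index from a plain frequency-ordered vocabulary of the interfaces seen in the normal data set; this is a substituted, simpler technique, and the name `build_interface_index` and the `(timestamp, url)` record layout are assumptions.

```python
from collections import Counter


def build_interface_index(normal_sessions):
    """Map each interface URL seen in normal traffic to a row index.

    normal_sessions: iterable of record lists, each record being a
    (timestamp_seconds, url) tuple (assumed layout).
    Most frequent interfaces get the smallest indexes; ties are broken
    alphabetically so the mapping is deterministic.
    """
    freq = Counter(
        url for records in normal_sessions for _, url in records
    )
    ordered = sorted(freq, key=lambda url: (-freq[url], url))
    return {url: i for i, url in enumerate(ordered)}
```

Because the mapping is built once from normal data, the same interface lands on the same row in every trajectory graph, which is what lets repeated requests to one interface show up as a horizontal streak.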
In one embodiment of the invention, the size of the spatiotemporal trajectory graph is preset in the method according to any of the above.
For better subsequent identification of spatio-temporal trajectory graphs by the model, the length and width of each trajectory graph can be set to the same specification. Therefore, the identification effect and the identification efficiency of the flow identification model can be improved.
Fig. 2 shows a schematic structural diagram of a flow identification device according to an embodiment of the present application. As shown in fig. 2, the flow identification device 200 includes:
a spatiotemporal trajectory map generating unit 210 for generating a spatiotemporal trajectory map according to the flow log data of the flow to be identified and the spatiotemporal trajectory coordinate system; the abscissa of the space-time trajectory coordinate system is interface access time, and the ordinate of the space-time trajectory coordinate system is interface index identification.
After a user accesses the website platform, corresponding records are left in the flow log data. The flow log data may include indicators such as the number of users of the website, the access time of each user, and the number of web pages each user viewed. By analyzing each indicator in the website flow log data in detail, the behavior characteristics of the user can be found and then described in the form of images. As shown in fig. 3a, a planar space-time trajectory coordinate system is established from the interface access time characteristics and the interface characteristics. The horizontal axis X of the coordinate system is the interface access time, a timeline in seconds: the time at which the user first accesses an interface is taken as the start time, and the time difference between each subsequent access and the start time is taken as the value on the X axis, in the form X = (0, time_2 - time_1, time_3 - time_2, ..., time_N - time_{N-1}). The vertical axis Y of the coordinate system is the interface index identification, in the form Y = {indexurlVector(url_1), indexurlVector(url_2), ..., indexurlVector(url_N)}. The flow log data of the flow to be identified is input into the space-time trajectory coordinate system, generating space-time trajectory points in one-to-one correspondence, which together form a space-time trajectory graph. The horizontal and vertical axes of the trajectory graph correspond to the length and width of the image, respectively. The gray value of each space-time trajectory point corresponds to the number of accesses to the same interface within a one-second unit; the larger the number, the whiter the point.
For normal behavior, many types of interface are accessed and there is no obvious regularity, so the space-time trajectory points rarely form a distinct track. The opposite holds for abnormal behavior: as shown in fig. 3b, a space-time trajectory graph with more distinct features can be formed. Therefore, a space-time trajectory coordinate system is established from the flow log data according to the interface access time characteristics and the interface index identification, and is then converted into a visual space-time trajectory image. The abstract characteristics hidden in the text information are thereby converted into clear and intuitive image characteristics, and more obvious identification characteristics are obtained at the image level, which lays a foundation for improving the flow identification capability and the accuracy of the identification result.
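The construction of a space-time trajectory graph from log records can be sketched as follows. This is an illustrative sketch under stated assumptions: the horizontal coordinate is taken as whole seconds elapsed since the first access, the gray value is the per-second access count capped at 255, and points falling outside the preset image size are dropped. The function name `trajectory_map` and the record layout are hypothetical.

```python
def trajectory_map(records, index_of, width, height):
    """Rasterise one user's log records into a height x width grayscale grid.

    records:  list of (timestamp_in_seconds, url) tuples
    index_of: dict mapping url -> interface index (the row)
    Column = whole seconds since the first access; cell = capped access count.
    """
    grid = [[0] * width for _ in range(height)]
    if not records:
        return grid
    start = min(t for t, _ in records)
    for t, url in records:
        x = int(t - start)
        y = index_of.get(url)
        if y is None or x >= width or y >= height:
            continue  # unknown interface, or outside the preset image size
        grid[y][x] = min(grid[y][x] + 1, 255)  # more accesses -> whiter pixel
    return grid


# Hypothetical records: two hits on /login in the first second, one in the next.
index_of = {"/home": 0, "/login": 1}
records = [(100, "/login"), (100, "/login"), (101, "/login"), (103, "/home")]
grid = trajectory_map(records, index_of, width=5, height=2)
```

A burst of repeated requests to one interface shows up as a bright horizontal run in a single row, while ordinary browsing scatters dim points across many rows.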
And the identification unit 220 is used for inputting the space-time trajectory diagram into the flow identification model for identification to obtain an output identification result.
The flow identification model is trained extensively in advance by deep learning. The space-time trajectory graph is then input into the trained flow identification model for identification as actually required, and the model finally judges whether the flow is abnormal. Because the model is trained extensively by means of deep learning and the abnormal flow identification is completed by the model, the approach offers high working efficiency and identification accuracy.
It can be seen that the device shown in fig. 2 can construct a space-time trajectory coordinate system through characteristics such as traffic logs, interface access time and the like, generate a space-time trajectory graph, and finally acquire more significant identification characteristics from an image level according to the space-time trajectory graph, so that the traffic identification capability and the accuracy of an identification result are improved, the identification error rate is reduced, the user experience is improved, and a good identification effect is achieved. And the traffic identification model does not depend on service data, and has higher identification efficiency, stronger generalization capability, longer life cycle and universality and better identification effect.
In an embodiment of the present invention, in the above apparatus, the spatiotemporal trajectory graph generating unit 210 is further configured to extract a normal data set and an abnormal data set from the sample flow log data; generating a space-time trajectory graph for each piece of flow log data in the normal data set and the abnormal data set according to a space-time trajectory coordinate system respectively to obtain a sample space-time trajectory graph set; and training according to the sample space-time trajectory graph set to obtain a flow identification model.
The flow log data is rich in information and includes logs of both normal and abnormal behavior. For better model training effect and efficiency, the sample flow log data can therefore be classified in advance, with the log records of normal behavior and abnormal behavior divided into a normal data set and an abnormal data set. A space-time trajectory graph is then generated for each data set according to the space-time trajectory coordinate system, finally yielding the space-time trajectory graph set of the normal data set and the space-time trajectory graph set of the abnormal data set in the sample. When the flow identification model is trained in advance, the corresponding space-time trajectory graphs can be used as training data in a targeted manner. In this way the disordered user flow log data is classified and processed into corresponding space-time trajectory graph sets, the appropriate graph set can be selected according to specific requirements to train the flow identification model, the flexibility and efficiency of model training are improved, and a better training effect is achieved.
In an embodiment of the present invention, in the above apparatus, the spatio-temporal trajectory graph generating unit 210 is configured to use, according to the content of the traffic request, traffic log data containing a preset normal behavior as normal data, and traffic log data containing a preset abnormal behavior as abnormal data.
Whether a user's behavior is abnormal is determined from the content of the user's flow requests. Characteristics of abnormal behavior, for example multiple requests to the same web page within a short time, a large number of abnormal downloads, malicious scanning, or virus-like requests with offensive features, can be defined as custom preset conditions, and these conditions are used to decide whether a log record belongs to the abnormal behavior data. In this way, the sample flow log data can be divided into normal data and abnormal data according to the preset conditions.
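A preset condition of the kind described above might be expressed as a simple rule. The sketch below covers only one of the listed characteristics, repeated requests to the same interface within a short time; the thresholds `max_hits` and `window` and the function name `is_abnormal` are assumptions chosen for illustration, not values taken from the patent.

```python
from collections import defaultdict


def is_abnormal(records, max_hits=20, window=10):
    """Return True if any URL is requested more than max_hits times
    within a span shorter than `window` seconds.

    records: list of (timestamp_in_seconds, url) tuples for one user.
    """
    times_by_url = defaultdict(list)
    for t, url in records:
        times_by_url[url].append(t)
    for times in times_by_url.values():
        times.sort()
        left = 0
        for right, t in enumerate(times):
            # Slide the left edge until the window is shorter than `window`.
            while t - times[left] >= window:
                left += 1
            if right - left + 1 > max_hits:
                return True
    return False


# Hypothetical sessions: a burst against one interface versus ordinary browsing.
burst = [(i * 0.1, "/login") for i in range(30)]
browse = [(i * 5, "/page%d" % i) for i in range(10)]
burst_flag = is_abnormal(burst)
browse_flag = is_abnormal(browse)
```

Records flagged by such rules would go into the abnormal data set and the rest into the normal data set; other preset conditions (abnormal downloads, scanning) would need their own rules.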
In an embodiment of the present invention, in the above apparatus, the space-time trajectory graph generating unit 210 is configured to train a preset convolutional neural network base model on the sample space-time trajectory graph set to obtain the flow identification model; the preset convolutional neural network base model comprises three convolutional layers, three pooling layers, and three fully connected layers.
A convolutional neural network is a kind of artificial neural network and belongs to the class of feedforward neural networks. Its artificial neurons respond to units within a local surrounding region, and it is mainly used to recognize two-dimensional images in a way that is invariant to displacement, scaling, and other forms of distortion, which makes it suitable for large-scale image processing. Compared with a general neural network, a convolutional network has several advantages in image processing: the topology of the input image matches the network structure well; feature extraction and pattern classification are performed together and are both learned during training; and weight sharing reduces the number of trainable parameters, making the network structure simpler and more adaptable. For these reasons it is widely used. Training the flow identification model extensively with a convolutional neural network and the sample space-time trajectory graph set reduces the complexity of the network model and the number of weights, lets the image be used directly as the network input, and avoids the complex feature extraction and data reconstruction of traditional identification algorithms.
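To make the three-convolution, three-pooling, three-fully-connected structure concrete, the following sketch computes the feature-map shape and parameter count of one such network. The patent gives only the layer counts; the 3x3 kernels with padding 1, the 2x2 pooling with stride 2, the channel widths (16, 32, 64), the fully connected sizes (128, 64, 2), and the single-channel input are all assumptions made here for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def describe_cnn(height, width, channels=(16, 32, 64), fc_units=(128, 64, 2)):
    """Return (feature_shape, parameter_count) for a network with three
    conv + pool stages followed by three fully connected layers."""
    in_ch, params = 1, 0                              # grayscale input: one channel
    for out_ch in channels:
        params += (3 * 3 * in_ch + 1) * out_ch        # 3x3 conv weights plus bias
        height = conv_out(height, 3, padding=1)       # padding 1 keeps the size
        width = conv_out(width, 3, padding=1)
        height = conv_out(height, 2, stride=2)        # 2x2 pooling halves the size
        width = conv_out(width, 2, stride=2)
        in_ch = out_ch
    feature_shape = (in_ch, height, width)
    fan_in = in_ch * height * width                   # flattened feature vector
    for units in fc_units:
        params += (fan_in + 1) * units                # dense weights plus bias
        fan_in = units
    return feature_shape, params


# Hypothetical 64 x 64 space-time trajectory graph.
shape, n_params = describe_cnn(64, 64)
```

Under these assumptions a 64 x 64 graph is reduced to a 64-channel 8 x 8 feature map before the fully connected layers, and the final layer has two outputs, one per class (normal, abnormal). Most of the parameters sit in the first fully connected layer, which is one reason a small preset image size helps.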
In an embodiment of the present invention, in the above apparatus, the space-time trajectory graph generating unit 210 is configured to shuffle the space-time trajectory graphs in the sample space-time trajectory graph set and to sample from them, without replacement, a first proportion of the graphs as a training set, a second proportion as a verification set, and a third proportion as a test set; the sum of the first proportion, the second proportion and the third proportion is 100%.
The normal data set and the abnormal data set in the sample are mixed and shuffled to obtain a sample space-time trajectory graph set that is sampled without replacement. To further improve the training effect of the flow identification model, this graph set can be divided into proportions according to function, for example 80% of the data as a training set, 10% as a verification set, and 10% as a test set, the three summing to 100%. Using the graph set divided in these proportions as training data allows the flow identification model to be trained more flexibly and in a more targeted way, achieving better training efficiency and training effect.
In an embodiment of the present invention, in the above apparatus, the space-time trajectory diagram generating unit 210 is further configured to generate an interface word vector according to the normal data set; and determining the interface index identification of each interface according to the interface word vector.
When an abnormal user issues traffic requests, the user may request specific interfaces continuously, for example by repeatedly requesting page pictures or the login interface. To capture the interface data more completely, an interface word vector can be obtained from the normal data set, and the interface index identifier of each interface is then determined from that interface word vector. In this way, the data on the vertical axis of the space-time trajectory coordinate system can be fully determined.
In one embodiment of the present invention, the size of the spatiotemporal trajectory diagrams is preset in the above apparatus.
For better subsequent identification of spatio-temporal trajectory graphs by the model, the length and width of each trajectory graph can be set to the same specification. Therefore, the identification effect and the identification efficiency of the flow identification model can be improved.
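One simple way to bring every trajectory graph to the same preset size is to crop or zero-pad it, as sketched below. The patent states only that the size is preset; cropping and padding (rather than, say, resampling) and the function name `fit_to_size` are assumptions made for this illustration.

```python
def fit_to_size(grid, width, height, fill=0):
    """Crop or pad a 2-D list so it is exactly `height` rows by `width` columns."""
    rows = [
        row[:width] + [fill] * max(0, width - len(row))  # crop or pad each row
        for row in grid[:height]                         # drop surplus rows
    ]
    rows += [[fill] * width for _ in range(height - len(rows))]  # add missing rows
    return rows


# Hypothetical 2 x 3 graph normalised to 3 rows by 2 columns.
fixed = fit_to_size([[1, 2, 3], [4, 5, 6]], width=2, height=3)
```

With a fixed size, every graph can be fed to the same network input layer without per-sample reshaping.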
In summary, according to the technical scheme of the application, a space-time trajectory graph is generated according to the flow log data of the flow to be identified and a space-time trajectory coordinate system; the abscissa of the space-time trajectory coordinate system is interface access time, and the ordinate of the space-time trajectory coordinate system is interface index identification; and inputting the space-time trajectory graph into a flow identification model for identification to obtain an output identification result. The method has the advantages that a space-time trajectory coordinate system is constructed through characteristics such as flow logs, interface access time and the like, a space-time trajectory graph is generated, and finally, more remarkable identification characteristics are obtained from an image level according to the space-time trajectory graph, so that flow identification capacity and accuracy of an identification result are improved, the identification error rate is reduced, user experience is improved, and meanwhile a good identification effect is achieved. And the traffic identification model does not depend on service data, and has higher identification efficiency, stronger generalization capability, longer life cycle and universality and better identification effect.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a flow identification device according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 500 comprises a processor 510 and a memory 520 arranged to store computer executable instructions (computer readable program code). The memory 520 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 520 has a storage space 530 storing computer readable program code 531 for performing any of the method steps in the above described method. For example, the storage space 530 for storing the computer readable program code may include respective computer readable program codes 531 for respectively implementing various steps in the above method. The computer readable program code 531 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 6. FIG. 6 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application. The computer readable storage medium 600 has stored thereon a computer readable program code 531 for performing the steps of the method according to the application, readable by the processor 510 of the electronic device 500, which computer readable program code 531, when executed by the electronic device 500, causes the electronic device 500 to perform the steps of the method described above, in particular the computer readable program code 531 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 531 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

Claims (9)

1. A traffic identification method, characterized in that the method comprises:
generating a space-time trajectory graph according to the flow log data of the flow to be identified and a space-time trajectory coordinate system; the abscissa of the space-time trajectory coordinate system is interface access time, and the ordinate of the space-time trajectory coordinate system is interface index identification;
inputting the space-time trajectory graph into a flow identification model for identification to obtain an output identification result;
the method further comprises the following steps:
extracting a normal data set and an abnormal data set from the sample flow log data;
generating a space-time trajectory graph for each piece of flow log data in the normal data set and the abnormal data set according to the space-time trajectory coordinate system respectively to obtain a sample space-time trajectory graph set;
and training according to the sample space-time trajectory graph set to obtain the flow identification model.
2. The method of claim 1, wherein extracting the normal data set and the abnormal data set from the sample traffic log data comprises:
and according to the content of the flow request, taking the flow log data containing the preset normal behavior as normal data, and taking the flow log data containing the preset abnormal behavior as abnormal data.
3. The method of claim 1, wherein the training the traffic recognition model from the set of sample spatiotemporal trajectories comprises:
training based on a preset convolutional neural network basic model and the sample space-time trajectory graph set to obtain the flow identification model; the preset convolutional neural network basic model comprises three convolutional layers, three pooling layers and three full-connection layers.
4. The method of claim 1, wherein the training the traffic recognition model from the set of sample spatiotemporal trajectories comprises:
carrying out disorder processing on each space-time trajectory diagram in the sample space-time trajectory diagram set, wherein the space-time trajectory diagram with a first proportion in the sample space-time trajectory diagram is not repeatedly sampled and is used as a training set, the space-time trajectory diagram with a second proportion is used as a verification set, and the space-time trajectory diagram with a third proportion is used as a test set; the sum of the first proportion, the second proportion and the third proportion is 100%.
5. The method of claim 1, further comprising:
generating an interface word vector according to the normal data set;
and determining the interface index identification of each interface according to the interface word vector.
6. The method of any one of claims 1-5, wherein the size of the spatiotemporal trajectory graph is preset.
7. An abnormal flow rate recognition apparatus, characterized in that the apparatus comprises:
the space-time trajectory graph generating unit is used for generating a space-time trajectory graph according to the flow log data of the flow to be identified and a space-time trajectory coordinate system; the abscissa of the space-time trajectory coordinate system is interface access time, and the ordinate of the space-time trajectory coordinate system is interface index identification;
the identification unit is used for inputting the space-time trajectory diagram into a flow identification model for identification to obtain an output identification result;
the space-time trajectory graph generating unit is also used for extracting a normal data set and an abnormal data set from the sample flow log data;
generating a space-time trajectory graph for each piece of flow log data in the normal data set and the abnormal data set according to the space-time trajectory coordinate system respectively to obtain a sample space-time trajectory graph set;
and training according to the sample space-time trajectory graph set to obtain the flow identification model.
8. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-6.
9. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-6.
CN201911059598.0A 2019-11-01 2019-11-01 Traffic identification method and device Active CN111314161B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911059598.0A CN111314161B (en) 2019-11-01 2019-11-01 Traffic identification method and device

Publications (2)

Publication Number Publication Date
CN111314161A CN111314161A (en) 2020-06-19
CN111314161B true CN111314161B (en) 2022-01-28

Family

ID=71159637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911059598.0A Active CN111314161B (en) 2019-11-01 2019-11-01 Traffic identification method and device

Country Status (1)

Country Link
CN (1) CN111314161B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113992349B (en) * 2021-09-23 2023-05-19 云南财经大学 Malicious traffic identification method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106790019A (en) * 2016-12-14 2017-05-31 北京天融信网络安全技术有限公司 The encryption method for recognizing flux and device of feature based self study
CN107819745A (en) * 2017-10-25 2018-03-20 北京京东尚科信息技术有限公司 The defence method and device of abnormal flow
CN109995601A (en) * 2017-12-29 2019-07-09 中国移动通信集团上海有限公司 A kind of network flow identification method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11855850B2 (en) * 2017-04-25 2023-12-26 Nutanix, Inc. Systems and methods for networked microservice modeling and visualization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221018

Address after: 100102 Room 01, Floor 3, Room 01, Building 2 to 4, Yard 6, Wangjing East Road, Chaoyang District, Beijing

Patentee after: Beijing three cloud computing Co.,Ltd.

Patentee after: BEIJING SANKUAI ONLINE TECHNOLOGY Co.,Ltd.

Address before: 2106-030, No.9, Beisihuan West Road, Haidian District, Beijing 100190

Patentee before: BEIJING SANKUAI ONLINE TECHNOLOGY Co.,Ltd.
