CN117220915A - Flow analysis method and device and electronic equipment - Google Patents


Info

Publication number
CN117220915A
Authority
CN
China
Legal status
Pending
Application number
CN202311036077.XA
Other languages
Chinese (zh)
Inventor
李易聪
王丽芳
周涛华
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Application filed by China Telecom Corp Ltd
Priority to CN202311036077.XA
Publication of CN117220915A


Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a flow analysis method, a flow analysis device, and electronic equipment. The flow analysis method comprises the following steps: obtaining flow data to be detected and obtaining K training sample data, wherein K is an integer greater than 1; inputting the K training sample data into a fusion model and outputting K classification results, wherein the fusion model comprises S classifiers and each classification result comprises S classification sub-results corresponding to the S classifiers; performing an accuracy calculation on the K classification sub-results of each classifier to obtain S accuracy values corresponding to the S classifiers; determining a target classifier among the S classifiers according to the S accuracy values; and inputting the flow data to be detected into the target classifier in the fusion model and outputting a target classification result.

Description

Flow analysis method and device and electronic equipment
Technical Field
The present application relates to the field of data application technologies, and in particular, to a flow analysis method, a flow analysis device, and an electronic device.
Background
With the rapid development of Internet technology, the number of Internet users has grown exponentially. However, the security environment of Internet Protocol version 6 (IPv6) networks is far from reassuring: traditional Internet threats are spreading toward industrial control systems, network attacks are becoming increasingly polymorphic, and attacks now reach across many industries. Meanwhile, in prior-art data processing a single classifier is suited to only a narrow range of application scenarios, its output fluctuates strongly with external factors, and the processing results are therefore prone to inaccuracy.
Disclosure of Invention
The embodiments of the present application provide a flow analysis method, a flow analysis device, and electronic equipment, which address the problem that existing data processing methods tend to produce inaccurate results.
In a first aspect, an embodiment of the present application provides a flow analysis method, where the method includes:
obtaining flow data to be detected, and obtaining K training sample data, wherein K is an integer greater than 1;
inputting the K training sample data into a fusion model and outputting K classification results, wherein the fusion model comprises S classifiers, each classification result comprises S classification sub-results corresponding to the S classifiers, and S is an integer greater than 1;
performing accuracy rate calculation on K classification sub-results corresponding to each classifier in the K classification results to obtain S accuracy rate values corresponding to the S classifiers;
determining a target classifier among the S classifiers according to the S accuracy values corresponding to the S classifiers;
and inputting the flow data to be detected into a target classifier in the fusion model, and outputting a target classification result.
Optionally, acquiring the K training sample data includes:
acquiring N training sample data, wherein N is an integer greater than K;
respectively carrying out similarity calculation on the flow data to be detected and the N training sample data to obtain N similarity values corresponding to the N training sample data;
selecting K similarity values from the N similarity values according to the sequence from large to small;
and determining the K training sample data according to the K similarity values.
Optionally, performing the accuracy calculation on the K classification sub-results of each classifier in the K classification results to obtain the S accuracy values corresponding to the S classifiers includes:
performing an accuracy calculation on the K classification sub-results of a first classifier in the K classification results to obtain the accuracy value of the first classifier, wherein the first classifier is any one of the S classifiers;
and obtaining the S accuracy values corresponding to the S classifiers by calculating the accuracy value of every classifier in the same way as for the first classifier.
Optionally, determining the target classifier among the S classifiers according to the S accuracy values corresponding to the S classifiers includes:
determining the classifier corresponding to the highest of the S accuracy values as the target classifier.
Optionally, after the flow data to be detected is input into the target classifier in the fusion model and the target classification result is output, the method further includes:
under the condition that the target classification result meets a preset condition, determining that the flow data to be detected is abnormal flow data;
and under the condition that the target classification result does not meet the preset condition, determining the flow data to be detected as normal flow data.
Optionally, the method further comprises:
obtaining target network traffic;
preprocessing the target network traffic to obtain Internet Protocol version 6 (IPv6) traffic;
extracting features of the IPv6 traffic with a network measurement toolkit to obtain full-attribute flow features;
performing feature selection on the full-attribute flow features to obtain feature vector data;
wherein the target network traffic is either the network traffic to be detected or N traffic training samples; when the target network traffic is the network traffic to be detected, the feature vector data is the flow data to be detected, and when the target network traffic is the N traffic training samples, the feature vector data is the N training sample data.
Optionally, the S classifiers include a support vector machine (SVM) classifier, a K-nearest neighbor (KNN) classifier, and a sparse representation-based classification (SRC) classifier.
In a second aspect, an embodiment of the present application further provides a flow analysis device, including:
the first acquisition module is used for acquiring flow data to be detected and K training sample data, wherein K is an integer greater than 1;
the first processing module is used for inputting the K training sample data into a fusion model and outputting K classification results, wherein the fusion model comprises S classifiers, each classification result comprises S classification sub-results corresponding to the S classifiers, and S is an integer greater than 1;
the second processing module is used for calculating the accuracy of the K classification sub-results of each classifier in the K classification results to obtain S accuracy values corresponding to the S classifiers;
the first determining module is used for determining a target classifier among the S classifiers according to the S accuracy values corresponding to the S classifiers;
and the third processing module is used for inputting the flow data to be detected into the target classifier in the fusion model and outputting a target classification result.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a transceiver, and a processor:
a memory for storing a computer program; a transceiver for transceiving data under the control of the processor; and a processor for reading the computer program in the memory and performing the flow analysis method as described above.
In a fourth aspect, an embodiment of the present application further provides a processor readable storage medium, where a computer program is stored, where the computer program is configured to cause a processor to execute the above-mentioned flow analysis method.
In the embodiments of the present application, the flow data to be detected and K training sample data are obtained; the K training sample data are input into a fusion model comprising S classifiers, which outputs K classification results; an accuracy calculation on the K classification sub-results of each classifier yields S accuracy values; a target classifier is determined among the S classifiers according to these accuracy values; and the flow data to be detected is input into the target classifier in the fusion model, which outputs a target classification result. In this scheme, the S classifiers are integrated into one machine learning fusion model, each classifier outputs classification results, the accuracy of each classifier's output is calculated to judge which classifier is most accurate, and the most accurate classifier is used to analyze the flow data to be detected, which improves the accuracy of the classification result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart illustrating steps of a flow analysis method according to an embodiment of the present application;
FIG. 2 is a block diagram of a flow analysis device according to an embodiment of the present application;
FIG. 3 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the embodiments of the present application, the term "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
The term "plurality" in embodiments of the present application means two or more, and other adjectives are similar.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Specifically, an embodiment of the present application provides a flow analysis method which, as shown in FIG. 1, may include the following steps:
step 101, obtaining flow data to be detected, and obtaining K training sample data, wherein K is an integer greater than 1.
In step 101, the flow data to be detected is obtained first, and K training sample data similar to it are then obtained on the basis of the flow data to be detected.
Step 102, inputting the K training sample data into a fusion model and outputting K classification results, wherein the fusion model comprises S classifiers, each classification result comprises S classification sub-results corresponding to the S classifiers, and S is an integer greater than 1.
In step 102, the K training sample data are each input into the fusion model, which analyzes them and outputs K classification results. Each classification result comprises S classification sub-results, each produced by one classifier. In other words, when one training sample is input into the fusion model, it is analyzed by the S classifiers, each classifier outputs one classification sub-result, and the S sub-results together form one classification result. The K classification results therefore contain K×S classification sub-results, with K sub-results per classifier.
Step 103, calculating the accuracy of the K classification sub-results of each classifier in the K classification results to obtain S accuracy values corresponding to the S classifiers.
In step 103, the accuracy of the K classification sub-results of each classifier in the K classification results is calculated to obtain that classifier's accuracy value; repeating this calculation for every classifier yields the S accuracy values corresponding to the S classifiers.
Step 104, determining a target classifier among the S classifiers according to the S accuracy values corresponding to the S classifiers.
In step 104, the accuracy value of each classifier indicates how accurately that classifier classifies, from which the target classifier among the S classifiers can be determined.
Step 105, inputting the flow data to be detected into the target classifier in the fusion model and outputting a target classification result.
In step 105, once the target classifier has been determined, the flow data to be detected is input into the fusion model and analyzed by its target classifier, which outputs the target classification result. Because the flow data to be detected is analyzed by the classifier with the highest accuracy, the accuracy of the classification result is improved.
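Steps 102 to 105 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: each classifier is modeled as a hypothetical Python callable mapping a feature vector to a label, each of the K training samples is assumed to carry a known label (the training features are described later as supervised data), accuracy is taken as the fraction of correct sub-results, and ties between classifiers go to the first one listed.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical classifier type: maps one feature vector to a class label.
Classifier = Callable[[Vector], str]


def classification_results(samples: List[Vector],
                           classifiers: Dict[str, Classifier]) -> List[Dict[str, str]]:
    """Step 102: one classification result per sample, each holding S sub-results."""
    return [{name: clf(x) for name, clf in classifiers.items()} for x in samples]


def accuracy_values(results: List[Dict[str, str]], labels: List[str]) -> Dict[str, float]:
    """Step 103: fraction of each classifier's K sub-results that match the labels."""
    k = len(results)
    return {name: sum(r[name] == y for r, y in zip(results, labels)) / k
            for name in results[0]}


def analyze(flow: Vector, samples: List[Vector], labels: List[str],
            classifiers: Dict[str, Classifier]) -> Tuple[str, str]:
    """Steps 102-105: pick the most accurate classifier, then classify the flow with it."""
    results = classification_results(samples, classifiers)  # K x S sub-results
    acc = accuracy_values(results, labels)
    target = max(acc, key=acc.get)  # step 104: highest accuracy (first one wins ties)
    return target, classifiers[target](flow)  # step 105


if __name__ == "__main__":
    toy = {
        "always_normal": lambda x: "normal",
        "threshold": lambda x: "abnormal" if x[0] > 0.5 else "normal",
    }
    print(analyze([0.7], [[0.9], [0.1], [0.8]],
                  ["abnormal", "normal", "abnormal"], toy))
```

With these toy classifiers the threshold classifier scores 3/3 on the verification samples against 1/3 for the constant one, so the script prints `('threshold', 'abnormal')`.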
As an optional embodiment of step 101, obtaining the flow data to be detected may specifically include:
obtaining target network traffic;
preprocessing the target network traffic to obtain Internet Protocol version 6 (IPv6) traffic;
extracting features of the IPv6 traffic with a network measurement toolkit to obtain full-attribute flow features;
performing feature selection on the full-attribute flow features to obtain feature vector data;
wherein the target network traffic is the network traffic to be detected, and the feature vector data is the flow data to be detected.
Specifically, when the target network traffic is the network traffic to be detected, the IPv6 component of that traffic is extracted on the basis of address length and address notation, yielding the IPv6 traffic of the network traffic to be detected. The network measurement toolkit netmate, configured through Extensible Markup Language (XML) files, then extracts features covering the parameters and data types in the IPv6 traffic, yielding the full-attribute flow features. Finally, redundancy is removed from the full-attribute flow features with the Filter-Wrapper feature selection method, and the result is normalized to obtain the feature vector data.
It should be noted that netmate configures its modules through an XML file, which specifies how the captured packets are to be filtered and, when a particular module is to be applied to particular packets, which module to use. netmate is a network measurement toolkit that performs packet filtering, packet segmentation, and output of statistical data.
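Two parts of this preprocessing lend themselves to a short sketch: keeping only IPv6 traffic and normalizing the selected features. The sketch below does not reproduce netmate or the Filter-Wrapper selection. It assumes flow records are dictionaries with hypothetical `src` and `dst` fields, recognizes IPv6 addresses with Python's standard `ipaddress` module (128-bit addresses in colon-hexadecimal notation), and uses min-max scaling, which is an assumption since the text does not name the normalization scheme.

```python
import ipaddress
from typing import Dict, List


def is_ipv6(addr: str) -> bool:
    """True if addr parses as an IPv6 address."""
    try:
        return ipaddress.ip_address(addr).version == 6
    except ValueError:  # not a valid IPv4 or IPv6 address
        return False


def ipv6_records(records: List[Dict]) -> List[Dict]:
    """Keep only records whose (hypothetical) src and dst fields are both IPv6."""
    return [r for r in records if is_ipv6(r["src"]) and is_ipv6(r["dst"])]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each feature column into [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]
```

Min-max scaling also keeps every feature non-negative, which is what keeps the cosine similarity used later in the 0 to 1 range.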
As an optional embodiment of step 101, obtaining the K training sample data may specifically include:
acquiring N training sample data, wherein N is an integer greater than K;
respectively carrying out similarity calculation on the flow data to be detected and the N training sample data to obtain N similarity values corresponding to the N training sample data;
selecting K similarity values from the N similarity values according to the sequence from large to small;
and determining the K training sample data according to the K similarity values.
Specifically, N training sample data are obtained. After the flow data to be detected has been derived from the network traffic to be detected, a similarity calculation is performed between the flow data to be detected and each of the N training sample data, giving one similarity value per training sample that represents how similar that sample is to the flow data to be detected. The N similarity values are then sorted, the K largest are selected in descending order, and the corresponding K training sample data are taken as the verification set.
The following illustrates the calculation of the similarity value for any one of the N training sample data.
The similarity value between the training sample data and the flow data to be detected is calculated with the cosine similarity formula:
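The formula itself did not survive extraction. Assuming it is the standard cosine similarity between the n-dimensional feature vectors $F_1$ and $F_2$ defined below, which matches its name and stated properties, it reads:

$$\cos(\theta) = \frac{\sum_{j=1}^{n} f_{1j}\, f_{2j}}{\sqrt{\sum_{j=1}^{n} f_{1j}^{2}}\;\sqrt{\sum_{j=1}^{n} f_{2j}^{2}}}$$

For non-negative feature values, such as normalized features, this value lies in [0, 1].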
where cos(θ) is the similarity value between the training sample data and the flow data to be detected; it lies between 0 and 1 inclusive, and the closer it is to 1, the more similar the training sample data is to the flow data to be detected;
$F_1 = (f_{11}, f_{12}, \dots, f_{1n})$ denotes the training sample data, which comprises n feature values;
$F_2 = (f_{21}, f_{22}, \dots, f_{2n})$ denotes the flow data to be detected, which comprises n feature values.
When computing cos(θ), first determine whether $F_1$ and $F_2$ are discrete data. If $F_1$ and $F_2$ are discrete data, then
If $F_1$ and $F_2$ are continuous data, cos(θ) is calculated according to the cosine similarity formula.
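The continuous-data branch and the top-K selection described above can be sketched as follows. The discrete-data formula is not available in this text, so only the continuous case is covered. Returning 0.0 for an all-zero vector is an assumption, and `top_k_samples` is a hypothetical helper that enforces the stated bounds 1 < K < N.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(f1: Sequence[float], f2: Sequence[float]) -> float:
    """cos(theta) between two n-dimensional feature vectors F1 and F2."""
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same length n")
    dot = sum(a * b for a, b in zip(f1, f2))
    norm = math.sqrt(sum(a * a for a in f1)) * math.sqrt(sum(b * b for b in f2))
    return dot / norm if norm else 0.0  # assumption: zero vector -> 0.0


def top_k_samples(flow: Sequence[float], samples: List[Sequence[float]],
                  k: int) -> List[Tuple[int, float]]:
    """(index, similarity) of the K training samples most similar to the flow."""
    if not 1 < k < len(samples):
        raise ValueError("require 1 < K < N")
    scored = [(i, cosine_similarity(flow, s)) for i, s in enumerate(samples)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # large to small
    return scored[:k]
```

For example, with `samples = [[1, 0], [0, 1], [1, 1], [2, 0.1]]` and `flow = [1, 0]`, the similarities are 1.0, 0.0, about 0.707 and about 0.999, so the two selected samples are those at indices 0 and 3.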
Further, acquiring the N training sample data may specifically include:
obtaining target network traffic;
preprocessing the target network traffic to obtain Internet Protocol version 6 (IPv6) traffic;
extracting features of the IPv6 traffic with a network measurement toolkit to obtain full-attribute flow features;
performing feature selection on the full-attribute flow features to obtain feature vector data;
wherein the target network traffic is the N traffic training samples, and the feature vector data is the N training sample data.
Specifically, when the target network traffic is the N traffic training samples, the IPv6 component of each of the N traffic training samples is extracted on the basis of address length and address notation, yielding the IPv6 traffic of the N traffic training samples. The network measurement toolkit netmate, configured through XML, then extracts features covering the parameters and data types in the IPv6 traffic, yielding the full-attribute flow features. Finally, redundancy is removed from the full-attribute flow features with the Filter-Wrapper selection method, and the result is normalized to obtain the feature vector data, which is supervised (labeled) feature vector data.
As an optional embodiment of step 103, calculating the accuracy of the K classification sub-results of each classifier in the K classification results to obtain the S accuracy values corresponding to the S classifiers may specifically include:
performing an accuracy calculation on the K classification sub-results of a first classifier in the K classification results to obtain the accuracy value of the first classifier, wherein the first classifier is any one of the S classifiers;
and obtaining the S accuracy values corresponding to the S classifiers by calculating the accuracy value of every classifier in the same way as for the first classifier.
Specifically, after the K training sample data are obtained, they are each input into the fusion model, which analyzes them and produces K classification results, one per training sample. Because the fusion model consists of S classifiers, each training sample is in effect input into all S classifiers, and each classifier produces its own classification sub-result. The classification result for one training sample therefore consists of S classification sub-results, one per classifier.
It follows that the K training sample data correspond to K classification results containing K×S classification sub-results in total, that is, K classification sub-results for each classifier. The accuracy of the K classification sub-results of the first classifier among the S classifiers is calculated to obtain the accuracy value of the first classifier, and the accuracy value of every other classifier is calculated in the same way.
As an optional embodiment of step 104, determining the target classifier among the S classifiers according to the S accuracy values corresponding to the S classifiers may specifically include:
determining the classifier corresponding to the highest of the S accuracy values as the target classifier.
Specifically, after the S accuracy values of the S classifiers are obtained, they are sorted by magnitude to find the maximum; the classifier with the maximum accuracy value is determined as the target classifier, and the flow data to be detected is analyzed by the target classifier to obtain a more accurate classification result.
The determination of the target classifier is described by a specific example:
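The formula for this example is missing from the extracted text. A reconstruction consistent with the variable definitions that follow, offered as an assumption rather than the published formula, takes the accuracy of classifier i as the fraction of correct evaluations over the k verification samples $F_1, \dots, F_k$ and class(F) as the largest such accuracy:

$$\mathrm{class}(F) = \max_{1 \le i \le S} \frac{1}{k} \sum_{j=1}^{k} y_i(F_j)$$

The target classifier is the index i that attains this maximum.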
where class(F) denotes the highest of the S accuracy values;
F denotes any training sample data;
i denotes the index of the classifier;
k denotes the number of training sample data in the verification set;
$y_i(F)$ is an evaluation function for the accuracy of classifier i on the verification set: it equals 1 if classifier i classifies the training sample data F correctly, and 0 otherwise.
Further, the S classifiers include, but are not limited to, the following three:
First: the support vector machine (SVM) classifier.
The SVM classifier uses the SVR (support vector regression) function to map the input high-dimensional data into a high-dimensional feature space, in which the output function is expressed linearly. To minimize the actual classification risk, the structural-risk objective function is optimized according to the structural risk minimization principle, which yields a regression function; the classification result is obtained from this regression function.
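As a rough illustration of structural-risk minimization for a classifier, the sketch below trains a linear soft-margin SVM with the Pegasos stochastic sub-gradient method. This is a swapped-in, simpler technique, not the SVR-based formulation described above: it is linear only (no kernel mapping), expects labels of +1 or -1, and also regularizes the bias. The regularization weight `lam`, epoch count, and seed are hypothetical defaults.

```python
import random
from typing import List, Sequence


def train_linear_svm(xs: List[Sequence[float]], ys: List[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos: minimize lam/2*||w||^2 + mean hinge loss; labels must be +1 or -1.

    A constant 1.0 feature is appended so the last weight acts as a bias term
    (the bias is regularized too, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: move toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1
```

A multi-class traffic classifier could be built from several such binary models (one per traffic type versus the rest); that extension is not shown.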
Second: the K-nearest neighbor (KNN) classifier.
The KNN classifier decides using the Euclidean distance between flow features of the same dimension: it ranks the training sample data by similarity to the flow data to be detected, takes the K most similar training samples, and determines the category of the flow data to be detected, i.e., the classification result, as the category that occurs most frequently among those K samples.
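A minimal KNN sketch following that description: rank training samples by Euclidean distance (`math.dist`, Python 3.8+) and take a majority vote over the k nearest. The neighbor count `k` here is a hypothetical default and need not equal the K used to build the verification set.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_classify(flow: Sequence[float], samples: List[Sequence[float]],
                 labels: List[Hashable], k: int = 3) -> Hashable:
    """Majority label among the k training samples closest to `flow` (Euclidean)."""
    ranked = sorted(zip(samples, labels), key=lambda p: math.dist(flow, p[0]))
    votes = Counter(lbl for _, lbl in ranked[:k])
    # most_common keeps first-seen order on ties, so the nearer label wins a tie.
    return votes.most_common(1)[0][0]
```

Sorting by distance alone, rather than by (distance, label) tuples, avoids comparing labels when two distances tie.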
Third: the sparse representation-based classification (SRC) classifier.
The SRC classifier uses the training sample data as a basis and computes the sparse representation of the flow data to be detected over all training sample data. In theory, only the coefficients on training samples of the same category as the flow data to be detected are nonzero, while the coefficients on training samples of other categories are zero, so the category of the flow data to be detected, i.e., the classification result, can be read from the positions of the nonzero elements of the sparse coefficient vector.
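The SRC idea can be sketched in pure Python. Assumptions: the sparse code is computed by ISTA (iterative soft-thresholding) for the L1-regularized least-squares problem, with unit-normalized training samples as dictionary atoms; the regularization weight `lam` and iteration count are hypothetical. Following the description above, the category is the one whose training samples carry the most nonzero coefficient mass; the widely used SRC variant instead compares class-wise reconstruction residuals.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def sparse_code(y: Sequence[float], atoms: List[List[float]],
                lam: float = 0.01, iters: int = 500) -> List[float]:
    """ISTA for min_x 0.5*||y - A x||^2 + lam*||x||_1, where A's columns are `atoms`."""
    # 1/||A||_F^2 never exceeds 1/L (L = largest eigenvalue of A^T A), so ISTA converges.
    step = 1.0 / sum(a_i * a_i for a in atoms for a_i in a)
    x = [0.0] * len(atoms)
    for _ in range(iters):
        r = list(y)  # residual r = y - A x
        for xj, a in zip(x, atoms):
            r = [ri - xj * ai for ri, ai in zip(r, a)]
        new_x = []
        for xj, a in zip(x, atoms):
            z = xj + step * sum(ai * ri for ai, ri in zip(a, r))  # gradient step
            new_x.append(math.copysign(max(abs(z) - step * lam, 0.0), z))  # soft threshold
        x = new_x
    return x


def src_classify(flow: Sequence[float], samples: List[Sequence[float]],
                 labels: List[Hashable], lam: float = 0.01) -> Hashable:
    """Return the label whose training samples carry the most coefficient mass."""
    atoms = [_unit(s) for s in samples]
    coef = sparse_code(_unit(flow), atoms, lam)
    mass: Dict[Hashable, float] = defaultdict(float)
    for c, lbl in zip(coef, labels):
        mass[lbl] += abs(c)
    return max(mass, key=mass.get)
```

For realistic numbers of training samples, a vectorized solver would replace these Python loops.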
In an optional embodiment, after step 105 inputs the flow data to be detected into the target classifier in the fusion model and outputs the target classification result, the method may further include:
under the condition that the target classification result meets a preset condition, determining that the flow data to be detected is abnormal flow data;
and under the condition that the target classification result does not meet the preset condition, determining the flow data to be detected as normal flow data.
Specifically, the target classification result includes the traffic type of the flow data to be detected and the proportion of that traffic type. After the flow data to be detected is input into the target classifier in the fusion model and the target classification result is output, it is judged whether the target classification result meets a preset condition. The preset condition may be that the proportion of the traffic type in the target classification result is greater than or equal to a preset threshold (e.g., 85%). If the proportion is greater than or equal to the preset threshold, the flow data to be detected is determined to be abnormal flow data, and the corresponding network traffic to be detected is abnormal traffic. If the proportion is below the preset threshold, the flow data to be detected is determined to be normal flow data, and the corresponding network traffic to be detected is normal traffic.
In summary, in the above embodiment of the present application, the SVM, KNN, and SRC classifiers are fused into one fusion model; each of the three classifiers outputs classification results, the accuracy of each classifier's output is calculated to judge which classifier is most accurate, and the most accurate classifier is used to analyze the flow data to be detected, which improves the accuracy of the classification result. In addition, redundancy is removed from the full-attribute flow features, which reduces the data processing load on the downstream classifiers and further improves classification accuracy. The fusion model also addresses the problems that a single model performs well in only a narrow range of scenarios and that its classification results are not representative, giving the method wider applicability across scenarios.
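The preset condition reduces to a single threshold comparison. The sketch assumes the target classification result can be represented as a (traffic type, proportion) pair with the proportion in [0, 1]. The `TargetResult` type is hypothetical, and 0.85 is the example threshold from the text.

```python
from dataclasses import dataclass


@dataclass
class TargetResult:
    flow_type: str     # traffic type output by the target classifier
    proportion: float  # proportion of that traffic type, in [0, 1]


def is_abnormal(result: TargetResult, threshold: float = 0.85) -> bool:
    """Preset condition: proportion >= threshold means abnormal flow data."""
    if not 0.0 <= result.proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return result.proportion >= threshold
```

Keeping the threshold as a parameter mirrors the text's framing of 85% as an example rather than a fixed value.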
Having described the flow analysis method provided by the embodiment of the present application, the flow analysis device provided by the embodiment of the present application will be described with reference to the accompanying drawings.
As shown in fig. 2, the embodiment of the present application further provides a flow analysis device 200, which includes:
a first obtaining module 201, configured to obtain traffic data under test and K pieces of training sample data, where K is an integer greater than 1;
a first processing module 202, configured to input the K pieces of training sample data into a fusion model and output K classification results, where the fusion model includes S classifiers, each classification result includes S classification sub-results corresponding to the S classifiers, and S is an integer greater than 1;
a second processing module 203, configured to calculate, for each classifier, the accuracy of the K classification sub-results corresponding to that classifier in the K classification results, so as to obtain S accuracy values corresponding to the S classifiers;
a first determining module 204, configured to determine a target classifier among the S classifiers according to the S accuracy values; and
a third processing module 205, configured to input the traffic data under test into the target classifier in the fusion model and output a target classification result.
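The interaction of modules 202 to 205 can be sketched as below. This is an illustrative sketch, not the applicant's implementation: it assumes each classifier is already trained and exposed as a plain callable from a feature vector to a label, and that the K training samples carry known labels against which accuracy is measured. The function name `select_and_classify` and the toy classifiers are hypothetical. Selecting the classifier with the best accuracy on samples near the test input resembles what the literature calls dynamic classifier selection by local accuracy.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Any callable that maps a feature vector to a traffic label. Trained SVM, KNN
# and SRC models would be wrapped behind this interface (an assumption).
Classifier = Callable[[Sequence[float]], str]


def select_and_classify(
    classifiers: Dict[str, Classifier],
    samples: List[Tuple[Sequence[float], str]],
    x_test: Sequence[float],
) -> Tuple[str, str, Dict[str, float]]:
    """Hypothetical sketch of modules 202 to 205.

    samples holds the K (feature vector, known label) pairs; returns the
    target classifier's name, its label for x_test, and all S accuracies.
    """
    if not samples:
        raise ValueError("need at least one training sample")
    accuracies: Dict[str, float] = {}
    for name, clf in classifiers.items():
        # Modules 202/203: one sub-result per sample, then accuracy over K.
        correct = sum(1 for x, label in samples if clf(x) == label)
        accuracies[name] = correct / len(samples)
    # Module 204: highest accuracy wins (first one listed on a tie).
    target = max(accuracies, key=lambda n: accuracies[n])
    # Module 205: classify the traffic data under test with the target classifier.
    return target, classifiers[target](x_test), accuracies


if __name__ == "__main__":
    toy = {  # illustrative stand-ins, not real classifiers
        "always_normal": lambda x: "normal",
        "threshold": lambda x: "abnormal" if x[0] > 0.5 else "normal",
    }
    k_samples = [([0.9], "abnormal"), ([0.1], "normal"), ([0.7], "abnormal")]
    print(select_and_classify(toy, k_samples, [0.8]))
```

Keeping the classifiers behind a single callable interface means the selection logic does not depend on how each model is trained.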
In this embodiment, the traffic data under test and K pieces of training sample data are obtained; the K pieces of training sample data are input into a fusion model comprising S classifiers, which outputs K classification results; the accuracy of the K classification sub-results corresponding to each classifier is calculated to obtain S accuracy values; a target classifier is determined among the S classifiers according to these values; and the traffic data under test is input into the target classifier, which outputs a target classification result. Because the S classifiers are integrated into one machine-learning fusion model, the accuracy of each classifier's output can be compared to judge which classifier is more accurate, and the more accurate classifier is used to analyze the traffic data under test, which improves the accuracy of the classification result.
Optionally, the first obtaining module 201 is specifically configured to:
obtain N pieces of training sample data, where N is an integer greater than K;
calculate the similarity between the traffic data under test and each of the N pieces of training sample data, to obtain N similarity values;
select the K largest of the N similarity values, in descending order; and
determine the K pieces of training sample data corresponding to the K similarity values.
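The selection of the K most similar training samples can be sketched as follows. The text does not name a similarity measure, so cosine similarity is used here purely as an assumption; `select_top_k` is a hypothetical helper that returns the indices of the chosen samples, most similar first.

```python
import heapq
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_top_k(x_test: Sequence[float], training: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the K training samples most similar to x_test, in descending order."""
    if not 1 < k < len(training):
        raise ValueError("expected 1 < K < N")
    sims = [cosine_similarity(x_test, t) for t in training]  # N similarity values
    return heapq.nlargest(k, range(len(training)), key=lambda i: sims[i])
```

`heapq.nlargest` avoids sorting all N values when K is much smaller than N; the range check enforces the constraints K > 1 and N > K stated in the text.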
Optionally, the second processing module 203 is specifically configured to:
calculate the accuracy of the K classification sub-results corresponding to a first classifier in the K classification results, to obtain an accuracy value for the first classifier, where the first classifier is any one of the S classifiers; and
obtain the S accuracy values corresponding to the S classifiers by taking each of the S classifiers in turn as the first classifier.
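The per-classifier accuracy calculation can be sketched as a column-wise comparison. This assumes the K classification results are arranged as a K-by-S table, where `results[k][s]` is the sub-result of classifier `s` for training sample `k`, and that a known label exists for each of the K samples; the function name is hypothetical.

```python
from typing import List, Sequence


def per_classifier_accuracy(results: List[Sequence[str]], labels: Sequence[str]) -> List[float]:
    """Return S accuracy values, one per classifier (column of the K x S table)."""
    if not results or len(results) != len(labels):
        raise ValueError("need one classification result per training sample")
    k_count = len(results)
    s_count = len(results[0])
    accuracies = []
    for s in range(s_count):  # treat each classifier in turn as the "first classifier"
        correct = sum(1 for k in range(k_count) if results[k][s] == labels[k])
        accuracies.append(correct / k_count)
    return accuracies
```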
Optionally, the first determining module 204 is specifically configured to:
determine, as the target classifier, the classifier corresponding to the highest of the S accuracy values.
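Choosing the target classifier reduces to an argmax over the S accuracy values. The text does not say how ties are resolved, so this sketch assumes the first classifier in list order wins; `select_target_classifier` is a hypothetical name.

```python
from typing import Sequence


def select_target_classifier(names: Sequence[str], accuracies: Sequence[float]) -> str:
    """Name of the classifier with the highest accuracy (first one wins on a tie)."""
    if not names or len(names) != len(accuracies):
        raise ValueError("need one accuracy value per classifier")
    best = max(range(len(names)), key=lambda i: accuracies[i])
    return names[best]
```

Python's built-in `max` returns the first maximal item it encounters, which is what makes the tie rule deterministic here.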
Optionally, the apparatus further includes:
a second determining module, configured to determine that the traffic data under test is abnormal traffic data when the target classification result meets a preset condition, and to determine that the traffic data under test is normal traffic data when the target classification result does not meet the preset condition.
Optionally, the apparatus further includes:
a second obtaining module, configured to obtain target network traffic;
a fourth processing module, configured to preprocess the target network traffic to obtain Internet Protocol version 6 (IPv6) traffic;
a first extraction module, configured to extract features from the IPv6 traffic by means of a network measurement toolkit, to obtain full-attribute traffic features; and
a fifth processing module, configured to perform feature selection on the full-attribute traffic features to obtain feature vector data;
where the target network traffic is either the network traffic under test or N traffic training samples; when the target network traffic is the network traffic under test, the feature vector data is the traffic data under test; and when the target network traffic is the N traffic training samples, the feature vector data is the N pieces of training sample data.
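The preprocessing and feature-selection steps can be sketched as below. The text names neither the measurement toolkit nor the selection criterion, so several assumptions apply: each flow is a dict standing in for the toolkit's full-attribute output, with a hypothetical `ip_version` field used for IPv6 filtering; redundancy is judged by dropping constant features and features whose absolute Pearson correlation with an already-kept feature reaches a hypothetical `max_corr` cut-off. This is one common filter approach, not necessarily the one used in the application.

```python
import math
from typing import Dict, List, Sequence, Tuple

Flow = Dict[str, float]  # one flow record: feature name -> value (hypothetical schema)


def keep_ipv6(flows: List[Flow]) -> List[Flow]:
    """Keep only flows whose hypothetical 'ip_version' field equals 6."""
    return [f for f in flows if f.get("ip_version") == 6]


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; callers guarantee both inputs are non-constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def select_features(
    flows: List[Flow], names: List[str], max_corr: float = 0.95
) -> Tuple[List[str], List[List[float]]]:
    """Drop constant and highly correlated features; return kept names and vectors."""
    if not flows:
        return [], []
    columns = {n: [f[n] for f in flows] for n in names}
    kept: List[str] = []
    for n in names:
        col = columns[n]
        if max(col) == min(col):  # zero variance: carries no information
            continue
        if any(abs(_pearson(col, columns[k])) >= max_corr for k in kept):
            continue  # redundant with a feature already kept
        kept.append(n)
    vectors = [[f[n] for n in kept] for f in flows]  # feature vector data
    return kept, vectors
```

Filtering to IPv6 before feature selection ensures that correlations are measured only on the traffic that the classifiers will actually see.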
Optionally, the S classifiers include a support vector machine (SVM) classifier, a k-nearest neighbor (KNN) classifier and a sparse representation-based classification (SRC) classifier.
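Of the three classifiers, KNN is the simplest to show self-contained. The following minimal sketch uses Euclidean distance and a majority vote; the distance metric and the default `k=3` are assumptions, since the text does not give these parameters. In practice, a library such as scikit-learn provides `sklearn.svm.SVC` and `sklearn.neighbors.KNeighborsClassifier`; SRC typically requires a separate sparse-coding solver.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], str]], x: Sequence[float], k: int = 3) -> str:
    """Majority label among the k training samples closest to x (Euclidean distance)."""
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

A function with this shape can be partially applied (for example with `functools.partial`) to obtain the single-argument callable interface expected by the fusion-model sketch.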
It should be noted that the flow analysis device provided in this embodiment of the present application can implement all the method steps of the flow analysis method embodiment and achieve the same technical effects; the parts and beneficial effects that are identical to those of the method embodiment are not described again here.
It should be noted that, in the embodiments of the present application, the division into units is schematic and is merely a division by logical function; other divisions may be used in actual implementation. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated units may be implemented in hardware or as software functional units.
If the integrated units are implemented as software functional units and sold or used as stand-alone products, they may be stored in a processor-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
As shown in fig. 3, an embodiment of the present application further provides an electronic device, including a memory 320, a transceiver 310, and a processor 300:
a memory 320 for storing a computer program;
a transceiver 310 for transceiving data under the control of the processor;
a processor 300 for reading the computer program in the memory and performing the steps of the flow analysis method as described in any of the embodiments above.
In fig. 3, the bus architecture may comprise any number of interconnected buses and bridges that link together various circuits, in particular one or more processors represented by processor 300 and memory represented by memory 320. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. A bus interface provides the interface. Transceiver 310 may comprise multiple elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over transmission media, including wireless channels, wired channels, optical cables and the like. The processor 300 is responsible for managing the bus architecture and for general processing, and the memory 320 may store data used by the processor 300 when performing operations.
The processor 300 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or a complex programmable logic device (CPLD), and may also adopt a multi-core architecture.
By calling the computer program stored in the memory, the processor executes, according to the obtained executable instructions, any of the flow analysis methods provided by the embodiments of the present application. The processor and the memory may also be arranged physically separately.
It should be noted that the electronic device provided in this embodiment of the present application can implement all the method steps of the flow analysis method embodiment and achieve the same technical effects; the parts and beneficial effects that are identical to those of the method embodiment are not described again here.
Embodiments of the present application also provide a processor-readable storage medium storing a computer program for causing the processor to execute the above-described flow analysis method.
The processor-readable storage medium may be any available medium or data storage device accessible to a processor, including but not limited to magnetic storage (e.g., floppy disks, hard disks, magnetic tape, magneto-optical (MO) disks), optical storage (e.g., CD, DVD, BD, HVD) and semiconductor storage (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND flash), solid-state drives (SSD)).
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-executable instructions. These computer-executable instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These processor-executable instructions may also be stored in a processor-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the processor-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These processor-executable instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method of flow analysis, the method comprising:
obtaining traffic data under test, and obtaining K pieces of training sample data, wherein K is an integer greater than 1;
inputting the K pieces of training sample data into a fusion model, and outputting K classification results, wherein the fusion model comprises S classifiers, each classification result comprises S classification sub-results corresponding to the S classifiers, and S is an integer greater than 1;
calculating, for each classifier, the accuracy of the K classification sub-results corresponding to that classifier in the K classification results, to obtain S accuracy values corresponding to the S classifiers;
determining a target classifier among the S classifiers according to the S accuracy values corresponding to the S classifiers; and
inputting the traffic data under test into the target classifier in the fusion model, and outputting a target classification result.
2. The method of claim 1, wherein the obtaining K training sample data comprises:
acquiring N pieces of training sample data, wherein N is an integer greater than K;
calculating the similarity between the traffic data under test and each of the N pieces of training sample data, to obtain N similarity values corresponding to the N pieces of training sample data;
selecting the K largest similarity values from the N similarity values in descending order; and
determining the K pieces of training sample data according to the K similarity values.
3. The method of claim 1, wherein calculating, for each classifier, the accuracy of the K classification sub-results corresponding to that classifier in the K classification results to obtain S accuracy values corresponding to the S classifiers comprises:
calculating the accuracy of the K classification sub-results corresponding to a first classifier in the K classification results, to obtain an accuracy value corresponding to the first classifier, wherein the first classifier is any one of the S classifiers; and
obtaining the S accuracy values corresponding to the S classifiers by taking each of the S classifiers in turn as the first classifier.
4. The method of claim 1, wherein determining the target classifier among the S classifiers according to the S accuracy values corresponding to the S classifiers comprises:
determining, as the target classifier, the classifier corresponding to the highest of the S accuracy values.
5. The method according to claim 1, wherein after inputting the traffic data under test into the target classifier in the fusion model and outputting the target classification result, the method further comprises:
determining that the traffic data under test is abnormal traffic data when the target classification result meets a preset condition; and
determining that the traffic data under test is normal traffic data when the target classification result does not meet the preset condition.
6. The method according to claim 2, wherein the method further comprises:
obtaining target network traffic;
preprocessing the target network traffic to obtain Internet Protocol version 6 (IPv6) traffic;
extracting features from the IPv6 traffic by means of a network measurement toolkit, to obtain full-attribute traffic features; and
performing feature selection on the full-attribute traffic features to obtain feature vector data;
wherein the target network traffic is the network traffic under test or N traffic training samples; when the target network traffic is the network traffic under test, the feature vector data is the traffic data under test; and when the target network traffic is the N traffic training samples, the feature vector data is the N pieces of training sample data.
7. The method of claim 1, wherein the S classifiers comprise a support vector machine (SVM) classifier, a k-nearest neighbor (KNN) classifier and a sparse representation-based classification (SRC) classifier.
8. A flow analysis device, the device comprising:
a first acquisition module, configured to acquire traffic data under test and K pieces of training sample data, wherein K is an integer greater than 1;
a first processing module, configured to input the K pieces of training sample data into a fusion model and output K classification results, wherein the fusion model comprises S classifiers, each classification result comprises S classification sub-results corresponding to the S classifiers, and S is an integer greater than 1;
a second processing module, configured to calculate, for each classifier, the accuracy of the K classification sub-results corresponding to that classifier in the K classification results, to obtain S accuracy values corresponding to the S classifiers;
a first determining module, configured to determine a target classifier among the S classifiers according to the S accuracy values corresponding to the S classifiers; and
a third processing module, configured to input the traffic data under test into the target classifier in the fusion model and output a target classification result.
9. An electronic device comprising a memory, a transceiver, and a processor:
a memory for storing a computer program; a transceiver for transceiving data under control of the processor; a processor for reading the computer program in the memory and performing the flow analysis method according to any one of claims 1 to 7.
10. A processor-readable storage medium, characterized in that the processor-readable storage medium stores a computer program for causing the processor to execute the flow analysis method according to any one of claims 1 to 7.
CN202311036077.XA 2023-08-16 2023-08-16 Flow analysis method and device and electronic equipment Pending CN117220915A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311036077.XA CN117220915A (en) 2023-08-16 2023-08-16 Flow analysis method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN117220915A true CN117220915A (en) 2023-12-12

Family

ID=89036024

Country Status (1)

Country Link
CN (1) CN117220915A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination