CN115801471B - Network security data processing method based on big data processing - Google Patents

Info

Publication number
CN115801471B
Authority
CN
China
Prior art keywords
data
network security
security data
priori knowledge
data packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310095115.2A
Other languages
Chinese (zh)
Other versions
CN115801471A (en
Inventor
郑君丽
颜慧娟
罗珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Hedun Technology Co ltd
Original Assignee
Jiangxi Hedun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Hedun Technology Co ltd
Priority to CN202310095115.2A
Publication of CN115801471A
Application granted
Publication of CN115801471B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40: Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention provides a network security data processing method based on big data processing. The method comprises the following steps: S1, arranging a data collector on a link connected to a server, the data collector collecting all network security data exchanged with the server; S2, preprocessing the network security data; S3, sending the preprocessed network security data to a prior filtering module and a deep learning classification module to classify the network security data; and S4, performing visualization processing according to the classification result of the network security data. The network security data is dimension-reduced and restructured according to the attributes in the prior knowledge, and the dimension-reduced and restructured data is classified by an instruction algorithm set in advance by the user. This can improve the efficiency of network security data classification, and the effect is particularly pronounced for flooding attacks.

Description

Network security data processing method based on big data processing
Technical Field
The invention relates to the field of data processing, in particular to a network security data processing method based on big data processing.
Background
With the continuing spread of networks, every field of society has become increasingly inseparable from them. At the same time, network attacks keep increasing: stealing information and damaging servers by means of network attacks poses a serious threat to network security. Network security data in network transmission therefore needs to be analyzed to provide a reference for network security defense.
At present, classifying and analyzing network security data generally requires a deep learning model, which has to extract features from the network security data and is computationally expensive. This drawback becomes more pronounced as the volume of network security data grows, and the heavy computation also reduces the timeliness of the classification results.
Disclosure of Invention
The invention provides a network security data processing method based on big data processing. The network security data is dimension-reduced and restructured according to the attributes in the prior knowledge, and the dimension-reduced and restructured data is classified by an instruction algorithm set in advance by the user. This can improve the efficiency of network security data classification, and the effect is particularly pronounced for flooding attacks.
A network security data processing method based on big data processing comprises the following steps:
S1: A data collector is arranged on a link connected to a server and collects all network security data exchanged with the server, the network security data existing in the form of data packets;
S2: The network security data is preprocessed, specifically by data cleaning, data standardization and data reduction in sequence;
S3: The preprocessed network security data is sent to a prior filtering module and a deep learning classification module, which classify it into attack data and data to be classified; the attack data is labeled with its attack data type;
S4: Visualization processing is performed according to the classification result of the network security data, so that the network security situation corresponding to the network security data is displayed intuitively to the user.
The prior filtering module filters and classifies the network security data layer by layer through prior knowledge. The prior knowledge exists in the form of $\delta$, $\psi$ and $\mu$, where $\delta$ is an attribute set, $\delta = \{\mu_1, \mu_2, \mu_3, \ldots, \mu_n\}$, in which each $\mu_n$ is an attribute corresponding to an attribute inside the network security data; $\psi$ is an instruction ID, used as a pointer to call the corresponding instruction algorithm; and $\mu$ is the attack data type assigned by the prior knowledge.
The prior filtering module contains multiple pieces of prior knowledge arranged in a layered structure. Each piece of prior knowledge forms one prior knowledge layer, denoted $X_i$, $i = 1, 2, 3, \ldots, I$, where $I$ is the total number of prior knowledge layers in the prior filtering module.
In step S3, the prior filtering module filters and classifies the network security data layer by layer through the following steps:
T1: Send the network security data to the prior filtering module and assign a data packet ID to every data packet in the network security data;
T2: Let $k = 1$, where $k$ is the index used to select a prior knowledge layer, and obtain the total number $I$ of prior knowledge layers in the prior filtering module;
T3: Select prior knowledge layer $X_k$ and obtain the attributes $\mu_n$ from the attribute set $\delta$ of $X_k$. Copy all the network security data; data packets in the network security data and in the copy are mapped one to one by data packet ID. Process the data packets in the copy according to the attributes $\mu_n$: match each attribute in a data packet against the attributes $\mu_n$; if the match succeeds, retain the matched attribute and its data content; if the match fails, delete the unmatched attribute and its data content. This performs dimension reduction and restructuring on the network security data copy;
T4: Obtain the instruction ID $\psi$ of prior knowledge layer $X_k$ and, using $\psi$ as a pointer, call the corresponding instruction algorithm to perform binary classification on the data packets in the dimension-reduced and restructured copy, dividing them into attack data and data to be classified; label the data packets classified as attack data with the attack data type $\mu$;
T5: After all data packets in the copy have been classified, process the data packets in the network security data according to the data packet IDs of the packets in the copy: output the data packets in the network security data whose IDs correspond to copy packets classified as attack data, and count the data packet output; retain the data packets in the network security data whose IDs correspond to copy packets classified as data to be classified; then delete the network security data copy;
T6: Assign $k + 1$ to $k$ and judge whether $k \le I$ holds. If it holds, send the network security data remaining after classification by the current prior knowledge layer to the next prior knowledge layer and return to T3; if it does not hold, enter T7;
T7: Send the network security data remaining after processing by the prior filtering module to the deep learning classification module.
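Steps T1 to T7 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `PriorKnowledgeLayer` class, the `run_prior_filter` function and the `oversized` rule are hypothetical names, a data packet is assumed to be a plain dictionary of attribute/value pairs, and the instruction algorithm is assumed to be a callable that returns one attack/not-attack flag per reduced packet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, object]  # assumption: one data packet as attribute -> value pairs


@dataclass
class PriorKnowledgeLayer:
    attributes: List[str]                            # attribute set (delta)
    algorithm: Callable[[List[Packet]], List[bool]]  # instruction algorithm selected by the instruction ID (psi)
    attack_type: str                                 # attack data type (mu)
    output_count: int = 0                            # data packet output of this layer


def run_prior_filter(
    packets: List[Packet], layers: List[PriorKnowledgeLayer]
) -> Tuple[List[Tuple[int, str, Packet]], List[Packet]]:
    remaining = dict(enumerate(packets))             # T1: assign data packet IDs
    attacks: List[Tuple[int, str, Packet]] = []
    for layer in layers:                             # T2 and T6: visit X_1 .. X_I in order
        ids = list(remaining)
        # T3: copy the data, keeping only the attributes listed in the layer
        copies = [
            {a: remaining[i][a] for a in layer.attributes if a in remaining[i]}
            for i in ids
        ]
        flags = layer.algorithm(copies)              # T4: binary classification on the reduced copy
        layer.output_count = 0
        for pid, is_attack in zip(ids, flags):       # T5: map results back by packet ID
            if is_attack:
                attacks.append((pid, layer.attack_type, remaining.pop(pid)))
                layer.output_count += 1
        # the reduced copy is discarded when the loop moves on (end of T5)
    return attacks, list(remaining.values())         # T7: the rest goes to the deep learning module


def oversized(copies: List[Packet]) -> List[bool]:
    # hypothetical instruction algorithm: flag packets larger than 1000 bytes
    return [c.get("size", 0) > 1000 for c in copies]


demo_packets = [
    {"src": "a", "size": 1500, "payload": "x"},
    {"src": "b", "size": 100, "payload": "y"},
]
demo_layer = PriorKnowledgeLayer(["src", "size"], oversized, "transport layer attack data")
demo_attacks, demo_rest = run_prior_filter(demo_packets, [demo_layer])
```

Classification runs only on the reduced copies, while the original packets are what gets output or passed on, which is where the efficiency gain described above would come from.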
The deep learning classification module contains multiple classifiers and classifies the network security data into attack data and data to be classified, the attack data being labeled with its attack data type.
When the network security data remaining after processing by the prior filtering module reaches the deep learning classification module, it is further classified by the different classifiers in sequence: data packets classified as attack data are output and the data packet output is counted, while data packets classified as data to be classified are sent to the next classifier.
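The sequential use of classifiers can be sketched as below. The trained deep learning models are outside the scope of this sketch, so each classifier is assumed to be a callable that returns an attack data type for a packet, or `None` when the packet remains data to be classified; `run_classifier_chain` and the two lambda rules are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, object]
Classifier = Callable[[Packet], Optional[str]]  # attack data type, or None for "to be classified"


def run_classifier_chain(
    packets: List[Packet], classifiers: List[Classifier]
) -> Tuple[List[Tuple[str, Packet]], List[Packet], List[int]]:
    attacks: List[Tuple[str, Packet]] = []
    output_counts: List[int] = []                # data packet output per classifier
    pending = list(packets)
    for classify in classifiers:
        next_pending: List[Packet] = []
        count = 0
        for packet in pending:
            label = classify(packet)
            if label is None:
                next_pending.append(packet)      # passed on to the next classifier
            else:
                attacks.append((label, packet))  # output as attack data
                count += 1
        output_counts.append(count)
        pending = next_pending
    return attacks, pending, output_counts


# Stand-in rules in place of trained classifiers (assumptions for illustration only).
worm = lambda p: "worm" if p.get("payload") == "worm-signature" else None
scan = lambda p: "port scan" if p.get("dst_port") == 0 else None

chain_attacks, chain_rest, chain_counts = run_classifier_chain(
    [{"payload": "worm-signature"}, {"dst_port": 0}, {"payload": "hello"}],
    [worm, scan],
)
```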
The method further comprises adjusting the order of the prior knowledge layers according to the data packet output ratio of each prior knowledge layer and the weight coefficient of its attack data type, through the following steps:
H1: In step T6, when $k \le I$ does not hold, enter H2 at the same time as entering T7;
H2: Let $j = 1$, where $j$ is the index used to select a prior knowledge layer;
H3: Obtain the data packet output $Q_j$ of prior knowledge layer $X_j$ and calculate the data packet output ratio $\upsilon = Q_j / Q_z$, where $Q_z$ is the total number of data packets in the network security data; obtain the weight coefficient $a$ corresponding to the attack data type $\mu$ of prior knowledge layer $X_j$;
H4: Calculate the ranking weight of prior knowledge layer $X_j$ as $\beta_j = \frac{1}{1 + e^{-r\upsilon + d}} + \omega \cdot (e^{a} - 1)$, where $r$, $d$ and $\omega$ are parameters of the ranking weight $\beta_j$ obtained through training of a learning model;
H5: Add the ranking weight $\beta_j$ to the set $\zeta$, assign $j + 1$ to $j$, and judge whether $j \le I$ holds. If it holds, return to H3; if it does not hold, enter H6;
H6: Sort all ranking weights $\beta_j$ in the set $\zeta$ in descending order and reorder the corresponding prior knowledge layers accordingly, thereby adjusting the positions of the prior knowledge layers in the prior filtering module.
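The ranking computation in H3 to H6 is small enough to sketch directly. In the sketch below, the function names and the tuple layout `(name, Q_j, a)` are assumptions, and the values chosen for `r`, `d` and `omega` are arbitrary placeholders rather than trained parameters.

```python
import math
from typing import List, Tuple


def ranking_weight(upsilon: float, a: float, r: float, d: float, omega: float) -> float:
    # beta_j = 1 / (1 + e^(-r*upsilon + d)) + omega * (e^a - 1)
    return 1.0 / (1.0 + math.exp(-r * upsilon + d)) + omega * (math.exp(a) - 1.0)


def reorder_layers(
    layers: List[Tuple[str, int, float]],  # (layer name, data packet output Q_j, weight coefficient a)
    total_packets: int,                    # Q_z
    r: float,
    d: float,
    omega: float,
) -> List[str]:
    scored = [
        (ranking_weight(q / total_packets, a, r, d, omega), name)  # H3 and H4
        for name, q, a in layers
    ]
    scored.sort(key=lambda item: item[0], reverse=True)            # H6: descending by beta_j
    return [name for _, name in scored]


# Placeholder parameters (not trained): r = 1, d = 0, omega = 1.
new_order = reorder_layers([("A", 50, 0.1), ("B", 10, 0.5)], 100, r=1.0, d=0.0, omega=1.0)
```

With these placeholder values, layer B moves ahead of layer A even though it output fewer packets, because its larger weight coefficient dominates the sum. This is the balancing effect between attack frequency and threat level that the reordering is meant to provide.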
The specific steps for obtaining $r$, $d$ and $\omega$ through training of the learning model are as follows: acquire a network security data training set classified by the user in advance and send it to the prior filtering module; after the filtering and classification operation is complete, obtain the ordering of the prior knowledge layers through steps H1 to H6; compute a cross-entropy loss between this ordering and the expected ordering of the prior knowledge layers set by the user in advance. If the loss value is within the preset threshold range, no further operation is performed and the values of $r$, $d$ and $\omega$ are output; if the loss value is not within the preset threshold range, the values of $r$, $d$ and $\omega$ are adjusted by feedback and the calculation is iterated.
The method further comprises deactivating prior knowledge layers whose data packet output is 0, through the following steps:
G1: Provide a prior knowledge storage module for storing prior knowledge layers whose data packet output is 0; while the prior filtering module filters and classifies network security data, continuously monitor the data packet output $Q_j$ of each prior knowledge layer;
G2: Judge whether $Q_j = 0$ holds. If it holds, enter G3; if it does not hold, perform no operation;
G3: Move the corresponding prior knowledge layer from the prior filtering module to the prior knowledge storage module, deactivate it, and return to G1.
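A minimal sketch of G1 to G3, assuming layers are identified by name and their data packet outputs are available in a dictionary; `deactivate_idle_layers` is a hypothetical helper, not a name from the source.

```python
from typing import Dict, List


def deactivate_idle_layers(
    active: List[str], storage: List[str], output_counts: Dict[str, int]
) -> List[str]:
    still_active: List[str] = []
    for layer in active:
        if output_counts.get(layer, 0) == 0:  # G2: is Q_j = 0?
            storage.append(layer)             # G3: move to the prior knowledge storage module
        else:
            still_active.append(layer)
    return still_active


stored_layers: List[str] = []
active_layers = deactivate_idle_layers(
    ["flood", "worm", "scan"], stored_layers, {"flood": 120, "worm": 0, "scan": 3}
)
```

Stored layers are kept rather than discarded, so they can be tried again later against data that the active layers and classifiers fail to classify.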
The method further comprises re-analyzing the network security data remaining after classification by the prior filtering module and the deep learning classification module, through the following steps:
D1: Acquire the network security data remaining after classification by the prior filtering module and the deep learning classification module, and denote it as network security data $P$;
D2: Perform cluster analysis on the network security data $P$ with an unsupervised cluster analysis algorithm to obtain the data packet set $\varepsilon$ in a cluster, obtain the number $M$ of data packets in the cluster, and calculate the clustered data packet ratio $\lambda = M / Q_z$;
D3: Judge whether $\lambda \le \varphi$ holds, where $\varphi$ is the threshold for the clustered data packet ratio. If it holds, perform no operation; if it does not hold, enter D4;
D4: Send the data packet set $\varepsilon$ to the prior knowledge storage module and let $c = 1$, where $c$ is the index used to select a prior knowledge layer stored in the prior knowledge storage module;
D5: Select the prior knowledge layer $X_{I+c}$ stored in the prior knowledge storage module; filter and classify the data packet set $\varepsilon$ through $X_{I+c}$ and count the data packet output $Q_{I+c}$; judge whether $Q_{I+c} = 0$ holds. If it does not hold, enter D6; if it holds, enter D7;
D6: Add the corresponding prior knowledge layer $X_{I+c}$ to the prior filtering module, calculate its ranking weight $\beta_{I+c}$ according to steps H3 to H4, and enter H5, so that $X_{I+c}$ is reordered together with the prior knowledge layers already in the prior filtering module;
D7: Assign $c + 1$ to $c$ and judge whether $c \le C$ holds, where $C$ is the total number of prior knowledge layers stored in the prior knowledge storage module. If it holds, send the data packet set $\varepsilon$ to the next prior knowledge layer stored in the prior knowledge storage module and return to D5; if it does not hold, enter D8;
D8: Mark the data packet set $\varepsilon$ as an unknown data set and send it to the user, who analyzes the unknown data set and adds a prior knowledge layer or a classifier accordingly.
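The control flow of D1 to D8 can be sketched as follows. The source does not name the unsupervised clustering algorithm, so this sketch substitutes a deliberately simple stand-in: packets are grouped by a caller-supplied key and the largest group is treated as the cluster. The function name, the returned status strings and the `(name, algorithm)` layout of stored layers are all assumptions.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Packet = Dict[str, object]
StoredLayer = Tuple[str, Callable[[List[Packet]], List[bool]]]  # (layer name, instruction algorithm)


def analyse_residual(
    residual: List[Packet],                    # network security data P (D1)
    total_packets: int,                        # Q_z
    phi: float,                                # threshold for the clustered data packet ratio
    stored_layers: List[StoredLayer],          # layers held in the prior knowledge storage module
    cluster_key: Callable[[Packet], Hashable],
) -> Tuple[str, Optional[object]]:
    if not residual:
        return ("no_action", None)
    # D2: stand-in clustering (assumption): group by key and keep the largest group
    groups: Dict[Hashable, List[Packet]] = {}
    for packet in residual:
        groups.setdefault(cluster_key(packet), []).append(packet)
    epsilon = max(groups.values(), key=len)
    lam = len(epsilon) / total_packets
    if lam <= phi:                             # D3
        return ("no_action", None)
    for name, algorithm in stored_layers:      # D4, D5 and D7: try each stored layer in turn
        if sum(algorithm(epsilon)) != 0:       # data packet output of the stored layer is non-zero
            return ("reactivate", name)        # D6: the layer returns to the prior filtering module
    return ("unknown", epsilon)                # D8: hand the set to the user


residual_demo = [{"src": "10.0.0.9"}] * 30 + [{"src": "10.0.0.1"}]
match_none = ("old-worm", lambda pkts: [False for _ in pkts])
match_src = ("old-flood", lambda pkts: [p["src"] == "10.0.0.9" for p in pkts])
result = analyse_residual(residual_demo, 100, 0.2, [match_none, match_src], lambda p: p["src"])
unknown = analyse_residual(residual_demo, 100, 0.2, [match_none], lambda p: p["src"])
quiet = analyse_residual(residual_demo, 100, 0.5, [match_src], lambda p: p["src"])
```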
The invention has the following advantages:
1. The invention applies big data processing technology to network security data, classifies and analyzes the network security data, and displays the network security situation intuitively to the user, so that the user can check the current network security situation in time and adjust the server's defense strategy in a targeted manner.
2. The network security data is dimension-reduced and restructured according to the attributes in the prior knowledge, and the dimension-reduced and restructured data is classified by an instruction algorithm set in advance by the user. This can improve the efficiency of network security data classification, and the effect is particularly pronounced for flooding attacks.
3. After each filtering and classification pass of the prior filtering module, the prior knowledge layers in the prior filtering module are reordered according to the data packet output ratio and the weight coefficient of the attack data type. Prior knowledge layers with a higher data packet output ratio are placed first, so that network security data with higher attack counts is output earlier, which facilitates visualization of the subsequent classification results. At the same time, the weight coefficient of the attack data type acts as a balance, preventing attack data with a high threat level from being output late.
4. The data packet output is monitored and prior knowledge layers whose data packet output is 0 are deactivated, which prevents such layers from occupying computer resources and improves overall resource utilization.
5. By performing cluster analysis on the network security data remaining after classification by the prior filtering module and the deep learning classification module, the invention detects suspicious data packet sets and updates the prior filtering module and the deep learning classification module according to those sets, improving the accuracy of network security data detection.
Drawings
Fig. 1 is a flow chart of a network security data processing method based on big data processing according to an embodiment of the present invention.
Description of the embodiments
In order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Embodiment 1: a network security data processing method based on big data processing, as shown in Fig. 1, directed at visual analysis of DDoS flooding attacks and worm attacks, specifically comprising the following steps:
S1: Arrange a data collector on a link connected to the server. Within a set time period, record the server's network transmission interface through an SDK, open a thread pool, and collect all network security data on that interface. The network security data exists in the form of data packets; a data packet contains many attributes, has high data dimensionality, and generally stores information as key-value pairs, for example source IP address: 192.168.1.116.3337.
S2: Preprocess the network security data, specifically by data cleaning, data standardization and data reduction. Data cleaning removes noise such as incomplete, erroneous and duplicate data from the network security data. Data standardization scales the network security data so that its values fall within a smaller interval. In data reduction, attributes are selected for the network security data using the optimal model chosen by the AIC (Akaike information criterion), which compresses the data volume. These steps give the data a degree of consistency and usability.
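The cleaning and standardization parts of S2 can be illustrated with a short sketch. It assumes packets are dictionaries with hashable values, uses min-max scaling as one possible standardization (the source does not name the scaling method), and leaves out the AIC-based attribute selection entirely. The function and field names are hypothetical.

```python
from typing import Dict, List, Sequence

Packet = Dict[str, object]


def clean(packets: List[Packet], required: Sequence[str]) -> List[Packet]:
    """Drop incomplete packets (missing a required attribute) and exact duplicates."""
    seen = set()
    kept: List[Packet] = []
    for packet in packets:
        if any(packet.get(key) is None for key in required):
            continue                               # incomplete data
        fingerprint = tuple(sorted(packet.items()))
        if fingerprint in seen:
            continue                               # repeated data
        seen.add(fingerprint)
        kept.append(packet)
    return kept


def min_max_scale(packets: List[Packet], field: str) -> List[Packet]:
    """Scale one numeric attribute into [0, 1]; min-max scaling is an assumption here."""
    if not packets:
        return []
    values = [p[field] for p in packets]
    low, span = min(values), max(values) - min(values)
    return [{**p, field: (p[field] - low) / span if span else 0.0} for p in packets]


raw = [
    {"src_ip": "10.0.0.1", "size": 100},
    {"src_ip": "10.0.0.1", "size": 100},   # duplicate
    {"src_ip": None, "size": 300},          # incomplete
    {"src_ip": "10.0.0.2", "size": 300},
    {"src_ip": "10.0.0.3", "size": 500},
]
prepared = min_max_scale(clean(raw, ["src_ip", "size"]), "size")
```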
S3: Send the preprocessed network security data to the prior filtering module and the deep learning classification module, classify the network security data into attack data and data to be classified, and label the attack data with its attack data type.
The prior filtering module classifies network security data in advance according to prior knowledge preset by the user. The prior knowledge exists in the form of $\delta$, $\psi$ and $\mu$, where $\delta$ is an attribute set, $\delta = \{\mu_1, \mu_2, \mu_3, \ldots, \mu_n\}$, in which each $\mu_n$ is an attribute corresponding to an attribute inside the network security data, such as source IP address, source port or payload; $\psi$ is an instruction ID, consisting of a 16-bit binary number and used as a pointer to call the corresponding instruction algorithm; and $\mu$ is the attack data type assigned by the prior knowledge, such as transport layer attack data or application layer attack data, which the user sets together with the prior knowledge. The prior filtering module contains multiple pieces of prior knowledge arranged in a layered structure. Each piece of prior knowledge forms one prior knowledge layer, denoted $X_i$, $i = 1, 2, 3, \ldots, I$, where $I$ is the total number of prior knowledge layers in the prior filtering module. In step S3, the prior filtering module filters and classifies the network security data layer by layer through the following steps:
T1: Send the network security data to the prior filtering module and assign a data packet ID to every data packet in the network security data.
T2: Let $k = 1$, where $k$ is the index used to select a prior knowledge layer, and obtain the total number $I$ of prior knowledge layers in the prior filtering module.
T3: Select prior knowledge layer $X_k$ and obtain the attributes $\mu_n$ from the attribute set $\delta$ of $X_k$. Copy all the network security data; data packets in the network security data and in the copy are mapped one to one by data packet ID. Process the data packets in the copy according to the attributes $\mu_n$: match each attribute in a data packet against the attributes $\mu_n$; if the match succeeds, retain the matched attribute and its data content; if the match fails, delete the unmatched attribute and its data content. This achieves dimension reduction and restructuring of the network security data copy.
T4: Obtain the instruction ID $\psi$ of prior knowledge layer $X_k$ and, using $\psi$ as a pointer, call the corresponding instruction algorithm to perform binary classification on the data packets in the dimension-reduced and restructured copy, dividing them into attack data and data to be classified; label the data packets classified as attack data with the attack data type $\mu$.
T5: After all data packets in the copy have been classified, process the data packets in the network security data according to the data packet IDs of the packets in the copy: output the data packets in the network security data whose IDs correspond to copy packets classified as attack data, and count the data packet output; retain the data packets in the network security data whose IDs correspond to copy packets classified as data to be classified; then delete the network security data copy.
T6: Assign $k + 1$ to $k$ and judge whether $k \le I$ holds. If it holds, send the network security data remaining after classification by the current prior knowledge layer to the next prior knowledge layer and return to T3; if it does not hold, all prior knowledge layers have been applied, and T7 is entered.
T7: Send the network security data remaining after processing by the prior filtering module to the deep learning classification module.
To describe the prior filtering module more concretely, consider the following practical example. Suppose transport layer flooding attacks need to be classified. Such an attack works by sending a large number of data packets that have similar characteristics and are concentrated in the same region. The attributes of the prior knowledge are therefore set to source IP address, target IP address, data packet size and data packet sending time, and the instruction algorithm corresponding to the instruction ID of the prior knowledge is the KNN algorithm. When network security data reaches this prior knowledge, it is dimension-reduced and restructured according to the source IP address, target IP address, data packet size and data packet sending time, and cluster analysis is then performed with the KNN algorithm. When the ratio of the number of clustered data packets to the total number of data packets in the network security data exceeds a set threshold, the clustered data packets are classified as attack data; otherwise they are classified as data to be classified.
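The flooding example can be approximated with the sketch below. It does not implement KNN: as a simplifying assumption, packets that share source IP, target IP and size within the same time window are counted as one cluster, which is enough to show the projection onto four attributes and the ratio threshold. The attribute names `src_ip`, `dst_ip`, `size` and `sent_at` are hypothetical, and every packet is assumed to carry all four.

```python
from collections import Counter
from typing import Dict, List

Packet = Dict[str, object]
FLOOD_ATTRS = ("src_ip", "dst_ip", "size", "sent_at")  # hypothetical attribute names


def project(packet: Packet) -> Packet:
    """Dimension reduction: keep only the attributes named by the prior knowledge."""
    return {key: packet[key] for key in FLOOD_ATTRS if key in packet}


def flag_flood(packets: List[Packet], threshold: float, window: float = 1.0) -> List[bool]:
    """Flag packets whose cluster holds more than `threshold` of all packets.

    Stand-in for the KNN step: a cluster is the set of packets with identical
    source, target and size whose sending times fall into the same window.
    """
    reduced = [project(p) for p in packets]
    keys = [
        (p["src_ip"], p["dst_ip"], p["size"], int(p["sent_at"] // window))
        for p in reduced
    ]
    counts = Counter(keys)
    total = len(packets)
    return [counts[key] / total > threshold for key in keys]


burst = [
    {"src_ip": "10.0.0.9", "dst_ip": "10.0.0.1", "size": 64, "sent_at": 0.1 * i, "payload": "A"}
    for i in range(8)
]
normal = [
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "size": 512, "sent_at": 0.3, "payload": "B"},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.1", "size": 900, "sent_at": 5.2, "payload": "C"},
]
flood_flags = flag_flood(burst + normal, threshold=0.5)
```

A function with this shape, taking a threshold fixed in advance, could serve as the instruction algorithm of one prior knowledge layer.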
The network security data is dimension-reduced and restructured according to the attributes in the prior knowledge, and the dimension-reduced and restructured data is classified by an instruction algorithm set in advance by the user. This can improve the efficiency of network security data classification, and the effect is particularly pronounced for flooding attacks.
The deep learning classification module contains multiple classifiers, each trained on a different type of network security data training set, which classify the network security data into attack data and data to be classified, the attack data being labeled with its attack data type. When the network security data remaining after processing by the prior filtering module reaches the deep learning classification module, it is further classified by the different classifiers in sequence: data packets classified as attack data are output and the data packet output is counted, while data packets classified as data to be classified are sent to the next classifier. The data to be classified that is output by the last classifier is left unprocessed for the time being.
S4: Perform visualization processing according to the classification result of the network security data, displaying the network security situation corresponding to the network security data intuitively to the user. The network security situation includes information such as the number of attacks (that is, the data packet output corresponding to the attack data) and the attack data types, and is displayed as charts, dynamic curves and the like, so that the user can check it in time and adjust the server's defense strategy in a targeted manner.
The invention applies big data processing technology to network security data, classifies and analyzes the network security data, and displays the network security situation intuitively to the user, so that the user can check the current network security situation in time and adjust the server's defense strategy in a targeted manner.
In actual use, data processing is time-sensitive, especially in big data processing, so the user expects network security data with higher attack counts to be output first in order to learn of the security threats facing the server more quickly. The order of the prior knowledge layers is therefore adjusted through the following steps:
H1: In step T6, when $k \le I$ does not hold, enter H2 at the same time as entering T7.
H2: Let $j = 1$, where $j$ is the index used to select a prior knowledge layer.
H3: Obtain the data packet output $Q_j$ of prior knowledge layer $X_j$ and calculate the data packet output ratio $\upsilon = Q_j / Q_z$, where $Q_z$ is the total number of data packets in the network security data; obtain the weight coefficient $a$ corresponding to the attack data type $\mu$ of prior knowledge layer $X_j$. The weight coefficient is set manually according to the specific function of the server and lies in the interval (0, 1). If the server provides network services, the weight coefficient for the attack data type "transport layer attack data" is set higher; if the server is used for enterprise system control, the weight coefficient for the attack data type "application layer attack data" is set higher.
H4: Calculate the ranking weight of prior knowledge layer $X_j$ as $\beta_j = \frac{1}{1 + e^{-r\upsilon + d}} + \omega \cdot (e^{a} - 1)$, where $r$, $d$ and $\omega$ are parameters of the ranking weight $\beta_j$ obtained through training of a learning model.
H5: Add the ranking weight $\beta_j$ to the set $\zeta$, assign $j + 1$ to $j$, and judge whether $j \le I$ holds. If it holds, not all prior knowledge layers have been traversed yet, and the process returns to H3; if it does not hold, all prior knowledge layers have been traversed, and H6 is entered.
H6: Sort all ranking weights $\beta_j$ in the set $\zeta$ in descending order and reorder the corresponding prior knowledge layers accordingly, thereby adjusting the positions of the prior knowledge layers in the prior filtering module.
The specific steps for obtaining r, d and ω through training of the learning model are as follows: acquire a network security data training set that has been classified by the user in advance and send it to the prior knowledge module; after the filtering and classification operation is completed, obtain the ordering of the prior knowledge layers through steps H1 to H6, and calculate a cross-entropy loss between this ordering and the expected ordering set by the user in advance. If the loss function value is within a preset threshold range (the range being set by the user), output the values of r, d and ω and perform no further operation; if the loss function value is not within the preset threshold range, the current values of r, d and ω are unsuitable, so they are adjusted by feedback and the calculation is iterated.
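The training loop described above can be sketched as follows. The specification does not state how an ordering is turned into a quantity that a cross-entropy loss can compare, nor how r, d and ω are adjusted, so this sketch makes two loudly labeled assumptions: the expected ordering is expressed as a target probability distribution over the layers and compared with a softmax of the ranking weights, and the feedback adjustment is replaced by a simple random search. All function names and search ranges are hypothetical.

```python
import math
import random


def ranking_weight(upsilon, a, r, d, omega):
    """Ranking weight from step H4."""
    return 1.0 / (1.0 + math.exp(-r * upsilon + d)) + omega * (math.exp(a) - 1.0)


def softmax(values):
    """Numerically stable softmax (assumption: used to compare orderings)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """Cross entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def fit_parameters(ratios, type_weights, target, threshold, max_iter=5000, seed=0):
    """Random search for (r, d, omega); a stand-in for the feedback adjustment.

    ratios       - output ratios (upsilon) of the layers on the training set
    type_weights - weight coefficients a of the layers
    target       - assumed target distribution encoding the expected ordering
    threshold    - user-set upper bound on the acceptable loss
    Returns ((r, d, omega), loss) for the best candidate found; the caller
    should check loss <= threshold, since the search may not reach it.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(max_iter):
        # Hypothetical search ranges, chosen only for illustration.
        r, d, omega = rng.uniform(0, 20), rng.uniform(-5, 5), rng.uniform(0, 2)
        betas = [ranking_weight(u, a, r, d, omega)
                 for u, a in zip(ratios, type_weights)]
        loss = cross_entropy(target, softmax(betas))
        if best is None or loss < best[1]:
            best = ((r, d, omega), loss)
        if loss <= threshold:
            break  # loss is within the preset range: stop iterating
    return best
```

A gradient-based optimizer could replace the random search; the sketch only shows the loop structure of evaluating the ordering, comparing it with the expected result, and iterating until the loss falls within the user-set range.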
According to the invention, each time the prior filtering module completes a filtering and classification operation, the prior knowledge layers in the prior filtering module are reordered according to the data packet output ratio and the weight coefficient of the attack data type. Prior knowledge layers with a higher data packet output ratio are placed earlier, so that network security data associated with a higher number of attacks is output earlier, which facilitates visualization of the subsequent classification results. At the same time, the weight coefficient of the attack data type acts as a balance, preventing attack data that poses a high security threat from being output late.
In actual network attacks, attackers continually change their methods, so a prior knowledge layer may come to have a data packet output of 0. When the prior filtering module performs filtering and classification, however, a prior knowledge layer whose data packet output is 0 still causes the network security data copy to be dimension-reduced and reorganized according to that layer's attribute set δ. This contributes nothing to the filtering and classification of the network security data and wastes computing resources. A prior knowledge layer whose data packet output is 0 therefore needs to be deactivated, specifically through the following steps:
G1: Provide a prior knowledge storage module for storing prior knowledge layers whose data packet output is 0; while the network security data is being filtered and classified, continuously monitor the data packet output Q_j of each prior knowledge layer.
G2: Judge whether "Q_j = 0" holds. If it holds, the data packet output of the prior knowledge layer is 0, so enter G3; if it does not hold, the data packet output of the prior knowledge layer is not 0, the output is meaningful, and no operation is performed.
G3: Move the corresponding prior knowledge layer from the prior filtering module to the prior knowledge storage module, thereby deactivating it, and return to G1.
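Steps G1 to G3 amount to partitioning the active layers by their most recent data packet output. A minimal Python sketch follows; the function name and the use of plain lists and a dictionary are assumptions made for illustration, and a layer missing from the output record is treated as having an output of 0, which is also an assumption.

```python
def deactivate_idle_layers(active_layers, stored_layers, outputs):
    """Move layers with zero data packet output into storage (steps G2 and G3).

    active_layers - ordered list of layer names in the prior filtering module
    stored_layers - list of layer names in the prior knowledge storage module
    outputs       - dict mapping a layer name to its data packet output Q_j
    Returns new (active, stored) lists; the inputs are not modified.
    """
    still_active = []
    stored = list(stored_layers)
    for name in active_layers:
        # Assumption: a layer with no recorded output counts as Q_j = 0.
        if outputs.get(name, 0) == 0:
            stored.append(name)        # G3: deactivate and keep for later reuse
        else:
            still_active.append(name)  # meaningful output, no operation
    return still_active, stored
```

The relative order of the layers that remain active is preserved, so the ordering produced by steps H1 to H6 is not disturbed by deactivation.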
According to the invention, the data packet output is monitored and prior knowledge layers whose data packet output is 0 are deactivated, so that such layers no longer occupy computing resources and the overall utilization efficiency of computing resources is improved.
In the operations above, the prior filtering module filters and classifies the network security data using prior knowledge set by the user, and is therefore limited by what the user has set in advance; the deep learning classification module is trained on a network security data training set, and is therefore limited by that training set. Because both modules have these limitations, the network security data remaining after classification by the prior filtering module and the deep learning classification module may still contain attack data that neither module can identify. This remaining network security data therefore needs to be analyzed again, specifically through the following steps:
D1: Acquire the network security data remaining after classification by the prior filtering module and the deep learning classification module, and denote it as network security data P.
D2: Perform cluster analysis on the network security data P using the CLARANS algorithm to obtain the data packet set ε within a cluster, obtain the number M of data packets in the cluster, and calculate the cluster data packet number ratio λ = M / Q_z.
When generating attack data, an attacker generally produces a large amount of attack data of the same type in order to improve the success rate of the attack. The network security data remaining after classification by the prior filtering module and the deep learning classification module is therefore analyzed with an unsupervised cluster analysis algorithm: if a cluster exists, that is, if there is a large number of data packets of the same type, those data packets can reasonably be treated as suspicious.
D3: Judge whether "λ ≤ φ" holds, where φ is the cluster data packet number ratio threshold and is set by the user. If it holds, the number of data packets in the cluster is still within a reasonable range, so the data packets are not treated as suspicious and are still regarded as data to be classified, and no operation is performed; if it does not hold, the number of data packets in the cluster is outside the reasonable range, so the data packets must be treated as suspicious, and D4 is entered.
D4: Send the data packet set ε to the prior knowledge storage module and let c = 1, where c is the index used to select a prior knowledge layer stored in the prior knowledge storage module.
D5: Select prior knowledge layer X_{I+c} stored in the prior knowledge storage module; filter and classify the data packet set ε with prior knowledge layer X_{I+c} and calculate the data packet output Q_{I+c}; then judge whether "Q_{I+c} = 0" holds. If "Q_{I+c} = 0" does not hold, the deactivated prior knowledge layer can classify the data packet set ε, so enter D6; if it holds, the deactivated prior knowledge layer cannot classify the data packet set ε, so enter D7.
D6: Add the corresponding prior knowledge layer X_{I+c} to the prior filtering module, calculate its ranking weight β_{I+c} according to steps H3 to H4, and enter H5, so that prior knowledge layer X_{I+c} is reordered together with the prior knowledge layers already in the prior filtering module.
D7: Assign c + 1 to c and judge whether "c ≤ C" holds, where C is the total number of prior knowledge layers stored in the prior knowledge storage module. If it holds, send the data packet set ε to the next prior knowledge layer stored in the prior knowledge storage module and return to D5; if it does not hold, enter D8.
D8: Mark the data packet set ε as an unknown data set and send it to the user; the user analyzes the unknown data set and, on that basis, adds a prior knowledge layer or adds a classifier.
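The decision flow of steps D3 to D8 can be sketched in Python as follows. This is only an illustration: the clustering of step D2 is not implemented here, and the cluster is taken as an input; representing each stored prior knowledge layer as a (name, classify) pair whose classify function returns True for attack data, as well as the result labels "ignore", "reactivate" and "unknown", are hypothetical choices that do not come from the specification.

```python
def reanalyze_cluster(cluster, total_packets, phi, stored_layers):
    """Decide what to do with one cluster of leftover packets (steps D3 to D8).

    cluster       - list of data packets in the cluster (the set epsilon)
    total_packets - Q_z, the total number of data packets
    phi           - user-set threshold for the cluster packet number ratio
    stored_layers - list of (name, classify) pairs for deactivated layers,
                    where classify(packet) returns True for attack data
    Returns ("ignore", None), ("reactivate", name) or ("unknown", None).
    """
    if total_packets <= 0:
        raise ValueError("total_packets (Q_z) must be positive")
    lam = len(cluster) / total_packets  # lambda = M / Q_z from step D2
    if lam <= phi:
        return ("ignore", None)  # D3: cluster size is within a reasonable range
    # D4, D5 and D7: try each stored layer in turn.
    for name, classify in stored_layers:
        output = sum(1 for packet in cluster if classify(packet))
        if output != 0:
            return ("reactivate", name)  # D6: layer returns to the filter module
    return ("unknown", None)  # D8: hand the cluster to the user for analysis
```

A caller receiving ("reactivate", name) would then add that layer back to the prior filtering module and rerun the reordering of steps H3 to H6.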
According to the invention, cluster analysis is performed on the network security data remaining after classification by the prior filtering module and the deep learning classification module in order to detect data packet sets that should be treated as suspicious, and the prior filtering module and the deep learning classification module are updated according to such data packet sets, thereby improving the accuracy of network security data detection.
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims. Parts of the specification not described in detail belong to the prior art known to those skilled in the art.

Claims (6)

1. A network security data processing method based on big data processing, characterized by comprising the following steps:
S1: a data collector is arranged on a link connected to a server and collects all network security data exchanged with the server, wherein the network security data exists in the form of data packets;
S2: the network security data is preprocessed, specifically by sequentially cleaning, standardizing and reducing the network security data;
S3: the preprocessed network security data is sent to a prior filtering module and a deep learning classification module, which classify the network security data into attack data and data to be classified, the attack data being labeled with an attack data type;
S4: visualization processing is performed according to the classification result of the network security data, so that the network security situation corresponding to the network security data is displayed intuitively to the user;
the prior filtering module filters and classifies the network security data layer by layer using prior knowledge, wherein the prior knowledge exists in the form (δ, ψ, μ); δ is an attribute set, δ = {μ_1, μ_2, μ_3, ..., μ_n}, in which μ_n is an attribute corresponding to an attribute inside the network security data; ψ is an instruction ID, used as a pointer to call the corresponding instruction algorithm; and μ is the attack data type into which the prior knowledge classifies data;
a plurality of pieces of prior knowledge are provided in the prior filtering module and form a layered structure, each piece of prior knowledge forming one prior knowledge layer, denoted X_i, i = 1, 2, 3, ..., I, where I is the total number of prior knowledge layers in the prior filtering module;
in step S3, the network security data is filtered and classified layer by layer by the prior filtering module through the following steps:
T1: send the network security data to the prior filtering module, and assign a data packet ID to every data packet in the network security data;
T2: let k = 1, where k is the index used to select a prior knowledge layer, and obtain the total number I of prior knowledge layers in the prior filtering module;
T3: select prior knowledge layer X_k and obtain the corresponding attributes μ_n from the attribute set δ of prior knowledge layer X_k; copy all of the network security data, the data packets in the network security data and the data packets in the network security data copy being mapped one to one by data packet ID; and process the data packets in the network security data copy using the attributes μ_n, namely by matching the attributes in each data packet against the attributes μ_n: if a match succeeds, the matched attribute in the data packet and its corresponding data content are retained; if a match fails, the unmatched attribute in the data packet and its corresponding data content are deleted, whereby the network security data copy is dimension-reduced and reorganized;
T4: obtain the corresponding instruction ID ψ from prior knowledge layer X_k and, using the instruction ID ψ as a pointer, call the corresponding instruction algorithm to perform binary classification on the data packets in the dimension-reduced and reorganized network security data copy, dividing them into attack data and data to be classified, the attack data being labeled with the attack data type μ;
T5: after all data packets in the network security data copy have been classified, process the data packets in the network security data according to the data packet IDs of the data packets in the network security data copy: output the data packets in the network security data whose data packet IDs correspond to data packets classified as attack data in the copy, and calculate the data packet output; retain the data packets in the network security data whose data packet IDs correspond to data packets classified as data to be classified in the copy; and delete the network security data copy;
T6: assign k + 1 to k and judge whether "k ≤ I" is satisfied; if "k ≤ I" is satisfied, send the network security data remaining after classification by the prior knowledge layer to the next prior knowledge layer and return to T3; if "k ≤ I" is not satisfied, enter T7;
T7: send the network security data remaining after processing by the prior filtering module to the deep learning classification module.
2. The network security data processing method based on big data processing according to claim 1, wherein a plurality of classifiers are provided in the deep learning classification module to classify the network security data, specifically into attack data and data to be classified, the attack data being labeled with an attack data type;
when the network security data remaining after processing by the prior filtering module reaches the deep learning classification module, it is further classified by the different classifiers in sequence: data packets classified as attack data are output and the data packet output is calculated, and data packets classified as data to be classified are sent to the next classifier.
3. The network security data processing method based on big data processing according to claim 2, further comprising adjusting the order of the prior knowledge layers according to the data packet output ratio of each prior knowledge layer and the weight coefficient of its attack data type, through the following steps:
H1: in step T6, when "k ≤ I" is not satisfied, enter H2 at the same time as entering T7;
H2: let j = 1, where j is the index used to select a prior knowledge layer;
H3: select prior knowledge layer X_j and obtain its data packet output Q_j; calculate the data packet output ratio υ = Q_j / Q_z, where Q_z is the total number of data packets in the network security data; and obtain the weight coefficient a corresponding to the attack data type μ of prior knowledge layer X_j;
H4: calculate the ranking weight of prior knowledge layer X_j as β_j = 1/(1 + e^{-rυ + d}) + ω·(e^{a} - 1), where r, d and ω are parameters of the ranking weight β_j and are obtained through training of a learning model;
H5: add the ranking weight β_j to a set ζ, assign j + 1 to j, and judge whether "j ≤ I" is satisfied; if "j ≤ I" is satisfied, return to H3; if "j ≤ I" is not satisfied, enter H6;
H6: sort all ranking weights β_j in the set ζ in descending order, and reorder the corresponding prior knowledge layers in the same way, thereby adjusting the positions of the prior knowledge layers within the prior filtering module.
4. The network security data processing method based on big data processing according to claim 3, wherein the specific steps for obtaining r, d and ω through training of the learning model are as follows: acquire a network security data training set that has been classified by the user in advance and send it to the prior knowledge module; after the filtering and classification operation is completed, obtain the ordering of the prior knowledge layers through steps H1 to H6, and calculate a cross-entropy loss between this ordering and the expected ordering of the prior knowledge layers set by the user in advance; if the calculated loss function value is within a preset threshold range, output the values of r, d and ω and perform no further operation; if the calculated loss function value is not within the preset threshold range, adjust the values of r, d and ω by feedback and iterate the calculation.
5. The network security data processing method based on big data processing according to claim 4, further comprising deactivating prior knowledge layers whose data packet output is 0, through the following steps:
G1: provide a prior knowledge storage module for storing prior knowledge layers whose data packet output is 0; while the network security data is being filtered and classified, continuously monitor the data packet output Q_j of each prior knowledge layer;
G2: judge whether "Q_j = 0" holds; if "Q_j = 0" holds, enter G3; if "Q_j = 0" does not hold, perform no operation;
G3: move the corresponding prior knowledge layer from the prior filtering module to the prior knowledge storage module, thereby deactivating it, and return to G1.
6. The network security data processing method based on big data processing according to claim 5, further comprising analyzing again the network security data remaining after classification by the prior filtering module and the deep learning classification module, through the following steps:
D1: acquire the network security data remaining after classification by the prior filtering module and the deep learning classification module, and denote it as network security data P;
D2: perform cluster analysis on the network security data P using an unsupervised cluster analysis algorithm to obtain the data packet set ε within a cluster, obtain the number M of data packets in the cluster, and calculate the cluster data packet number ratio λ = M / Q_z;
D3: judge whether "λ ≤ φ" holds, where φ is the cluster data packet number ratio threshold; if "λ ≤ φ" holds, perform no operation; if "λ ≤ φ" does not hold, enter D4;
D4: send the data packet set ε to the prior knowledge storage module and let c = 1, where c is the index used to select a prior knowledge layer stored in the prior knowledge storage module;
D5: select prior knowledge layer X_{I+c} stored in the prior knowledge storage module; filter and classify the data packet set ε with prior knowledge layer X_{I+c} and calculate the data packet output Q_{I+c}; judge whether "Q_{I+c} = 0" holds; if "Q_{I+c} = 0" does not hold, enter D6; if "Q_{I+c} = 0" holds, enter D7;
D6: add the corresponding prior knowledge layer X_{I+c} to the prior filtering module, calculate its ranking weight β_{I+c} according to steps H3 to H4, and enter H5, so that prior knowledge layer X_{I+c} is reordered together with the prior knowledge layers already in the prior filtering module;
D7: assign c + 1 to c and judge whether "c ≤ C" holds, where C is the total number of prior knowledge layers stored in the prior knowledge storage module; if it holds, send the data packet set ε to the next prior knowledge layer stored in the prior knowledge storage module and return to D5; if it does not hold, enter D8;
D8: mark the data packet set ε as an unknown data set and send it to the user; the user analyzes the unknown data set and, on that basis, adds a prior knowledge layer or adds a classifier.
CN202310095115.2A 2023-02-10 2023-02-10 Network security data processing method based on big data processing Active CN115801471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310095115.2A CN115801471B (en) 2023-02-10 2023-02-10 Network security data processing method based on big data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310095115.2A CN115801471B (en) 2023-02-10 2023-02-10 Network security data processing method based on big data processing

Publications (2)

Publication Number Publication Date
CN115801471A CN115801471A (en) 2023-03-14
CN115801471B true CN115801471B (en) 2023-04-28

Family

ID=85430797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310095115.2A Active CN115801471B (en) 2023-02-10 2023-02-10 Network security data processing method based on big data processing

Country Status (1)

Country Link
CN (1) CN115801471B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109962909A (en) * 2019-01-30 2019-07-02 大连理工大学 A kind of network intrusions method for detecting abnormality based on machine learning
CN114422184A (en) * 2021-12-14 2022-04-29 国网浙江省电力有限公司金华供电公司 Network security attack type and threat level prediction method based on machine learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108040073A (en) * 2018-01-23 2018-05-15 杭州电子科技大学 Malicious attack detection method based on deep learning in information physical traffic system
US20190273510A1 (en) * 2018-03-01 2019-09-05 Crowdstrike, Inc. Classification of source data by neural network processing
CN109829299B (en) * 2018-11-29 2022-05-10 电子科技大学 Unknown attack identification method based on depth self-encoder
US20210352095A1 (en) * 2020-05-05 2021-11-11 U.S. Army Combat Capabilities Development Command, Army Research Labortary Cybersecurity resilience by integrating adversary and defender actions, deep learning, and graph thinking
CN114118507A (en) * 2021-07-14 2022-03-01 青岛博天数通信息科技有限公司 Risk assessment early warning method and device based on multi-dimensional information fusion

Also Published As

Publication number Publication date
CN115801471A (en) 2023-03-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant