CN114338180A

CN114338180A - Big data network communication implementation method

Info

Publication number: CN114338180A
Application number: CN202111648401.4A
Authority: CN
Inventors: 周雪芳; 苏娜; 李晓岩; 高长全; 史宏; 刘国新
Original assignee: Qingdao Huanghai University
Current assignee: Qingdao Huanghai University
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-04-12

Abstract

The invention discloses a big data network communication implementation method, belonging to the technical field of network communication. The method comprises the following steps: a network communication node of the network equipment performs data access, network information uploaded by network equipment of the same type is acquired from a big data database, and the correspondence between communication traffic load and time is obtained; when network communication information is sent, the required traffic configuration is obtained from the correspondence between traffic load and time, and a configuration request is sent; one or more communication links are established according to the traffic configuration request, and communication data are transmitted between the network node and the network equipment through the one or more communication links; during data transmission, data scripts are collected from the data streams, preprocessed and analysed, and features are extracted; a monitoring model for identifying traffic with attack behaviour is trained by machine learning on big data, with data stream features as input, and the model is used for communication security monitoring.

Description

Big data network communication implementation method

Technical Field

The invention relates to the technical field of network communication, and in particular to a big data network communication implementation method.

Background

A network is formed by connecting isolated workstations or hosts through physical links into data links, so as to share resources and communicate. Communication is the exchange and transmission of information between people through some medium; network communication connects isolated network devices through a network, so that people with people, people with computers, and computers with one another can exchange information. One of the key problems current network communication must solve is reducing data transmission delay, so that users can obtain network services quickly. To guarantee transmission efficiency, current network communication often simply increases transmission power and the number of links, which greatly raises communication cost. In addition, during communication the receiving end does not identify the content it receives, so the security of the network device cannot be effectively protected.

Disclosure of Invention

In view of the above technical defects, the invention aims to provide a big data network communication implementation method.

To solve the above technical problems, the invention adopts the following technical solution: a big data network communication implementation method comprising the following steps:

S1, a network communication node of the network equipment performs data access, network information uploaded by network equipment of the same type is acquired from the big data database, and the correspondence between communication traffic load and time is obtained by comparison;

S2, when it is detected that the network communication node terminal is sending network communication information to a target server, the required traffic configuration is obtained from the correspondence between communication traffic load and time, and a configuration request is sent;

S3, one or more communication links are established according to the traffic configuration request, and communication data are transmitted between the network node and the network equipment through the one or more communication links;

S4, during data transmission, data scripts are collected from the data streams, preprocessed and analysed, and features are extracted; the results are converted into a uniform binary format by a data stream serialization service;

S5, a model is trained on big data: with data stream features as input, a monitoring model for identifying traffic with attack behaviour is trained by machine learning, and the model is used for communication security monitoring;

S6, while communication data are being transmitted, the extracted features in the uniform format are compared against the monitoring model; when suspicious traffic is detected, a visual alarm is sent to the network equipment to display and report the suspicious traffic, the communication is promptly interrupted, and the suspicious traffic and the corresponding network nodes are stored in the big data database.
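A minimal Python sketch of the S6 check may help make the data flow concrete. It assumes each flow arrives as a record holding its source node and a dictionary of numeric features, that the trained monitoring model can be called as a function returning a suspicion score, and that a score at or above a chosen threshold counts as suspicious. `FlowRecord`, `SuspiciousStore`, `monitor` and the `syn_rate` feature are hypothetical names, the in-memory list stands in for the big data database, and interrupting the link is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowRecord:
    """One flow's features in the uniform format produced in S4 (hypothetical schema)."""
    node: str
    features: Dict[str, float]


@dataclass
class SuspiciousStore:
    """In-memory stand-in for the big data database that keeps flagged traffic."""
    rows: List[FlowRecord] = field(default_factory=list)


def monitor(records: List[FlowRecord],
            model: Callable[[Dict[str, float]], float],
            threshold: float,
            store: SuspiciousStore) -> List[str]:
    """Score each flow with the monitoring model; alert on and store suspicious ones."""
    alerts: List[str] = []
    for rec in records:
        score = model(rec.features)
        if score >= threshold:
            # The visual alarm is represented here by an alert string for the device UI.
            alerts.append(f"ALERT suspicious traffic from {rec.node} (score {score:.2f})")
            store.rows.append(rec)  # kept so it can be reused for retraining later
    return alerts


if __name__ == "__main__":
    store = SuspiciousStore()
    flows = [FlowRecord("10.0.0.7", {"syn_rate": 95.0}),
             FlowRecord("10.0.0.8", {"syn_rate": 4.0})]
    toy_model = lambda f: f["syn_rate"] / 100.0  # placeholder for the trained model
    for line in monitor(flows, toy_model, 0.8, store):
        print(line)
```

With the toy model, only the flow from `10.0.0.7` (score 0.95) crosses the 0.8 threshold, so it is the only one reported and stored.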

In a preferred embodiment, in step S1, the correspondence between traffic load and time is determined as follows: the traffic load of each device during each time period of communication with the network device is obtained from the past communication history in the big data database, the relationship between load and time period is derived, and the maximum and minimum loads that occurred historically in the same time period as the preset time period are determined.
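The S1 correspondence can be sketched in Python as follows, assuming that the "time period" is the hour of day and that historical samples are (timestamp, load) pairs in one fixed unit such as Mbit/s; neither assumption comes from the source. The functions `load_range_by_hour` and `traffic_config_request` are hypothetical; the second shows how an S2 configuration request could carry the historical range for the current hour.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def load_range_by_hour(history: Iterable[Tuple[datetime, float]]) -> Dict[int, Tuple[float, float]]:
    """Bucket historical (timestamp, load) samples by hour of day; return (min, max) per hour."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, load in history:
        buckets[ts.hour].append(load)
    return {hour: (min(loads), max(loads)) for hour, loads in buckets.items()}


def traffic_config_request(ranges: Dict[int, Tuple[float, float]], now: datetime) -> dict:
    """Hypothetical S2 request: the load range seen historically in the current hour."""
    low, high = ranges[now.hour]
    return {"hour": now.hour, "min_load": low, "max_load": high}


if __name__ == "__main__":
    history = [(datetime(2021, 12, 1, 9, 5), 120.0),
               (datetime(2021, 12, 2, 9, 40), 180.0),
               (datetime(2021, 12, 2, 14, 0), 60.0)]
    ranges = load_range_by_hour(history)
    print(traffic_config_request(ranges, datetime(2021, 12, 30, 9, 15)))
    # {'hour': 9, 'min_load': 120.0, 'max_load': 180.0}
```

A real system would also need a fallback for hours with no history, since this sketch raises `KeyError` there.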

In a preferred embodiment, in step S3, after the one or more communication links are established, it is determined whether the traffic a communication link can carry meets the traffic load of the communication. If the traffic the link can carry is smaller than the minimum load obtained from the correspondence between traffic load and time, the traffic that an adjacent communication link can carry is superposed onto the link as backup traffic, and the superposed traffic is compared with the maximum load obtained from that correspondence; the superposed traffic is required to meet the maximum load that occurred historically in the same time period as the preset time period.
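The link check in S3 can be sketched as below. The source does not define which link counts as "adjacent", so this sketch assumes the links form an ordered list and borrows capacity from the next link (the previous one for the last link). Capacities and loads share one unit, and `verify_links` is a hypothetical name. A link at or above the minimum load is accepted as-is; otherwise its capacity plus its neighbour's must reach the historical maximum.

```python
from typing import List, Tuple


def verify_links(capacities: List[float], min_load: float, max_load: float) -> List[Tuple[float, bool]]:
    """Return (effective capacity, accepted) for every link, following the S3 rule."""
    results: List[Tuple[float, bool]] = []
    for i, cap in enumerate(capacities):
        if cap >= min_load:
            results.append((cap, True))  # meets the historical minimum on its own
            continue
        # Assumption: the adjacent link is the next one (or the previous one for the last link).
        j = i + 1 if i + 1 < len(capacities) else i - 1
        backup = capacities[j] if j >= 0 else 0.0
        combined = cap + backup  # superpose the neighbour's capacity as backup traffic
        results.append((combined, combined >= max_load))
    return results


if __name__ == "__main__":
    print(verify_links([200.0, 50.0, 140.0], min_load=100.0, max_load=180.0))
```

Here the 50-unit link falls below the minimum, borrows 140 units from its neighbour, and the combined 190 units meet the 180-unit maximum, so the call returns `[(200.0, True), (190.0, True), (140.0, True)]`.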

In a preferred embodiment, after the one or more communication links are established in step S3, it is further determined whether the traffic configuration requests carried by the one or more communication links exceed a preset threshold. If they exceed the preset threshold on a communication link, the requests are merged according to their request time and request position, and the merged traffic data are delivered and transmitted serially.
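One possible reading of the request-merging rule, sketched in Python: when a link carries more requests than the threshold, requests from the same position within the same time window are merged by summing their traffic, and the merged requests are handed to a sender one at a time. The 10-second window, the field names and the functions `merge_requests` and `deliver_serially` are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ConfigRequest:
    time: int        # request time in seconds (assumed unit)
    position: str    # requesting position, e.g. a node ID (assumed)
    traffic: float   # requested traffic (assumed unit)


def merge_requests(reqs: List[ConfigRequest], threshold: int, window: int = 10) -> List[ConfigRequest]:
    """If there are more than `threshold` requests, merge those sharing a time window
    and position; otherwise return them unchanged, in time order."""
    if len(reqs) <= threshold:
        return sorted(reqs, key=lambda r: r.time)

    def key(r: ConfigRequest) -> Tuple[int, str]:
        return (r.time // window, r.position)

    merged: List[ConfigRequest] = []
    for (slot, position), group in groupby(sorted(reqs, key=key), key=key):
        items = list(group)
        merged.append(ConfigRequest(slot * window, position, sum(r.traffic for r in items)))
    return merged  # ordered by (time window, position)


def deliver_serially(reqs: List[ConfigRequest], send: Callable[[ConfigRequest], None]) -> None:
    """Hand requests to the sender one at a time, never in parallel."""
    for req in reqs:
        send(req)


if __name__ == "__main__":
    reqs = [ConfigRequest(1, "A", 10.0), ConfigRequest(3, "A", 5.0),
            ConfigRequest(4, "B", 7.0), ConfigRequest(12, "A", 2.0)]
    deliver_serially(merge_requests(reqs, threshold=3), print)
```

Sorting before `groupby` matters: `itertools.groupby` only groups consecutive items, so unsorted input would split a window into several partial merges.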

In a preferred embodiment, in step S4, collecting data scripts from the data stream means intercepting the access communication node information of the network communication information according to a preset information interception rule; feature extraction means extracting the header information of the network communication information; and the address sequence or address tag of the access address in the header information is converted into a uniform binary format by a data stream serialization service, yielding the corresponding feature tag in the communication data stream.
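The uniform binary format is not specified in the source. As one possible sketch, the address sequence from the header could be encoded as a 2-byte count followed by the 4-byte form of each IPv4 address, with a short hash of the record serving as the feature tag. The layout, the choice of BLAKE2b and the function names are all assumptions; IPv6 addresses would need a different record width.

```python
import hashlib
import ipaddress
import struct
from typing import List


def serialize_address_sequence(addresses: List[str]) -> bytes:
    """Encode IPv4 access addresses as a 2-byte big-endian count plus 4 bytes per address."""
    body = b"".join(ipaddress.IPv4Address(a).packed for a in addresses)
    return struct.pack("!H", len(addresses)) + body


def deserialize_address_sequence(data: bytes) -> List[str]:
    """Inverse of serialize_address_sequence."""
    (count,) = struct.unpack_from("!H", data, 0)
    return [str(ipaddress.IPv4Address(data[2 + 4 * i: 6 + 4 * i])) for i in range(count)]


def feature_tag(data: bytes) -> str:
    """Hypothetical feature tag: a short BLAKE2b digest of the uniform binary record."""
    return hashlib.blake2b(data, digest_size=8).hexdigest()


if __name__ == "__main__":
    record = serialize_address_sequence(["10.0.0.1", "192.168.1.20"])
    print(record.hex(), feature_tag(record))
    print(deserialize_address_sequence(record))
```

A fixed, network-byte-order layout means every node produces identical bytes for the same address sequence, which is what lets the tag be compared across devices.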

In a preferred embodiment, in step S4, when data scripts are collected from the data stream, the read anchor point of the communication node is intercepted, and interception starts from the read anchor point and continues until the intercepting write process stops; the result is taken as a write sequence. The number of scripts stored in the write sequence is compared with the preset script interception number, and if the number stored in the write sequence exceeds a preset threshold, the script interception is judged successful, yielding the access communication node information of at least one access communication node.
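The anchor-based interception can be read as: copy items from the read anchor into a write sequence until the write process stops, then accept the result only if it holds more scripts than the threshold. The Python sketch below assumes the stream is a list of dictionaries with a `node` key and that `None` marks the point where writing stops; `intercept_from_anchor` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Item = Optional[Dict[str, str]]  # None marks "the intercepting write process stopped" (assumption)


def intercept_from_anchor(stream: List[Item], anchor: int, threshold: int) -> Tuple[bool, List[str]]:
    """Copy items from the read anchor into a write sequence until the stop marker.

    Succeeds only if the write sequence holds more than `threshold` scripts, and then
    returns the distinct access communication nodes it contains.
    """
    write_sequence: List[Dict[str, str]] = []
    for item in stream[anchor:]:
        if item is None:
            break
        write_sequence.append(item)
    if len(write_sequence) <= threshold:
        return False, []
    return True, sorted({item["node"] for item in write_sequence})


if __name__ == "__main__":
    stream = [{"node": "n0"}, {"node": "n1"}, {"node": "n2"}, {"node": "n1"}, None, {"node": "n9"}]
    print(intercept_from_anchor(stream, anchor=1, threshold=2))
    print(intercept_from_anchor(stream, anchor=2, threshold=2))
```

Starting at index 1 copies three items before the stop marker and succeeds with nodes `['n1', 'n2']`; starting at index 2 copies only two and fails.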

In a preferred embodiment, in step S5, the data stream features in the big data and features based on different protocols (i.e., protocol features) together form the total features, which are stored in a distributed database. The distributed database stores the total features and the mapping between the feature data preprocessed in step S4 and the original data scripts, providing column-wise sparse storage of network flow features. The features in the distributed database are then used as input to train, by machine learning, a monitoring model for identifying traffic with attack behaviour.
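Column-wise sparse storage of the total features can be illustrated with an in-memory stand-in for the distributed database: each column maps row IDs to non-zero values only, and a separate table maps each row back to its original data script. `SparseColumnStore`, the `proto.` prefix for protocol features and the example feature names are hypothetical; a real deployment would use a distributed column store instead.

```python
from collections import defaultdict
from typing import Dict, List


class SparseColumnStore:
    """In-memory stand-in for the distributed database described in S5.

    columns:       feature name -> {row id: value}, storing non-zero values only
    row_to_script: row id -> ID of the original data script (the S4 mapping)
    """

    def __init__(self) -> None:
        self.columns: Dict[str, Dict[int, float]] = defaultdict(dict)
        self.row_to_script: Dict[int, str] = {}
        self.n_rows = 0

    def add(self, flow_feats: Dict[str, float], proto_feats: Dict[str, float], script_id: str) -> int:
        """Merge flow and protocol features into one total-feature row and store it sparsely."""
        row = self.n_rows
        self.n_rows += 1
        total = {**flow_feats, **{f"proto.{k}": v for k, v in proto_feats.items()}}
        for name, value in total.items():
            if value != 0:
                self.columns[name][row] = value
        self.row_to_script[row] = script_id
        return row

    def dense(self, names: List[str]) -> List[List[float]]:
        """Materialise selected columns as a dense matrix, e.g. as model training input."""
        return [[self.columns.get(n, {}).get(r, 0.0) for n in names] for r in range(self.n_rows)]


if __name__ == "__main__":
    store = SparseColumnStore()
    store.add({"pkts": 10.0, "bytes": 0.0}, {"tcp_syn": 1.0}, "script-001")
    store.add({"pkts": 0.0, "bytes": 512.0}, {}, "script-002")
    print(store.dense(["pkts", "bytes", "proto.tcp_syn"]))
```

Because zero values are never written, columns for rarely used protocol features stay small, which is the point of storing them column-wise.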

In one embodiment, the model training may use an ensemble learning algorithm, so that the model learns from a large number of features and its ability to recognise traffic with attack behaviour is continuously improved.
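The source does not say which ensemble method is used. Bagging is one common choice; the pure-Python sketch below bags decision stumps over bootstrap samples and combines them by majority vote, with label 1 meaning attack traffic. The function names and defaults are assumptions, and a production system would more likely rely on a library implementation such as a random forest or gradient boosting.

```python
import random
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, label predicted above threshold)


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Exhaustively pick the single-feature threshold rule with the fewest training errors."""
    best: Stump = (0, 0.0, 1)
    best_err = len(y) + 1
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for above in (0, 1):
                err = sum((above if row[f] > t else 1 - above) != label
                          for row, label in zip(X, y))
                if err < best_err:
                    best, best_err = (f, t, above), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    f, t, above = stump
    return above if row[f] > t else 1 - above


def fit_bagging(X: Sequence[Sequence[float]], y: Sequence[int],
                n_models: int = 15, seed: int = 0) -> List[Stump]:
    """Bagging: fit one stump per bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models: List[Stump] = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict(models: List[Stump], row: Sequence[float]) -> int:
    """Majority vote over the ensemble; 1 = traffic with attack behaviour."""
    votes = sum(predict_stump(m, row) for m in models)
    return int(2 * votes > len(models))


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [4.0], [10.0], [11.0], [12.0], [13.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    models = fit_bagging(X, y)
    print(predict(models, [0.0]), predict(models, [20.0]))
```

Trained on one feature with normal values 1 to 4 and attack values 10 to 13, the ensemble labels 0 as normal and 20 as attack; each stump sees a different bootstrap sample, and the vote smooths out stumps fitted to unlucky samples.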

In a preferred embodiment, in step S6, the suspicious traffic and the corresponding network nodes stored in the big data database are reused as training data for the monitoring model, so that its ability and accuracy in recognising traffic with attack behaviour keep improving with use.
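The feedback loop can be sketched with a deliberately simple one-feature detector whose threshold is the midpoint between the mean normal value and the mean attack value. Each batch of flagged flows is appended to the training set as attack samples and the threshold is refitted. `FeedbackTrainer` and its methods are hypothetical, and the detector is only a placeholder for whatever monitoring model S5 produces.

```python
from statistics import mean
from typing import List, Tuple


class FeedbackTrainer:
    """Keeps a training set and refits a threshold detector when flagged flows are fed back."""

    def __init__(self, samples: List[Tuple[float, int]]) -> None:
        self.samples = list(samples)  # (feature value, label), 1 = attack
        self.threshold = self._fit()

    def _fit(self) -> float:
        normal = [x for x, label in self.samples if label == 0]
        attack = [x for x, label in self.samples if label == 1]
        return (mean(normal) + mean(attack)) / 2  # midpoint between the class means

    def feed_back(self, flagged: List[float]) -> float:
        """Store flagged flows as attack samples, refit, and return the new threshold."""
        self.samples.extend((x, 1) for x in flagged)
        self.threshold = self._fit()
        return self.threshold


if __name__ == "__main__":
    trainer = FeedbackTrainer([(10.0, 0), (20.0, 0), (100.0, 1)])
    print(trainer.threshold)           # 57.5
    print(trainer.feed_back([60.0]))   # 47.5
```

Feeding alerts back without review can reinforce false positives, so flagged traffic is commonly confirmed before it is used as a training label.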

The beneficial effects of the invention are as follows:

1. The traffic load of each device during each time period of communication with the network device is obtained from the past communication history in the big data database, the relationship between load and time period is derived, and the maximum and minimum loads that occurred historically in the same time period as the preset time period are determined, which makes it easy to judge the traffic load range required by the current network communication;

2. One or more communication links are established according to the traffic load, and it is judged whether the traffic the links can carry meets the communication traffic load, so that the links meet the network communication requirements without redundant capacity, satisfying the communication demand while saving energy;

3. During communication, data scripts are collected from the data stream, the access communication node information of the network communication information is intercepted according to a preset information interception rule, the header information of the network communication information is extracted as features, and the address sequence or address tag of the access address in the header information is converted into a uniform binary format by a data stream serialization service, yielding the corresponding feature tag in the communication data stream; the extracted feature information is combined with feature information from other network communications in the big data database to form a training set, on which a monitoring model for identifying traffic with attack behaviour is trained by machine learning;

4. While communication data are transmitted, the extracted features in the uniform format are compared with the monitoring model in real time, and when suspicious traffic is detected, a visual alarm is sent to the network equipment to display and report it, ensuring communication security.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of a big data network communication implementation method according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Embodiment: as shown in Fig. 1, the present invention provides a big data network communication implementation method, comprising the following steps:

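S1, a network communication node of the network equipment performs data access, network information uploaded by network equipment of the same type is acquired from the big data database, and the correspondence between communication traffic load and time is obtained by comparison;

S2, when it is detected that the network communication node terminal is sending network communication information to the target server, the required traffic configuration is obtained from the correspondence between communication traffic load and time, and a configuration request is sent;

S3, one or more communication links are established according to the traffic configuration request, and communication data are transmitted between the network node and the network equipment through the one or more communication links;

S4, during data transmission, data scripts are collected from the data streams, preprocessed and analysed, and features are extracted; the results are converted into a uniform binary format by a data stream serialization service;

S5, a model is trained on big data: with data stream features as input, a monitoring model for identifying traffic with attack behaviour is trained by machine learning, and the model is used for communication security monitoring;

S6, while communication data are being transmitted, the extracted features in the uniform format are compared against the monitoring model; when suspicious traffic is detected, a visual alarm is sent to the network equipment to display and report the suspicious traffic, the communication is promptly interrupted, and the suspicious traffic and the corresponding network nodes are stored in the big data database.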

Further, in step S1, the correspondence between traffic load and time is determined as follows: the traffic load of each device during each time period of communication with the network device is obtained from the past communication history in the big data database, the relationship between load and time period is derived, and the maximum and minimum loads that occurred historically in the same time period as the preset time period are determined, so that the traffic load range required by the current network communication can easily be judged.

Further, in step S3, after the one or more communication links are established, it is determined whether the traffic the communication links can carry meets the traffic load of the communication. If the traffic a link can carry is smaller than the minimum load obtained from the correspondence between traffic load and time, the traffic that an adjacent communication link can carry is superposed onto the link as backup traffic, and the superposed traffic is compared with the maximum load obtained from that correspondence; the superposed traffic is required to meet the maximum load that occurred historically in the same time period as the preset time period. This verifies the established communication link and ensures that the traffic carried by the link meets the network communication requirements without redundancy, satisfying the communication demand while saving energy.

Further, in step S3, after the one or more communication links are established, it is further determined whether the traffic configuration requests carried by the one or more communication links exceed a preset threshold. If they exceed the preset threshold on a communication link, the requests are merged according to their request time and request position, and the merged traffic data are delivered and transmitted serially.

Further, in step S4, collecting data scripts from the data stream means intercepting the access communication node information of the network communication information according to a preset information interception rule; feature extraction means extracting the header information of the network communication information; and the address sequence or address tag of the access address in the header information is converted into a uniform binary format by a data stream serialization service, yielding the corresponding feature tag in the communication data stream.

Further, in step S4, when data scripts are collected from the data stream, the read anchor point of the communication node is intercepted, and interception starts from the read anchor point and continues until the intercepting write process stops; the result is taken as a write sequence. The number of scripts stored in the write sequence is compared with the preset script interception number, and if the number stored in the write sequence exceeds a preset threshold, the script interception is judged successful, yielding the access communication node information of at least one access communication node, from which the data information of the communication node is extracted.

Further, in step S5, the data stream features in the big data and features based on different protocols (i.e., protocol features) together form the total features, which are stored in a distributed database. The distributed database stores the total features and the mapping between the feature data preprocessed in step S4 and the original data scripts, providing column-wise sparse storage of network flow features. The features in the distributed database are then used as input to train, by machine learning, a monitoring model for identifying traffic with attack behaviour.

Furthermore, the model training may use an ensemble learning algorithm, so that the model learns from a large number of features and its ability to recognise traffic with attack behaviour is continuously improved.

Further, in step S6, after the suspicious traffic and the corresponding network nodes are stored in the big data database, they are reused as training data for the monitoring model, so that its ability and accuracy in recognising traffic with attack behaviour keep improving with use.

In use, the traffic load of each device during each time period of communication with the network device is obtained from the past communication history in the big data database, the relationship between load and time period is derived, and the maximum and minimum loads that occurred historically in the same time period as the preset time period are determined, so that the traffic load range required by the current network communication can easily be judged. One or more communication links are then established according to the traffic load, and it is judged whether the traffic the links can carry meets the communication traffic load. If the traffic a link can carry is smaller than the minimum load obtained from the correspondence between traffic load and time, the traffic that an adjacent link can carry is superposed onto the link as backup traffic and compared with the maximum load obtained from that correspondence; the superposed traffic is required to meet the maximum load that occurred historically in the same time period as the preset time period. This verifies the established links, so that the traffic they carry meets the network communication requirements without redundancy, satisfying the communication demand while saving energy. During communication, data scripts are collected from the data stream to intercept the access communication node information of the network communication information according to a preset information interception rule, the header information of the network communication information is extracted as features, and the address sequence or address tag of the access address in the header information is converted into a uniform binary format by a data stream serialization service, yielding the corresponding feature tag in the communication data stream. The extracted feature information is combined with feature information from other network communications in the big data database to form a training set, on which a monitoring model for identifying traffic with attack behaviour is trained by machine learning. While communication data are transmitted, the extracted features in the uniform format are compared with the monitoring model in real time; when suspicious traffic is detected, a visual alarm is sent to the network equipment to display and report it, ensuring communication security.

It will be apparent to those skilled in the art that various changes and modifications may be made to the present invention without departing from its spirit and scope. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include them as well.

Claims

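1. A big data network communication implementation method, characterized by comprising the following steps:

S1, a network communication node of the network equipment performs data access, network information uploaded by network equipment of the same type is acquired from the big data database, and the correspondence between communication traffic load and time is obtained by comparison;

S2, when it is detected that the network communication node terminal is sending network communication information to a target server, the required traffic configuration is obtained from the correspondence between communication traffic load and time, and a configuration request is sent;

S3, one or more communication links are established according to the traffic configuration request, and communication data are transmitted between the network node and the network equipment through the one or more communication links;

S4, during data transmission, data scripts are collected from the data streams, preprocessed and analysed, and features are extracted; the results are converted into a uniform binary format by a data stream serialization service;

S5, a model is trained on big data: with data stream features as input, a monitoring model for identifying traffic with attack behaviour is trained by machine learning, and the model is used for communication security monitoring;

S6, while communication data are being transmitted, the extracted features in the uniform format are compared against the monitoring model; when suspicious traffic is detected, a visual alarm is sent to the network equipment to display and report the suspicious traffic, the communication is promptly interrupted, and the suspicious traffic and the corresponding network nodes are stored in the big data database.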

2. The big data network communication implementation method according to claim 1, wherein in step S1 the correspondence between traffic load and time is determined as follows: the traffic load of each device during each time period of communication with the network device is obtained from the past communication history in the big data database, the relationship between load and time period is derived, and the maximum and minimum loads that occurred historically in the same time period as the preset time period are determined.

3. The big data network communication implementation method according to claim 1, wherein in step S3, after the one or more communication links are established, it is determined whether the traffic a communication link can carry meets the traffic load of the communication; if the traffic the link can carry is smaller than the minimum load obtained from the correspondence between traffic load and time, the traffic that an adjacent communication link can carry is superposed onto the link as backup traffic, the superposed traffic is compared with the maximum load obtained from that correspondence, and the superposed traffic is required to meet the maximum load that occurred historically in the same time period as the preset time period.

4. The big data network communication implementation method according to claim 1, wherein in step S3, after the one or more communication links are established, it is further determined whether the traffic configuration requests carried by the one or more communication links exceed a preset threshold; if they exceed the preset threshold on a communication link, the requests are merged according to their request time and request position, and the merged traffic data are delivered and transmitted serially.

5. The big data network communication implementation method according to claim 1, wherein in step S4, data scripts are collected from the data stream by intercepting the access communication node information of the network communication information according to a preset information interception rule; feature extraction means extracting the header information of the network communication information; and the address sequence or address tag of the access address in the header information is converted into a uniform binary format by a data stream serialization service, yielding the corresponding feature tag in the communication data stream.

6. The big data network communication implementation method according to claim 1, wherein in step S4, when data scripts are collected from the data stream, the read anchor point of the communication node is intercepted, and interception starts from the read anchor point and continues until the intercepting write process stops, the result being taken as a write sequence; the number of scripts stored in the write sequence is compared with the preset script interception number, and if the number stored in the write sequence exceeds a preset threshold, the script interception is judged successful, yielding the access communication node information of at least one access communication node.

7. The big data network communication implementation method according to claim 1, wherein in step S5, the data stream features in the big data and features based on different protocols (i.e., protocol features) together form the total features, which are stored in a distributed database; the distributed database stores the total features and the mapping between the feature data preprocessed in step S4 and the original data scripts, providing column-wise sparse storage of network flow features; and the features in the distributed database are then used as input to train, by machine learning, a monitoring model for identifying traffic with attack behaviour.

8. The big data network communication implementation method according to claim 7, wherein the model training may use an ensemble learning algorithm, so that the model learns from a large number of features and its ability to recognise traffic with attack behaviour is continuously improved.

9. The big data network communication implementation method according to claim 1, wherein in step S6, the suspicious traffic and the corresponding network nodes stored in the big data database are reused as training data for the monitoring model, so that its ability and accuracy in recognising traffic with attack behaviour keep improving with use.