CN117478534A - Cluster-oriented network communication log storage method, cluster server and cluster - Google Patents

Info

Publication number
CN117478534A
Authority
CN
China
Prior art keywords
cluster
log
network communication
communication connection
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311424875.XA
Other languages
Chinese (zh)
Inventor
谢雨来
张良康
冯丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202311424875.XA priority Critical patent/CN117478534A/en
Publication of CN117478534A publication Critical patent/CN117478534A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a cluster-oriented network communication log storage method, a cluster server and a cluster, belonging to the field of cluster communication log processing. The method comprises: acquiring a network communication log record from a message queue in a server, extracting its five-tuple information as a communication connection record, and searching for the communication connection record in a log bucket; if the search succeeds, updating the last use time of the communication connection record in the log bucket and in a graph database; otherwise, generating an ID for the record and inserting it into the log bucket and the graph database. The log bucket is a six-layer multi-way tree in memory, whose layer 2 to layer 5 nodes are, in order, the source IP, source port, destination IP and destination port, and whose leaf nodes store the ID, last use time and protocol of the communication connection record. The graph database is a graph structure in a persistent storage device, whose nodes are host IPs and in which an edge exists between host nodes between which network communication has occurred. The invention can effectively improve the storage efficiency of massive network communication logs in a cluster.

Description

Cluster-oriented network communication log storage method, cluster server and cluster
Technical Field
The invention belongs to the field of cluster communication log processing, and particularly relates to a cluster-oriented network communication log storage method, a cluster server and a cluster.
Background
In a host cluster, the communication connections within the cluster need to be monitored in order to ensure the connectivity and security of the whole cluster. Recording the network communication logs of these connections makes it easier to discover network faults, intrusion behavior and calling relationships in the cluster. Typically, each host collects its own communication log information and sends it to a designated server, and the server aggregates, stores and displays the network communication logs of the current cluster.
In general, the number of hosts in a cluster is huge, so the amount of data sent to the server is huge and the storage pressure is high. Moreover, communication logs are usually time-related and are queried by time range, and a given pair of hosts may initiate many communications within a given period, so the communication traffic logs collected within a time range contain many redundant five-tuples. This reduces storage and query efficiency and creates many problems for the collection, storage and display of cluster network communication logs. Existing methods often store network communication logs in a simple relational database, which cannot effectively solve the storage and redundancy problems of communication traffic logs.
In addition, network communication logs exhibit cold and hot data characteristics: data farther from the current time is accessed less frequently, and data closer to the current time is accessed more frequently. This characteristic could be used to improve the query efficiency of network communication logs, but existing relational-database-based storage methods cannot make full use of it, so query efficiency needs to be further improved.
Disclosure of Invention
In view of the defects of and the demand for improvement over the prior art, the invention provides a cluster-oriented network communication log storage method, a cluster server and a cluster, and aims to effectively improve the storage efficiency of massive network communication logs in a cluster.
To achieve the above object, according to one aspect of the present invention, there is provided a cluster-oriented network communication log storage method, including: while each host in the cluster sends the network communication log records it has collected to a message queue of the server, the following steps are executed in the server:
acquiring a network communication log record from the message queue, extracting the five-tuple information in the network communication log record as a communication connection record, and searching for the communication connection record in a log bucket; if the search succeeds, updating the last use time of the communication connection record in the log bucket and in the graph database to the current time; if the search fails, generating an ID for the communication connection record, inserting the record into the log bucket and the graph database, and setting the last use time of the communication connection record to the establishment time of the communication connection;
the network communication log record comprises the five-tuple information and the establishment time of the communication connection; the log bucket is a six-layer multi-way tree created in memory, wherein the layer 2 to layer 5 nodes are, in order, the source IP, source port, destination IP and destination port, each connection record corresponds to a path from the root node to a leaf node of the multi-way tree, and the information stored in a leaf node comprises the ID, last use time and protocol of the communication connection record; the graph database is a graph structure created in a persistent storage device, wherein the nodes are host IPs, an edge exists between the nodes corresponding to hosts between which network communication has occurred, and the information of an edge comprises the ID, last use time, protocol, source port and destination port of the communication connection record.
Further, when a communication connection record that does not exist in the log bucket is inserted into the log bucket and the graph database, the method further comprises: taking the establishment time of the communication connection as a timestamp, and inserting the timestamp and the ID of the communication connection record into a hot store; the hot store is a time-series database used to store the communication connection records of the most recent N time windows;
the cluster-oriented network communication log storage method further comprises:
at a fixed time in each time window, migrating the data of the earliest time window in the hot store to a cold store; the cold store is a time-series database used to store the communication connection records of the (N+1)-th to M-th most recent time windows;
wherein M and N are both positive integers, and M > 2N.
Further, the N time windows correspond to one week and the M time windows correspond to one month.
Further, when the data in the hot store is migrated to the cold store, the method further comprises: deleting the data of the earliest time window in the cold store.
Further, deleting the data of the earliest time window in the cold store comprises:
judging whether each currently deleted ID is used in other time windows, and if not, deleting the communication connection record corresponding to the ID from the log bucket and the graph database;
the ID of a communication connection record is an integer assigned incrementally from 0, and an ID is generated for a communication connection record as follows:
acquiring the list of IDs of the communication connection records currently stored in the graph database, and finding the smallest unused value according to the ID list as the generated ID.
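The ID generation rule above (smallest unused non-negative integer) can be illustrated with a minimal Python sketch. The function name `next_free_id` is hypothetical, and the ID list is passed in as a plain Python iterable standing in for the list read from the graph database.

```python
def next_free_id(used_ids):
    """Return the smallest non-negative integer that is not in used_ids."""
    used = set(used_ids)          # set membership keeps each probe O(1)
    candidate = 0
    while candidate in used:
        candidate += 1
    return candidate


# ID 2 was released earlier, so it is handed out again before 5.
print(next_free_id([0, 1, 3, 4]))
```

Because the smallest gap is always filled first, IDs released by deleted records are reused before any new, larger value is allocated.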
Further, the cluster-oriented network communication log storage method provided by the invention further comprises: after the server receives a query request, executing the following steps:
(S1) determining, according to the time ranges of the data stored in the hot store and the cold store, whether communication connection records within the requested time range R are currently stored; if so, going to step (S2); otherwise, judging that no query result exists and going to step (S4);
(S2) querying the hot store or the cold store according to the time range R to obtain the IDs of the corresponding communication connection records, and de-duplicating them to obtain an ID list;
(S3) querying the graph database according to the ID list to obtain the five-tuple information of the communication connection record corresponding to each ID, thereby restoring the network communication log, and returning the restored network communication log;
(S4) ending the query.
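Steps (S1) to (S4) can be sketched in Python as follows. This is only an illustration under stated assumptions: the hot store and cold store are modeled as plain dictionaries mapping a day to a list of connection IDs, the graph database as a dictionary mapping an ID to its five-tuple, and the name `query_logs` is hypothetical.

```python
from datetime import date


def query_logs(day_from, day_to, hot, cold, graph):
    """hot / cold: {date: [connection IDs]}; graph: {ID: five-tuple}."""
    stored_days = set(hot) | set(cold)
    wanted = [d for d in stored_days if day_from <= d <= day_to]
    # (S1) nothing is stored for the requested range: no result, go to (S4)
    if not wanted:
        return []
    # (S2) read the IDs from whichever store holds each day, then de-duplicate
    ids = set()
    for d in wanted:
        ids.update(hot.get(d, []))
        ids.update(cold.get(d, []))
    # (S3) restore the five-tuples from the graph-database stand-in
    return [graph[i] for i in sorted(ids) if i in graph]


hot = {date(2023, 10, 30): [0, 1, 0]}
cold = {date(2023, 10, 20): [1, 2]}
graph = {
    0: ("10.0.0.1", 5000, "10.0.0.2", 80, "tcp"),
    1: ("10.0.0.1", 5001, "10.0.0.3", 53, "udp"),
    2: ("10.0.0.4", 6000, "10.0.0.2", 443, "tcp"),
}
result = query_logs(date(2023, 10, 29), date(2023, 10, 31), hot, cold, graph)
```

Here the requested range only touches the hot store, so `result` contains the five-tuples for IDs 0 and 1, each once.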
Further, after acquiring a network communication log record from the message queue and extracting the five-tuple information in the network communication log record, the method further comprises: converting each of the 4 numbers separated by "." in each IP of the five-tuple information into a 2-digit hexadecimal character string, thereby converting the IP into an 8-digit hexadecimal character string; and converting each port in the five-tuple information into a 4-digit hexadecimal character string;
and the communication connection record is the five-tuple information after the conversion.
Further, the cluster-oriented network communication log storage method provided by the invention further comprises a step of dynamically adjusting the sending frequency, executed at the host side;
the step of dynamically adjusting the sending frequency comprises: detecting the network delay in real time; if the network delay is greater than a preset threshold, reducing the frequency of sending network communication log records to the message queue of the server; and when the network delay no longer exceeds the threshold, restoring the frequency of sending network communication log records to the message queue of the server to its initially set value.
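The host-side adjustment can be illustrated with a short Python sketch. The text only specifies "reduce when the delay exceeds the threshold, restore to the initial value otherwise"; the class name `SendRateController`, the doubling back-off factor, the cap on the interval and all default values are assumptions made for illustration.

```python
class SendRateController:
    """Tracks the interval between sends; a longer interval means a lower frequency."""

    def __init__(self, initial_interval_s=1.0, delay_threshold_ms=50.0,
                 backoff=2.0, max_interval_s=60.0):
        self.initial_interval_s = initial_interval_s
        self.delay_threshold_ms = delay_threshold_ms
        self.backoff = backoff                # assumed reduction policy
        self.max_interval_s = max_interval_s  # assumed upper bound
        self.interval_s = initial_interval_s

    def update(self, measured_delay_ms):
        if measured_delay_ms > self.delay_threshold_ms:
            # Delay too high: send less often (bounded by max_interval_s).
            self.interval_s = min(self.interval_s * self.backoff,
                                  self.max_interval_s)
        else:
            # Delay back within the threshold: restore the initial setting.
            self.interval_s = self.initial_interval_s
        return self.interval_s


ctrl = SendRateController()
intervals = [ctrl.update(d) for d in (80.0, 90.0, 10.0)]
```

With the defaults above, two congested measurements stretch the interval from 1 s to 2 s and then 4 s, and the third, healthy measurement resets it to 1 s.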
According to yet another aspect of the present invention, there is provided a cluster server including:
a first computer readable storage medium storing a computer program;
and a first processor for reading the computer program stored in the first computer readable storage medium and executing the cluster-oriented network communication log storage method provided by the invention.
According to yet another aspect of the present invention, there is provided a cluster, including a cluster server provided by the present invention; in the cluster, a sending frequency dynamic adjustment module is deployed in each host;
the transmission frequency dynamic adjustment module includes:
a second computer readable storage medium storing a computer program;
and a second processor, configured to read a computer program stored in a second computer readable storage medium, and execute the step of dynamically adjusting the transmission frequency in the cluster-oriented network communication log storage method provided by the present invention.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) In the invention, network communication logs are stored at the server side using a log bucket created in memory, and the information of a network communication log is inserted into the log bucket only when its five-tuple has no corresponding record there, so that de-duplication is achieved and the storage efficiency of massive network communication logs is effectively improved. Meanwhile, the log bucket is a multi-way tree in which the elements of a five-tuple are stored in nodes at different layers, so that five-tuples whose leading parts coincide share the same nodes; this further compresses the stored data volume, and querying the log bucket quickly determines whether a given network communication log record has already been stored, which further improves storage efficiency. In addition, a graph database is created in a persistent storage device and its content is kept synchronized with the log bucket, which ensures the safety and availability of the log bucket information. In general, by adopting a storage scheme in which the log bucket and the graph database cooperate, the invention can effectively improve the storage efficiency of massive network communication logs in a cluster.
(2) In a preferred scheme of the invention, a hot store and a cold store based on time-series databases are further maintained in the server, used respectively to store recently established communication connection records and communication connection records established longer ago, so that cold and hot data are stored separately. According to the access characteristics of network communication logs in a cluster, recently generated logs are more likely to be queried yet usually account for a smaller share of the data, whereas logs generated longer ago are less likely to be queried yet usually account for a larger share. The data requested by network communication log queries is therefore mainly concentrated in the hot store, whose data volume is small, so query efficiency can be effectively improved.
(3) In a preferred scheme of the invention, in addition to migrating the earliest data in the hot store to the cold store so as to keep the data in the hot store hot, the earliest data in the cold store is periodically deleted. According to the access heat of network communication logs, the probability that such data is queried later is extremely low; deleting it in time prevents the data volume in the cold store from becoming too large, saves storage space, and maintains the query efficiency of the cold store.
(4) In a preferred scheme of the invention, the IDs of communication connection records are integers assigned incrementally from 0; when data is deleted from the cold store, the communication connection records whose IDs are not used in any other time window are deleted at the same time, releasing the corresponding IDs; and each time a new ID is generated, the smallest unused value is selected. This effectively avoids ID exhaustion and prevents invalid data from being stored in the log bucket and the graph database.
(5) In a preferred scheme of the invention, before the five-tuple information in a network communication log is stored, the IPs and ports are converted into compact hexadecimal character strings, which improves comparison efficiency and reduces storage space.
(6) In a preferred scheme of the invention, each host also monitors the network delay in real time and reduces the frequency of sending network communication log records to the message queue of the server when the network delay is high, thereby avoiding any impact on the normal services of the cluster.
Drawings
Fig. 1 is a flowchart of a cluster-oriented network communication log storage method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the log bucket structure according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of cold-hot separated storage in time-series databases according to an embodiment of the present invention;
Fig. 4 is a flowchart of searching log information by time range according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In order to solve the technical problem that existing relational-database-based methods store massive network communication logs inefficiently, the invention provides a cluster-oriented network communication log storage method, a cluster server and a cluster. The overall idea is as follows: the data structure used to store network communication logs in the server is improved, and the storage flow is improved on the basis of that data structure, so that network communication logs are de-duplicated and compressed when stored and storage efficiency is effectively improved. On this basis, cold-hot separated storage of network communication log data is further realized by means of time-series databases, so that the cold and hot characteristics of network communication logs are used to improve query efficiency.
A cluster contains a plurality of hosts, one or more of which are selected as servers; each host collects its own communication log information in real time and sends it to the server for storage and for subsequent queries by time range.
A host typically acquires network connection log information by monitoring network connection events in the host kernel. Specifically, a reporting time interval is set using the Netfilter conntrack module, all flow objects in the current system are acquired in each time interval, and all network flow connections are traversed.
The process of obtaining network data flows through the Netfilter conntrack module is as follows:
(1) For a flow packet received by the system, attempt to acquire the corresponding conntrack_info and connection record;
(2) If the corresponding conntrack_info and connection record do not exist, judge whether the packet needs to be tracked; if the connection record exists or the packet does not need to be tracked, ignore the packet and perform no operation;
(3) Extract information from the L4 header of the flow packet, initialize the protocol-related connection variables, define the callback method corresponding to the connection, and check information such as the integrity of the packet;
(4) Call resolve_normal_ct() to start connection tracking, and create the conntrack entry corresponding to the packet or update the existing conntrack entry record;
(5) In each time interval, traverse all surviving entry records and obtain from them the corresponding information of both communicating parties to return.
After the network connection flow information is obtained through the Netfilter conntrack module, the five-tuple (connection initiator, source port, protocol, connection receiver and destination port) of every network connection is obtained; preprocessing such as serialization into binary is performed, the current time is attached, and the data is compressed and sent to the message queue in the server. The server then obtains the network communication log information from the message queue and stores it.
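The host-side preprocessing (binary serialization, attaching a time, compression) can be sketched in Python. The source does not specify a wire format, so everything concrete here is an assumption for illustration: the 21-byte record layout, the use of zlib, IPv4-only addresses, and the names `encode_batch` and `decode_batch`.

```python
import socket
import struct
import zlib

PROTOCOL_NUMBERS = {"tcp": 6, "udp": 17}   # IANA protocol numbers
PROTOCOL_NAMES = {v: k for k, v in PROTOCOL_NUMBERS.items()}

# Assumed layout, network byte order, no padding:
# src IP (4 bytes), src port (2), protocol (1), dst IP (4), dst port (2), time (8)
RECORD = struct.Struct("!4sHB4sHQ")


def encode_batch(records):
    """Serialize (src, sport, proto, dst, dport, unix_time) tuples and compress them."""
    packed = b"".join(
        RECORD.pack(socket.inet_aton(src), sport, PROTOCOL_NUMBERS[proto],
                    socket.inet_aton(dst), dport, ts)
        for (src, sport, proto, dst, dport, ts) in records
    )
    return zlib.compress(packed)


def decode_batch(payload):
    """Server-side inverse of encode_batch."""
    out = []
    for src, sport, proto, dst, dport, ts in RECORD.iter_unpack(zlib.decompress(payload)):
        out.append((socket.inet_ntoa(src), sport, PROTOCOL_NAMES[proto],
                    socket.inet_ntoa(dst), dport, ts))
    return out


batch = [
    ("192.168.0.1", 51234, "tcp", "192.168.0.2", 443, 1700000000),
    ("192.168.0.1", 51235, "udp", "192.168.0.3", 53, 1700000001),
]
payload = encode_batch(batch)
```

The compressed `payload` is what a host would place on the message queue; `decode_batch(payload)` returns the original tuples on the server side.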
The following are examples.
Example 1:
A cluster-oriented network communication log storage method, as shown in Fig. 1, comprises: while each host in the cluster sends the network communication log records it has collected to a message queue of the server, the following steps are executed in the server:
acquiring a network communication log record from the message queue, extracting the five-tuple information in the network communication log record as a communication connection record, and searching for the communication connection record in a log bucket; if the search succeeds, updating the last use time of the communication connection record in the log bucket and in the graph database to the current time; if the search fails, generating an ID for the communication connection record, inserting the record into the log bucket and the graph database, and setting the last use time of the communication connection record to the establishment time of the communication connection;
wherein the network communication log record includes the five-tuple information and the establishment time of the communication connection.
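The server-side flow of this embodiment can be sketched in Python as below. To keep the focus on the control flow, the log bucket and the graph database are replaced by plain dictionaries (keyed by five-tuple and by ID respectively), and the current time is passed in explicitly; these stand-ins and the name `store_from_queue` are assumptions, not the structures described in the embodiment.

```python
import queue


def store_from_queue(message_queue, bucket, graph, now):
    """Drain the queue, updating known connections and inserting new ones."""
    while True:
        try:
            five_tuple, established_at = message_queue.get_nowait()
        except queue.Empty:
            return
        record = bucket.get(five_tuple)
        if record is not None:
            # Search succeeded: refresh the last use time in both structures.
            record["last_used"] = now
            graph[record["id"]]["last_used"] = now
        else:
            # Search failed: take the smallest unused ID, then insert into both.
            new_id = 0
            while new_id in graph:
                new_id += 1
            bucket[five_tuple] = {"id": new_id, "last_used": established_at}
            graph[new_id] = {"five_tuple": five_tuple, "last_used": established_at}


q = queue.Queue()
conn_a = ("10.0.0.1", 5000, "10.0.0.2", 80, "tcp")
conn_b = ("10.0.0.1", 5001, "10.0.0.3", 53, "udp")
for item in [(conn_a, 100), (conn_b, 120), (conn_a, 160)]:
    q.put(item)

bucket, graph = {}, {}
store_from_queue(q, bucket, graph, now=200)
```

Three log records arrive but only two connection records are stored; the repeated five-tuple merely has its last use time refreshed.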
In this embodiment, the log bucket is a six-layer multi-way tree created in memory, whose structure is shown in Fig. 2. The layer 2 to layer 5 nodes are, in order, the source IP, source port, destination IP and destination port; each connection record corresponds to a path from the root node to a leaf node of the multi-way tree, and the information stored in a leaf node includes the ID, last use time and protocol of the communication connection record. With this structure, five-tuples whose leading parts coincide share the same nodes when stored. For example, the five-tuples of communication connections initiated by the same sender to two different hosts share the source IP node, and the five-tuples of communication connections initiated by the same sender through the same port to two different hosts share both the source IP node and the source port node. This storage mode effectively compresses the stored data volume, and querying the log bucket quickly determines whether a given network communication log record has already been stored. It should be noted that in this embodiment the layer order of the five-tuple elements in the log bucket follows the direction of a network connection, from source to destination, which improves the efficiency of checking whether a record exists in the log bucket.
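A minimal Python sketch of such a tree, using nested dictionaries for layers 2 to 5, is given below. The class name `LogBucket` and its methods are hypothetical. One detail is an assumption of the sketch rather than something the text specifies: leaves under a destination port node are keyed by protocol, so that two connections differing only in protocol get separate leaves.

```python
class LogBucket:
    """Root -> source IP -> source port -> destination IP -> destination port -> leaf."""

    def __init__(self):
        self.root = {}

    def find(self, src_ip, src_port, dst_ip, dst_port, protocol):
        """Walk one path from the root; return the leaf dict or None."""
        node = self.root
        for key in (src_ip, src_port, dst_ip, dst_port, protocol):
            node = node.get(key)
            if node is None:
                return None
        return node

    def insert(self, src_ip, src_port, dst_ip, dst_port, protocol, conn_id, last_used):
        """Create missing inner nodes, reuse shared ones, and store the leaf."""
        node = self.root
        for key in (src_ip, src_port, dst_ip, dst_port):
            node = node.setdefault(key, {})
        node[protocol] = {"id": conn_id, "last_used": last_used, "protocol": protocol}
        return node[protocol]


lb = LogBucket()
lb.insert("0A000001", "1388", "0A000002", "0050", "tcp", conn_id=0, last_used=100)
lb.insert("0A000001", "1388", "0A000003", "0050", "tcp", conn_id=1, last_used=110)
leaf = lb.find("0A000001", "1388", "0A000002", "0050", "tcp")
```

Both connections come from the same sender and source port, so the tree holds a single source IP node and a single source port node with two destination IP children beneath them.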
In this embodiment, the graph database is a graph structure created in a persistent storage device, in which the nodes are host IPs, an edge exists between the nodes corresponding to hosts between which network communication has occurred, and the information of an edge includes the ID, last use time, protocol, source port and destination port of the communication connection record. It is easy to understand that, because the number of hosts in a cluster is often huge and the server cannot know all the nodes in the cluster at the outset, when a communication connection record is inserted into the graph database it is first necessary to determine whether the host nodes to which the source IP and the destination IP belong already exist in the database; if they do, the corresponding edge is established and its information recorded; if not, the corresponding host nodes must first be created in the graph database. In this embodiment, insertions into and updates of the graph database in the persistent storage device are kept consistent with the log bucket in memory, which makes the log bucket persistent: the log bucket provides high query efficiency while the safety and availability of its information are ensured.
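The insertion rule for the graph (create missing host nodes, then add an edge carrying the connection attributes) can be sketched with an in-memory stand-in. This is not the API of any real graph database; the class `HostGraph` and its methods are hypothetical and only mirror the node and edge contents described above.

```python
class HostGraph:
    """In-memory stand-in: host IPs as nodes, one attributed edge per connection ID."""

    def __init__(self):
        self.nodes = set()
        self.edges = {}   # connection ID -> edge attributes

    def insert(self, conn_id, src_ip, src_port, dst_ip, dst_port, protocol, last_used):
        # Create the host nodes only if they do not exist yet (set semantics).
        self.nodes.add(src_ip)
        self.nodes.add(dst_ip)
        self.edges[conn_id] = {
            "src": src_ip, "dst": dst_ip,
            "src_port": src_port, "dst_port": dst_port,
            "protocol": protocol, "last_used": last_used,
        }

    def touch(self, conn_id, now):
        """Update the last use time of an existing connection edge."""
        self.edges[conn_id]["last_used"] = now

    def five_tuple(self, conn_id):
        e = self.edges[conn_id]
        return (e["src"], e["src_port"], e["dst"], e["dst_port"], e["protocol"])


g = HostGraph()
g.insert(0, "10.0.0.1", 5000, "10.0.0.2", 80, "tcp", last_used=100)
g.insert(1, "10.0.0.1", 5001, "10.0.0.3", 53, "udp", last_used=110)
g.touch(0, now=200)
```

Two edges are stored but only three host nodes, since `10.0.0.1` is the source of both connections.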
When the log bucket and the graph database are used to store network communication log information, the information of a network communication log is inserted into the log bucket and the graph database only when its five-tuple has no corresponding record in the log bucket, so that de-duplication is achieved and the storage efficiency of massive network communication logs is effectively improved.
It will be readily appreciated that, since the host compresses the five-tuple information when sending network communication log information, the information must first be decompressed after it is obtained from the message queue. Each element of the five-tuple has to be compared when the log bucket is queried to determine whether the record is already stored. To improve comparison efficiency, as a preferred implementation, this embodiment converts the source IP, source port, destination IP and destination port into hexadecimal character strings after the five-tuple information is acquired. Specifically, each of the 4 numbers separated by "." in an IP is converted into a 2-digit hexadecimal character string, so that the IP becomes an 8-digit hexadecimal character string, and each port is converted into a 4-digit hexadecimal character string; for example, the IP address "255.255.255.255" maps to "FFFFFFFF" and the port "65535" maps to "FFFF";
and the communication connection record is the five-tuple information after the conversion.
The above conversion improves comparison efficiency and also shortens the converted data, which achieves a certain compression effect and further reduces storage space overhead.
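The conversion can be illustrated with a few lines of Python. The function names `ip_to_hex` and `port_to_hex` are hypothetical. A port is a 16-bit value and can need up to four hexadecimal digits, so the sketch pads ports to four digits; that width is an assumption of the sketch.

```python
def ip_to_hex(ip):
    """Convert dotted-decimal IPv4 text to an 8-digit uppercase hex string."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a dotted-decimal IPv4 address: {ip!r}")
    octets = [int(p) for p in parts]
    if any(o < 0 or o > 255 for o in octets):
        raise ValueError(f"octet out of range in {ip!r}")
    return "".join(f"{o:02X}" for o in octets)


def port_to_hex(port):
    """Convert a port number (0..65535) to a 4-digit uppercase hex string."""
    value = int(port)
    if value < 0 or value > 65535:
        raise ValueError(f"port out of range: {port!r}")
    return f"{value:04X}"


print(ip_to_hex("255.255.255.255"), port_to_hex("65535"))
print(ip_to_hex("192.168.0.1"), port_to_hex(443))
```

Fixed-width strings also keep every key at a given tree layer the same length, which makes equality checks uniform.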
Considering that the network communication log has the characteristic of cold and hot data, namely, the frequency of data access far from the current time is lower, and the frequency of data access near to the current time is higher. In order to fully utilize the characteristic, the query efficiency of querying the network communication log according to the time range is further improved, and the embodiment maintains a thermal warehouse and a refrigeration warehouse based on the time sequence database, wherein the thermal warehouse is used for storing network communication log data generated recently, and the refrigeration warehouse is used for storing network communication log data generated for a relatively long time, so that the data requested by the query request can be concentrated in the thermal warehouse as much as possible. Because the thermal store only stores recently generated data, the data volume is smaller, the access efficiency is higher, and compared with the method of directly storing all network communication log records by using one time sequence database, the method can effectively improve the overall query request.
In this embodiment, the hot store is specifically configured to store the network communication log data generated in the last week, and the cold store is specifically configured to store the network communication log data generated in the last month before the last week, as a preferred implementation manner, based on the analysis result that the network communication log generated in the last month is hardly accessed. Further analysis of this embodiment finds that, in general, network communication activity in the cluster is least active at 12:00 a night, so in this embodiment, data storage and migration are performed in a time window of day, and data migration from the thermal warehouse to the refrigerator is performed at 12:00 a night, so as to ensure heat of data in the thermal warehouse and avoid the data migration from affecting normal storage of network communication logs.
In order to maintain the above-mentioned thermal warehouse and cold store, as shown in fig. 3, in this embodiment, when the communication connection record that does not exist in the log bucket is inserted into the log bucket and the database, the method further includes: taking the establishment time of the communication connection as a time stamp, and inserting the time stamp and the ID of the communication connection record into a hot bank; the thermal library is a time sequence database and is used for storing communication connection records in the last week;
the cluster-oriented network communication log storage method further comprises the following steps:
at 12:00 midnight, migrating the data of the earliest day in the hot store to the cold store; the cold store is a time-series database used to store communication connection records generated before the last week and within the last month;
while the data in the hot store is migrated to the cold store, the method further includes: deleting the data in the earliest time window in the cold store.
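The daily maintenance described above can be sketched as follows. This is a minimal illustration and not the embodiment's actual implementation: plain dictionaries stand in for the two time-series databases, and the names `TieredStore`, `HOT_DAYS`, and `TOTAL_DAYS`, as well as the choice of 30 days for "one month", are assumptions made for the example.

```python
from datetime import date, timedelta

HOT_DAYS = 7     # hot store covers the most recent week (the embodiment's setting)
TOTAL_DAYS = 30  # assumption: "one month" is approximated as 30 one-day windows


class TieredStore:
    """Toy stand-in for the hot and cold time-series databases, keyed by day."""

    def __init__(self):
        self.hot = {}   # day -> list of (timestamp, connection_id)
        self.cold = {}  # same layout, for older windows

    def insert(self, day, timestamp, conn_id):
        # New (timestamp, ID) pairs always enter the hot store first.
        self.hot.setdefault(day, []).append((timestamp, conn_id))

    def migrate(self, today):
        """Run once per time window (midnight in the embodiment).

        Moves windows that left the hot range into the cold store, deletes
        windows that left the cold range, and returns the deleted entries.
        """
        hot_cutoff = today - timedelta(days=HOT_DAYS)
        cold_cutoff = today - timedelta(days=TOTAL_DAYS)
        for day in sorted(d for d in self.hot if d <= hot_cutoff):
            self.cold[day] = self.hot.pop(day)
        expired = []
        for day in sorted(d for d in self.cold if d <= cold_cutoff):
            expired.extend(self.cold.pop(day))
        return expired
```

The entries returned by `migrate` are the ones whose IDs become candidates for reclamation once they leave the cold store.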
For an ID deleted from the cold store, if the ID is not used in any other time window, the network communication log data corresponding to that ID will no longer be accessed and is therefore invalid data, and the ID can be reused for other communication connection records. Based on this consideration, in this embodiment, deleting the data in the earliest time window in the cold store further includes:
determining whether each currently deleted ID is used in other time windows, and if not, deleting the communication connection record corresponding to that ID from the log bucket and the graph database;
the ID of a communication connection record is an integer assigned incrementally starting from 0, and the ID is generated for a communication connection record in the following manner:
acquiring the list of IDs of the communication connection records currently stored in the graph database, and taking the smallest unused value according to that list as the generated ID.
In this embodiment, the data deletion mechanism of the cold store and the ID generation scheme work together to avoid ID exhaustion and improve ID utilization, while also reducing the invalid data held in the log bucket and the graph database, which further improves storage utilization.
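The interplay between ID reclamation and smallest-unused-ID generation can be illustrated with a short sketch. The function names and the use of a single dictionary as a stand-in for both the log bucket and the graph database are assumptions made for the example, not details given by the embodiment.

```python
def next_free_id(used_ids):
    """Return the smallest non-negative integer that is not in used_ids."""
    used = set(used_ids)
    candidate = 0
    while candidate in used:
        candidate += 1
    return candidate


def reclaim_ids(deleted_ids, remaining_windows, records):
    """Delete records whose ID no longer appears in any remaining time window.

    deleted_ids:       IDs removed together with the earliest cold-store window
    remaining_windows: iterable of ID collections still held in the hot/cold stores
    records:           dict id -> connection record (hypothetical stand-in for
                       the log bucket and the graph database)
    Returns the sorted list of IDs that were freed.
    """
    still_used = set()
    for ids in remaining_windows:
        still_used.update(ids)
    freed = []
    for conn_id in set(deleted_ids):
        if conn_id not in still_used and conn_id in records:
            del records[conn_id]
            freed.append(conn_id)
    return sorted(freed)
```

For example, with records stored under IDs 0, 1, and 2, `next_free_id` yields 3; if ID 0 is then reclaimed because no remaining window uses it, the next generated ID is 0 again, so the ID space stays compact.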
In practical applications, the time ranges of the data stored in the hot store and the cold store, the length of the time window, and the specific time of data migration can be set according to the behavior characteristics of the current cluster; the above is only a preferred setting and is not to be construed as the only limitation of the invention.
As shown in fig. 4, in this embodiment, after the server receives the query request, the following steps are performed:
(S1) determining, according to the time ranges of the data stored in the hot store and the cold store, whether communication connection records in the requested time range R are currently stored; if so, going to step (S2); otherwise, determining that no query result exists and going to step (S4);
(S2) querying the hot store and/or the cold store according to the time range R to obtain the IDs of the corresponding communication connection records, and deduplicating them to obtain an ID list;
it is easy to understand that the time range R may fall entirely within the cold store, entirely within the hot store, or partly in each, and the corresponding IDs can be obtained by querying the hot store and/or the cold store according to the specific time range.
(S3) querying the graph database according to the ID list to obtain the quintuple information of the communication connection records corresponding to the IDs, completing recovery of the network communication log, and returning the recovered network communication log;
it is easy to understand that, in the graph database, the source port, the destination port and the protocol information are recorded on the edge corresponding to each ID, and the source IP and the destination IP are recorded on the two nodes connected by that edge, so the complete quintuple information can be obtained by accessing the graph database based on the ID;
because the information stored in the server has already been preprocessed by the host (for example, serialized), and the quintuple information stored in the graph database consists of converted hexadecimal character strings, the queried quintuple information must be converted back to its original format and deserialized before it is returned to the user;
(S4) the query is ended.
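Steps (S1) to (S4) can be sketched as a single function. This is an illustrative outline only: lists of `(timestamp, id)` pairs stand in for the hot and cold stores, a dictionary stands in for the graph database, and the function and parameter names are hypothetical. The conversion of hexadecimal strings back to the original format and the deserialization step are omitted.

```python
def query_logs(t_start, t_end, hot, cold, graph, stored_from, stored_to):
    """Return the quintuples of connections recorded in [t_start, t_end].

    hot, cold:   lists of (timestamp, connection_id) pairs
    graph:       dict connection_id -> quintuple
    stored_from, stored_to: overall time range covered by the two stores
    """
    # (S1) If the requested range lies outside what is stored, there is no result.
    if t_end < stored_from or t_start > stored_to:
        return []  # (S4) end of query

    # (S2) Collect matching IDs from both stores; the set removes duplicates.
    ids = set()
    for store in (hot, cold):
        for timestamp, conn_id in store:
            if t_start <= timestamp <= t_end:
                ids.add(conn_id)

    # (S3) Look up each ID to recover the quintuple information.
    return [graph[conn_id] for conn_id in sorted(ids) if conn_id in graph]
```

Because only IDs are kept in the time-indexed stores, a connection that was used many times within R is looked up once.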
To prevent hosts from affecting the normal services of the cluster when they send network communication log information to the message queue in the server, as a preferred implementation, this embodiment further includes a step of dynamically adjusting the transmission frequency, executed on the host side;
the step of dynamically adjusting the transmission frequency includes: detecting the network delay in real time; if the network delay is greater than a preset threshold, reducing the frequency at which network communication log records are sent to the message queue of the server; and when the network delay does not exceed the threshold, restoring that frequency to its initial set value. Specifically, in this embodiment, a network delay greater than the threshold indicates that other services need more network resources, so the transmission frequency of the network communication log data is reduced to half of its current value in order to reduce the network resources occupied by sending the log data; once the network delay no longer exceeds the threshold, indicating that network resources are currently idle, the transmission frequency is restored to the initial transmission frequency so that the collected network communication log information is sent to the server as soon as possible.
It should also be noted that the specific threshold of the network delay, the initial value of the transmission frequency, and the amount by which the transmission frequency is reduced each time should be set according to the characteristics of the cluster application; the values given here are only illustrative and should not be construed as the only limitation of the present invention.
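The halve-on-congestion, reset-on-recovery rule described above can be sketched as a small controller. The class name, the units, and in particular the `min_rate` floor are assumptions added for the example; the embodiment itself only specifies halving and restoring to the initial value.

```python
class SendRateController:
    """Host-side sketch of the dynamic transmission frequency adjustment."""

    def __init__(self, initial_rate, delay_threshold, min_rate=1.0):
        self.initial_rate = initial_rate        # records per second, for example
        self.delay_threshold = delay_threshold  # same unit as measured delays
        self.min_rate = min_rate                # assumption: keep the rate above zero
        self.rate = initial_rate

    def update(self, measured_delay):
        """Adjust and return the send rate for the latest delay sample."""
        if measured_delay > self.delay_threshold:
            # Network is busy: halve the current rate.
            self.rate = max(self.min_rate, self.rate / 2)
        else:
            # Delay is at or below the threshold: restore the initial rate.
            self.rate = self.initial_rate
        return self.rate
```

Repeated samples above the threshold halve the rate each time, while a single sample at or below the threshold restores the initial rate in one step, matching the asymmetric behavior the embodiment describes.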
In general, this embodiment achieves deduplicated storage of network communication log data by creating the log bucket and the graph database, which effectively improves storage efficiency; it maintains hot and cold stores based on time-series databases, which separates the storage of hot and cold data in the network communication log, makes full use of the hot/cold characteristic of the log, and improves query efficiency; it converts the IP and port information in the network communication log into compact hexadecimal strings, which improves string comparison efficiency and further reduces storage space; and it dynamically adjusts the frequency at which hosts send network communication logs according to the current network delay, which improves the sending efficiency of network communication logs without affecting the normal services of the cluster.
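The deduplication summarized above rests on the log bucket lookup. Following the tree layout given in claim 1 (source IP, source port, destination IP, destination port, then a leaf holding the ID, last use time and protocol), a minimal sketch could look like this. Nested dictionaries stand in for the multi-way tree, leaves are keyed by protocol, and a simple counter replaces the smallest-unused-ID rule; all three choices are assumptions made for the example.

```python
import itertools


class LogBucket:
    """In-memory sketch of the six-layer multi-way tree (root plus five levels)."""

    def __init__(self):
        self.root = {}
        self._ids = itertools.count()  # assumption: simplified ID source

    def upsert(self, quintuple, established_at, now):
        """Look up a connection record; refresh it if present, insert it if not.

        Returns (connection_id, inserted), where inserted is True for new records.
        """
        src_ip, src_port, dst_ip, dst_port, protocol = quintuple
        node = self.root
        # Layers 2 to 5: source IP, source port, destination IP, destination port.
        for key in (src_ip, src_port, dst_ip, dst_port):
            node = node.setdefault(key, {})
        leaf = node.get(protocol)
        if leaf is not None:
            # Known connection: only the last use time changes.
            leaf["last_used"] = now
            return leaf["id"], False
        # New connection: its last use time starts at the establishment time.
        node[protocol] = {
            "id": next(self._ids),
            "last_used": established_at,
            "protocol": protocol,
        }
        return node[protocol]["id"], True
```

A repeated quintuple therefore costs one tree walk and one timestamp update rather than a new stored record.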
Example 2:
a cluster server, comprising:
a first computer readable storage medium storing a computer program;
and a first processor configured to read a computer program stored in a first computer readable storage medium, and execute the cluster-oriented network communication log storage method provided in the foregoing embodiment 1.
Example 3:
a cluster, comprising the cluster server provided in the above embodiment 2; in the cluster, a sending frequency dynamic adjustment module is deployed in each host;
the transmission frequency dynamic adjustment module includes:
a second computer readable storage medium storing a computer program;
and a second processor configured to read a computer program stored in a second computer readable storage medium, and execute the step of dynamically adjusting the transmission frequency in the cluster-oriented network communication log storage method provided in embodiment 1.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention; any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A cluster-oriented network communication log storage method, characterized by comprising the following steps: while each host in the cluster sends the acquired network communication log records to a message queue of a server, the following steps are executed in the server:
acquiring network communication log records from the message queue, extracting the quintuple information therein as a communication connection record, and searching for the communication connection record in a log bucket; if the search succeeds, updating the last use time of the communication connection record in the log bucket and a graph database to the current time; if the search fails, generating an ID for the communication connection record, inserting the communication connection record into the log bucket and the graph database, and setting the last use time of the communication connection record to the establishment time of the communication connection;
wherein the network communication log record comprises quintuple information and the establishment time of the communication connection; the log bucket is a six-layer multi-way tree created in memory, wherein the nodes of layers 2 to 5 are, in sequence, a source IP, a source port, a destination IP and a destination port, each connection record corresponds to a path from the root node to a leaf node in the multi-way tree, and the information stored by the leaf node comprises the ID, the last use time and the protocol of the communication connection record; the graph database is a graph structure created in a persistent storage device, wherein the nodes are host IPs, an edge exists between the nodes corresponding to hosts that have communicated over the network, and the information of the edge comprises the ID of the communication connection record, the last use time, the protocol, the source port and the destination port.
2. The cluster-oriented network communication log storage method of claim 1, wherein, when a communication connection record that does not exist in the log bucket is inserted into the log bucket and the graph database, the method further comprises: taking the establishment time of the communication connection as a timestamp, and inserting the timestamp and the ID of the communication connection record into a hot store; the hot store is a time-series database used to store communication connection records in the most recent N time windows;
the cluster-oriented network communication log storage method further comprises the following steps:
at a fixed time in each time window, migrating the data in the earliest time window in the hot store to a cold store; the cold store is a time-series database used to store communication connection records in the most recent (N+1)-th to M-th time windows;
wherein M and N are both positive integers, and M > 2N.
3. The method for storing a cluster-oriented network communication log according to claim 2, wherein the N time windows correspond to one week and the M time windows correspond to one month.
4. The cluster-oriented network communication log storage method of claim 2, wherein, while the data in the hot store is migrated to the cold store, the method further comprises: deleting the data in the earliest time window in the cold store.
5. The cluster-oriented network communication log storage method of claim 4, wherein deleting the data in the earliest time window in the cold store further comprises:
determining whether each currently deleted ID is used in other time windows, and if not, deleting the communication connection record corresponding to that ID from the log bucket and the graph database;
the ID of the communication connection record is an integer number incrementally assigned from 0, and the ID is generated for the communication connection record in the following manner:
and acquiring an ID list corresponding to the communication connection record stored in the current graph database, and searching an unused minimum value according to the ID list to serve as a generated ID.
6. The cluster-oriented network communication log storage method of claim 5, further comprising: after the server receives the query request, the following steps are executed:
(S1) determining, according to the time ranges of the data stored in the hot store and the cold store, whether communication connection records in the requested time range R are currently stored; if so, going to step (S2); otherwise, determining that no query result exists and going to step (S4);
(S2) querying the hot store and/or the cold store according to the time range R to obtain the IDs of the corresponding communication connection records, and deduplicating them to obtain an ID list;
(S3) querying the graph database according to the ID list to obtain the quintuple information of the communication connection records corresponding to the IDs, completing recovery of the network communication log, and returning the recovered network communication log;
(S4) the query is ended.
7. The cluster-oriented network communication log storage method according to any one of claims 1 to 6, wherein acquiring network communication log records from the message queue and extracting the quintuple information therein further comprises: converting each of the 4 numbers separated by "." in each IP of the quintuple information into a 2-digit hexadecimal character string, thereby converting the IP into an 8-digit hexadecimal character string; and converting the ports in the quintuple information into 2-digit hexadecimal character strings;
and the communication connection record is the quintuple information after the conversion.
8. The cluster-oriented network communication log storage method according to any one of claims 1 to 6, further comprising a transmission frequency dynamic adjustment step performed at a host side;
the step of dynamically adjusting the transmission frequency includes: detecting the network delay in real time; if the network delay is greater than a preset threshold, reducing the frequency at which network communication log records are sent to the message queue of the server; and when the network delay does not exceed the threshold, restoring the frequency at which network communication log records are sent to the message queue of the server to an initial set value.
9. A cluster server, comprising:
a first computer readable storage medium storing a computer program;
and a first processor configured to read a computer program stored in the first computer-readable storage medium, and execute the cluster-oriented network communication log storage method of any one of claims 1 to 7.
10. A cluster comprising the cluster server of claim 9; in the cluster, a sending frequency dynamic adjustment module is deployed in each host;
the transmission frequency dynamic adjustment module includes:
a second computer readable storage medium storing a computer program;
and a second processor configured to read a computer program stored in the second computer-readable storage medium, and perform the step of dynamically adjusting the transmission frequency in the cluster-oriented network communication log storage method of claim 8.
CN202311424875.XA 2023-10-30 2023-10-30 Cluster-oriented network communication log storage method, cluster server and cluster Pending CN117478534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311424875.XA CN117478534A (en) 2023-10-30 2023-10-30 Cluster-oriented network communication log storage method, cluster server and cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311424875.XA CN117478534A (en) 2023-10-30 2023-10-30 Cluster-oriented network communication log storage method, cluster server and cluster

Publications (1)

Publication Number Publication Date
CN117478534A true CN117478534A (en) 2024-01-30

Family

ID=89626823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311424875.XA Pending CN117478534A (en) 2023-10-30 2023-10-30 Cluster-oriented network communication log storage method, cluster server and cluster

Country Status (1)

Country Link
CN (1) CN117478534A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination