CN115460214B - Distributed network communication log storage and retrieval method and device - Google Patents

Info

Publication number
CN115460214B
Authority
CN
China
Prior art keywords
storage
module
retrieval
storage node
communication logs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211402538.6A
Other languages
Chinese (zh)
Other versions
CN115460214A (en)
Inventor
赵泽祺
王凯峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yuantek Technology Co ltd
Original Assignee
Beijing Yuantek Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yuantek Technology Co ltd
Priority to CN202211402538.6A
Publication of CN115460214A
Application granted
Publication of CN115460214B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

A distributed network communication log storage and retrieval method and device. In the method, a distribution module receives communication logs from the Internet; the distribution module supports multiple load distribution algorithms and can therefore adapt the issuing of communication logs to different service scenarios. Each storage node of the storage module receives the communication logs issued by the distribution module, preprocesses them, and stores them classified by IP address attribute. The retrieval module receives a retrieval command from the client, performs the retrieval in the storage module, sorts the retrieved communication logs, and sends them to the client. The management module manages each storage node in the storage module, periodically performs disk cleaning and fault detection on the storage nodes, updates the state information of the storage nodes in real time, and notifies the distribution module and the retrieval module of the storage node information so that the global state information of the storage nodes remains consistent. The invention is widely applicable, stores flexibly, retrieves quickly and efficiently, returns accurate retrieval results, and runs stably.

Description

Distributed network communication log storage and retrieval method and device
Technical Field
The application belongs to the technical field of log storage and retrieval, and particularly relates to a distributed network communication log storage and retrieval method and device.
Background
With the construction and development of the Internet, the number of Internet users has increased rapidly and network traffic has grown sharply. User Internet log data now amounts to billions of entries or more every day, so how to store massive logs safely and efficiently and how to search them quickly have become two problems to be solved urgently.
In the past, communication logs were generally stored in a centralized manner. Although this is simple to implement, once the centralized storage node fails, all stored data are lost, so stability is low.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a distributed network communication log storage and retrieval method and device, so as to solve or partially solve the above technical problems.
Based on the above purpose, a first aspect of the present application provides a distributed network communication log storage and retrieval method, including:
the distribution module receives a communication log on the Internet and issues the communication log according to a load distribution algorithm;
each storage node of the storage module receives the communication logs issued by the distribution module, preprocesses the communication logs, and stores the communication logs in a classified manner according to the IP address attributes;
the retrieval module receives a retrieval command from the client, performs the retrieval in the storage module, sorts the retrieved communication logs, and then sends them to the client;
and managing each storage node in the storage module through a management module, performing disk cleaning and fault detection according to preset time, updating the state information of the storage nodes in real time, and notifying the distribution module and the retrieval module of the storage node information so as to ensure that the global state information of the storage nodes is consistent.
As a preferred scheme of a storage and retrieval method of the distributed network communication log, the distribution module carries out decapsulation operation on the received communication log and reads quintuple information of the communication log;
and the distribution module sends the corresponding communication log to the storage node corresponding to the storage module according to a load distribution algorithm.
As a preferred scheme of the distributed network communication log storage and retrieval method, for a service scenario in which the content of the communication logs does not need to be considered and all storage nodes have the same processing performance, the distribution module supports a polling (round-robin) scheduling algorithm, so that the communication logs are distributed evenly across the storage nodes;
for a service scenario in which the content of the communication logs does not need to be considered but the storage nodes differ in processing performance, a weight is set for each storage node according to its processing performance, and the distribution module supports a weighted polling (weighted round-robin) scheduling algorithm that distributes the communication logs to the storage nodes in turn according to the weights;
for a service scenario in which the communication logs must be stored according to their content, the distribution module supports a hash algorithm based on the source and destination addresses and a hash algorithm based on the quintuple, and distributes the communication logs according to the requirements of users.
As a preferred scheme of the distributed network communication log storage and retrieval method, the storage nodes establish different storage directories on the disk according to different IP address attributes;
if the communication logs are classified by geographic location attribute, they are stored in different subdirectories named by city;
if the communication logs are classified by user attribute, they are stored in subdirectories named by specific IP address.
As a preferred scheme of a distributed network communication log storage retrieval method, retrieval keywords adopted by the retrieval module comprise start time, end time, a source IP address, a destination IP address, a source port and a destination port;
each retrieval command contains at least a source IP address and a start time and an end time.
As a preferred scheme of the distributed network communication log storage and retrieval method, the retrieval module analyzes the retrieval command after receiving the retrieval command, starts a retrieval task for each retrieval command, and starts a plurality of threads for each retrieval task to complete together.
As a preferred scheme of the distributed network communication log storage and retrieval method, in the process of cleaning the disks of the storage nodes, the management module sets a corresponding threshold for the disks of storage nodes with different performance, and when the disk utilization exceeds the set threshold, deletes the earliest communication logs in the storage node in units of a preset time unit.
As a preferred scheme of the distributed network communication log storage and retrieval method, in the fault detection process of the storage nodes, the management module monitors each storage node in real time. When a storage node cannot provide the disk writing function but can provide the query function, the management module determines that the node has a write fault and notifies the distribution module, so that the distribution module dynamically adjusts the distribution algorithm and sends the communication logs to normal storage nodes;
when a storage node can provide disk writing normally but cannot provide the query function, the node is determined to have a read fault, and the management module notifies the distribution module and the retrieval module so that they perform distribution and retrieval operations on normal storage nodes.
As a preferred scheme of a distributed network communication log storage and retrieval method, in the processes of storage node state updating and global state pushing, the storage module reports the self-maintained storage node state information to the management module in real time, when the storage node has state information change, the management module pushes the latest state information to the distribution module and the retrieval module, and the distribution module and the retrieval module are adjusted according to the change.
A second aspect of the present application provides a distributed network communication log storage and retrieval device, which adopts the distributed network communication log storage and retrieval method of the first aspect or any possible implementation manner thereof, and includes:
the distribution module is used for receiving the communication logs on the Internet and issuing the communication logs according to a load distribution algorithm;
each storage node of the storage module receives the communication logs issued by the distribution module, preprocesses the communication logs and stores the communication logs in a classified manner according to the IP address attributes;
the retrieval module is used for receiving a retrieval command from the client, retrieving from the storage module, arranging the retrieved required communication logs and then sending the required communication logs to the client;
and the management module is used for managing each storage node in the storage module, carrying out disk cleaning and fault detection according to preset time, updating the state information of the storage nodes in real time, and informing the distribution module, the storage module and the retrieval module of the storage node information so as to ensure that the global state information of the storage nodes is consistent.
A third aspect of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the distributed network communication log storage and retrieval method according to the first aspect.
A fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the distributed network communication log storage and retrieval method according to the first aspect.
As can be seen from the above, in the technical scheme provided by the application, the distribution module receives the communication logs on the Internet and issues them according to a load distribution algorithm; each storage node of the storage module receives the communication logs issued by the distribution module, preprocesses them, and stores them classified by IP address attribute; the retrieval module receives a retrieval command from the client, performs the retrieval in the storage module, sorts the retrieved communication logs, and sends the sorted communication logs to the client; and the management module manages each storage node in the storage module, performs disk cleaning and fault detection at preset times, updates the state information of the storage nodes in real time, and notifies the distribution module and the retrieval module of the storage node information so that the global state information of the storage nodes is consistent. The invention can adapt to different service scenarios. The state information of the storage nodes is unified among the modules, so when storage nodes are added or fail, the other modules can adjust quickly, which ensures the stability of the system. Retrieval is fast, and the scheme solves the problems of existing schemes, such as low storage efficiency, slow retrieval, and modules that cannot coordinate with one another, so it has wide applicability.
Drawings
In order to more clearly illustrate the technical solutions in the present application or the related art, the drawings needed to be used in the description of the embodiments or the related art will be briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic processing flow diagram of a distribution module in an embodiment of the present invention;
FIG. 2 is a storage directory structure of a communication log in an embodiment of the present invention;
FIG. 3 is a schematic processing flow diagram of a retrieval module according to an embodiment of the present invention;
FIG. 4 is a diagram of an embodiment of a storage node in a normal operating condition according to an embodiment of the present invention;
FIG. 5 is a diagram of an embodiment of a storage node in an embodiment of the present invention in the case of a failure;
FIG. 6 is a diagram illustrating an architecture of a distributed network communication log storage and retrieval apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
It should be noted that technical terms or scientific terms used in the embodiments of the present application should have a general meaning as understood by those having ordinary skill in the art to which the present application belongs, unless otherwise defined. The use of the terms "comprising" or "including" and the like in the embodiments of the present application, means that the element or item appearing before the term covers the element or item listed after the term and its equivalents, without excluding other elements or items.
The communication log is a general term for network session logs, which describe network behavior well at the session level; for example, NetFlow and sFlow records both belong to the communication log. Communication logs have traditionally been stored in a centralized manner. Although this is simple to implement, once the centralized storage node fails, all stored data are lost, so stability is low.
In the related art, some storage schemes provide extensible storage performance by means of a distributed framework, but storage engines of the schemes are mostly based on a traditional relational database, the storage performance is limited, the retrieval efficiency is low, and the requirements of efficient storage and quick retrieval of data cannot be met. The existing storage scheme facing to the communication logs cannot well take storage and retrieval into account, so that the applicability of the existing storage scheme facing to the communication logs is limited.
In view of this, in order to solve the problems of slow storage speed, poor expandability and reliability, and low retrieval efficiency in the existing log storage and retrieval, embodiments of the present invention provide a distributed network communication log storage and retrieval method and apparatus, and the following is a specific content of the embodiments of the present invention.
Referring to fig. 1, fig. 2, and fig. 3, with reference to fig. 6, an embodiment of the present invention provides a distributed network communication log storage and retrieval method, including the following steps:
S1, a distribution module receives a communication log on the Internet and issues the communication log according to a load distribution algorithm;
S2, each storage node of the storage module receives the communication log issued by the distribution module, preprocesses the communication log, and stores the communication log in a classified manner according to the IP address attribute;
S3, a retrieval module receives a retrieval command from a client, performs the retrieval in the storage module, sorts the retrieved communication logs, and then sends them to the client;
and S4, managing each storage node in the storage module through the management module, carrying out disk cleaning and fault detection according to preset time, updating the state information of the storage nodes in real time, and informing the distribution module and the retrieval module of the storage node information so as to enable the global state information of the storage nodes to be consistent.
In this embodiment, the distribution module performs decapsulation operation on the received communication log, and reads quintuple information of the communication log; and the distribution module sends the corresponding communication log to the storage node corresponding to the storage module according to a load distribution algorithm. For the service scenes that the contents of the communication logs do not need to be considered and the processing performance of each storage node is the same, the distribution module adopts a polling scheduling algorithm to ensure that the communication logs are completely and evenly distributed to different storage nodes; for the service scenes that the contents of the communication logs do not need to be considered and the processing performance of each storage node is different, setting a weight for each storage node according to the processing performance of the storage node, and distributing the communication logs to each storage node in a polling mode according to the weight by using a weighted polling scheduling algorithm; for a service scene needing to consider the contents of the communication logs for storage, the distribution module adopts a hash algorithm based on a source address and a destination address and a hash algorithm based on a quintuple to distribute the communication logs according to the requirements of users.
Specifically, for distribution of the communication logs, the distribution module supports multiple load balancing strategies, and can adapt to different service scenarios:
For scenarios in which the content of the communication logs does not need to be considered and all storage nodes have the same processing performance, the distribution module supports algorithms such as polling (round-robin) scheduling, so that the communication logs are distributed evenly across the storage nodes and the required communication logs can be retrieved by accessing any node.
For scenarios in which the content of the communication logs does not need to be considered but the storage nodes differ in processing performance, the distribution module supports algorithms such as weighted polling (weighted round-robin) scheduling, which take the processing performance of each storage node into account by setting a weight for each node accordingly. The weighted polling scheduling algorithm distributes the communication logs to the storage nodes in turn according to the weights, so that nodes with larger weights store more communication logs than nodes with smaller weights.
For scenarios in which the communication logs must be stored according to their content, the distribution module supports a hash algorithm based on the source and destination addresses, a hash algorithm based on the quintuple, and the like, and distributes the communication logs according to different requirements.
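The three distribution strategies described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names FiveTuple, round_robin, weighted_round_robin, hash_by_addresses and hash_by_five_tuple are hypothetical, the weighted variant is the simplest non-interleaved form, and SHA-256 is an assumed choice of hash (the text does not name one).

```python
import hashlib
from itertools import cycle
from typing import Dict, Iterator, NamedTuple, Sequence


class FiveTuple(NamedTuple):
    """Quintuple read from a communication log (field names are assumptions)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def round_robin(nodes: Sequence[str]) -> Iterator[str]:
    """Yield nodes in turn; suits nodes with identical processing performance."""
    return cycle(nodes)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Yield each node `weight` times per cycle, so heavier nodes get more logs."""
    expanded = [node for node, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def _stable_hash(key: str) -> int:
    # hashlib is used because the built-in hash() of str varies between processes.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def hash_by_addresses(log: FiveTuple, nodes: Sequence[str]) -> str:
    """Logs with the same source and destination address land on the same node."""
    return nodes[_stable_hash(f"{log.src_ip}|{log.dst_ip}") % len(nodes)]


def hash_by_five_tuple(log: FiveTuple, nodes: Sequence[str]) -> str:
    """Logs with an identical quintuple land on the same node."""
    key = "|".join(str(field) for field in log)
    return nodes[_stable_hash(key) % len(nodes)]
```

With address-based hashing, every log between the same pair of hosts is stored on one node, which is one plausible reason to choose it when storage must follow log content.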
In this embodiment, the storage node establishes different storage directories on the disk according to different IP address attributes; if the communication logs are classified by geographic location attribute, they are stored in different subdirectories named by city; if the communication logs are classified by user attribute, they are stored in subdirectories named by specific IP address. The retrieval keywords adopted by the retrieval module comprise start time, end time, source IP address, destination IP address, source port and destination port; each retrieval command contains at least a source IP address, a start time and an end time.
Specifically, after receiving the communication logs sent by the distribution module, each storage node of the storage module preprocesses the communication logs and stores them classified by IP address attribute. Different types of communication logs have different storage paths, and an administrator can select different classification modes as required, thereby improving the retrieval efficiency of the communication logs.
The storage node can establish different storage directories on the disk according to different IP attributes, for example: when classifying by geographic location attribute, the communication logs are stored in different subdirectories named by city; when classifying by user attribute, the communication logs are stored in subdirectories named by specific IP address. The subdirectories beneath these are built according to time. The storage directory structure of the communication logs in the storage node is shown in FIG. 2.
Specifically, the subdirectories are constructed according to year, month, day, hour and file name, and finally form a directory with the following format: CL_ROOT/classification mode/YYYY/MM/DD/HH/filename. The storage node generates one file every 5 minutes and numbers the files within each hour from 00 to 11 to form the file names, as shown in FIG. 2.
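As a rough illustration of the directory layout above, the following sketch derives a storage path from a timestamp. The function name log_file_path and the sample category value are hypothetical, and it is an assumption that the file name consists only of the two-digit 5-minute slot number.

```python
from datetime import datetime
from pathlib import PurePosixPath

FILE_INTERVAL_MINUTES = 5  # from the text: one file every 5 minutes


def log_file_path(root: str, category: str, ts: datetime) -> PurePosixPath:
    """Build root/category/YYYY/MM/DD/HH/NN for the file covering `ts`."""
    slot = ts.minute // FILE_INTERVAL_MINUTES  # 0..11 within the hour
    return PurePosixPath(
        root,
        category,                 # e.g. a city name or a specific IP address
        f"{ts:%Y}",
        f"{ts:%m}",
        f"{ts:%d}",
        f"{ts:%H}",
        f"{slot:02d}",            # assumed: file name is just the slot number
    )
```

Because the path encodes both the classification and the time, a retrieval bounded by source IP and time range only needs to open the matching hour directories.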
A communication log forms a physical file in a storage node according to a fixed format. The communication log file format comprises a header and a data part. The header takes 128 bytes: the first 32 bits represent the length of the communication log, the next 32 bits represent the version of the communication log, and the last 64 bits represent the timestamp. The data part contains a plurality of blocks, each block corresponding to one communication log cache of a storage node. Each block comprises two parts. The first part is the block header: its first 32 bits indicate whether the file is finished, and its last 32 bits indicate how many communication logs the current block contains. The second part of the block stores the communication logs. The format of the communication log file is shown in Table 1:
Table 1 Communication log file format
[Table 1: image not reproduced]
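The header and block-header fields described above can be sketched with Python's struct module. This models only the three listed file-header fields (32 + 32 + 64 bits) and the two block-header fields; big-endian unsigned encoding, the absence of any padding, and all function names are assumptions, since the text gives only the field widths.

```python
import struct
from typing import List, Tuple

# Assumed encoding: big-endian, unsigned integers, no padding.
FILE_HEADER = struct.Struct(">IIQ")  # length (32 bits), version (32), timestamp (64)
BLOCK_HEADER = struct.Struct(">II")  # end-of-file flag (32 bits), log count (32)


def pack_file_header(length: int, version: int, timestamp: int) -> bytes:
    """Serialize the three file-header fields in the order given in the text."""
    return FILE_HEADER.pack(length, version, timestamp)


def unpack_file_header(data: bytes) -> Tuple[int, int, int]:
    """Return (length, version, timestamp) from the start of `data`."""
    return FILE_HEADER.unpack_from(data)


def pack_block(finished: bool, records: List[bytes]) -> bytes:
    """Serialize one block: block header followed by the already-encoded logs."""
    return BLOCK_HEADER.pack(int(finished), len(records)) + b"".join(records)


def unpack_block_header(data: bytes, offset: int = 0) -> Tuple[bool, int]:
    """Return (file finished?, number of logs in the block) at `offset`."""
    flag, count = BLOCK_HEADER.unpack_from(data, offset)
    return bool(flag), count
```

A reader could use the per-block log count to skip through a file block by block, provided each log has a known encoded size; the text does not specify the record encoding, so that part is left out.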
In this embodiment, the retrieval module analyzes the retrieval command after receiving the retrieval command, starts a retrieval task for each retrieval command, and starts a plurality of threads for each retrieval task to complete together.
Specifically, the retrieval module interfaces with the client and the storage module: it receives a retrieval command from the client, performs the retrieval in the storage module, sorts the retrieved communication logs, and then sends them to the client. To make full use of the retrieval performance gained from classified storage and to narrow the search range, a complete retrieval command contains at least a source IP address, a start time, and an end time.
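The requirement that every retrieval command carry a source IP address and a time range can be illustrated with a small data class. The class name RetrievalCommand, its field names, and the dictionary shape of a log record are hypothetical; only the set of keywords and the three mandatory ones come from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class RetrievalCommand:
    # Mandatory keywords according to the text.
    src_ip: str
    start_time: datetime
    end_time: datetime
    # Optional keywords.
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None

    def __post_init__(self) -> None:
        if not self.src_ip:
            raise ValueError("a retrieval command needs a source IP address")
        if self.end_time < self.start_time:
            raise ValueError("end time must not precede start time")

    def matches(self, record: dict) -> bool:
        """Check one log record (assumed dict with the keys used below)."""
        if record["src_ip"] != self.src_ip:
            return False
        if not (self.start_time <= record["time"] <= self.end_time):
            return False
        for key in ("dst_ip", "src_port", "dst_port"):
            wanted = getattr(self, key)
            if wanted is not None and record.get(key) != wanted:
                return False
        return True
```

The mandatory fields map directly onto the storage layout: the source IP selects the classification subdirectory, and the time range selects the hour directories to scan.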
Specifically, after receiving a retrieval command, the retrieval module parses it and starts a retrieval task for each retrieval command. To improve retrieval efficiency, each retrieval task can start multiple threads that complete the task together. When multiple threads return retrieval results, however, the following four phenomena occur:
(1) The communication logs returned by a single thread are ordered;
(2) The communication logs returned across multiple threads are unordered;
(3) The number of communication logs returned by each thread is uncertain;
(4) The start time and the end time of the communication logs returned by each thread are uncertain.
In the embodiment, the above phenomena are fully considered, the retrieval module can merge and sort the communication logs returned by the plurality of retrieval threads, the accuracy of the retrieval result is ensured, the parallel processing of a plurality of retrieval tasks can be supported, and the retrieval efficiency is greatly improved.
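One way to merge per-thread results that are individually ordered but mutually unordered is a k-way merge. The sketch below uses ThreadPoolExecutor and heapq.merge from the Python standard library; the function name retrieve_sorted, the (timestamp, payload) record shape, and the idea of one thread per shard are assumptions made for illustration, not the method claimed in the text.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

Record = Tuple[float, str]  # (timestamp, payload); hypothetical record shape
Shard = TypeVar("Shard")


def retrieve_sorted(
    shards: Sequence[Shard],
    search: Callable[[Shard], List[Record]],
    max_workers: int = 4,
) -> List[Record]:
    """Search every shard in its own thread and merge the results by timestamp.

    `search` must return records already sorted by timestamp, which matches
    phenomenon (1): each single thread returns ordered logs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(search, shards))
    # k-way merge: handles result lists of any length and any time span,
    # which covers phenomena (2) to (4).
    return list(heapq.merge(*partial_results, key=lambda record: record[0]))
```

The merge costs O(n log k) for n records from k threads, which is cheaper than re-sorting the concatenated results when each thread's output is already ordered.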
In this embodiment, the management module is mainly used to manage each storage node in the storage module, perform disk cleaning and fault detection according to a predetermined time, update the state information of the storage node in real time, and notify the storage node information to each module, thereby ensuring consistency of the global state information of the system storage node.
Specifically, in the process of cleaning the disks of the storage nodes, the management module sets a corresponding threshold for the disks of storage nodes with different performance, and when the disk utilization exceeds the set threshold, the management module deletes the earliest communication logs in the storage node in units of a preset time unit.
In the fault detection process of the storage nodes, the management module monitors each storage node in real time. When a storage node cannot provide the disk writing function but can provide the query function, the management module determines that the node has a write fault and notifies the distribution module, so that the distribution module dynamically adjusts the distribution algorithm and sends the communication logs to normal storage nodes. When a storage node can provide disk writing normally but cannot provide the query function, the node is determined to have a read fault, and the management module notifies the distribution module and the retrieval module so that they perform distribution and retrieval operations on normal storage nodes.
In the storage node state updating and global state pushing process, the storage module reports the storage node state information it maintains to the management module in real time. When the state information of a storage node changes, the management module pushes the latest state information to the distribution module and the retrieval module, which adjust accordingly. Because the state information of the storage nodes is updated in real time, every module knows the state of the storage nodes, the modules can coordinate with one another, strategies can be adjusted promptly when emergencies occur, and the stability of the system is maintained.
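The fault classification and threshold-based disk cleaning described above can be sketched as follows. All names are hypothetical. Two points are assumptions not stated in the text: the DOWN state for a node that can neither write nor answer queries, and the policy of deleting oldest time units repeatedly until utilization falls back to the threshold.

```python
from enum import Enum
from typing import List, Sequence, Set, Tuple


class NodeState(Enum):
    NORMAL = "normal"
    WRITE_FAULT = "write_fault"  # cannot write to disk, can still answer queries
    READ_FAULT = "read_fault"    # can write to disk, cannot answer queries
    DOWN = "down"                # assumption: the text does not name this case


def classify_node(can_write: bool, can_query: bool) -> NodeState:
    """Map the two probe results onto the fault types named in the text."""
    if can_write and can_query:
        return NodeState.NORMAL
    if can_query:
        return NodeState.WRITE_FAULT
    if can_write:
        return NodeState.READ_FAULT
    return NodeState.DOWN


def modules_to_notify(state: NodeState) -> Set[str]:
    """Which modules the management module informs about a fault."""
    if state is NodeState.WRITE_FAULT:
        return {"distribution"}
    if state in (NodeState.READ_FAULT, NodeState.DOWN):  # DOWN is an assumption
        return {"distribution", "retrieval"}
    return set()


def select_units_to_delete(
    usage: float, threshold: float, units_oldest_first: Sequence[Tuple[str, float]]
) -> List[str]:
    """Pick the oldest time units to delete until usage is within the threshold.

    Each unit is (name, fraction of the disk it occupies); `usage` and
    `threshold` are fractions between 0 and 1.
    """
    doomed: List[str] = []
    for name, share in units_oldest_first:
        if usage <= threshold:
            break
        doomed.append(name)
        usage -= share
    return doomed
```

For example, with a threshold of 0.85 and utilization of 0.92, two hour-directories of 0.05 each would be selected, leaving utilization at about 0.82.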
Referring to fig. 4, a normal operation situation in an application scenario where the contents of the logout log do not need to be considered and the processing performance of each storage node is the same is described.
In the application scenario, the processing performance of each storage node is the same, and as the content of the communication logs does not need to be considered, the distribution node distributes the communication logs by adopting a polling scheduling algorithm in combination with the current scenario; after receiving the communication logs, the storage nodes distinguish the communication logs according to domestic and foreign countries, and then establish a file directory according to time for storage; the retrieval server starts multithreading to retrieve in the storage node according to a retrieval command sent by the client, sorts and combines returned results and sends the returned results to the client; in the whole operation process, the management server updates the state information of the storage nodes in real time, and the unification of the global state information of the whole link is ensured.
Referring to fig. 5, a storage node failure is described for an application scenario in which the contents of the communication logs do not need to be considered and every storage node has the same processing performance. When one of the storage nodes fails, the fault information is reported to the management server, which updates its records and notifies the distribution node and the retrieval server. After receiving the notification, the distribution node and the retrieval server adjust accordingly: when distributing communication logs, the distribution node avoids the failed storage node and sends the logs to the other storage nodes; when retrieving, the retrieval server avoids searching the failed storage node. The system can therefore continue to operate stably.
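The failure handling on the distribution side can be sketched as a round-robin scheduler that skips nodes reported as unwritable. This is only an illustration under stated assumptions: the class `RoundRobinDistributor` and its method names are hypothetical, and the state update is modeled as a direct method call rather than a network push from the management server.

```python
from typing import List, Set


class RoundRobinDistributor:
    """Round-robin (polling) scheduler that avoids failed storage nodes."""

    def __init__(self, nodes: List[str]) -> None:
        self._nodes = list(nodes)
        self._writable: Set[str] = set(nodes)
        self._next = 0

    def on_state_update(self, node: str, writable: bool) -> None:
        """Apply a state change pushed by the management server."""
        if writable:
            self._writable.add(node)
        else:
            self._writable.discard(node)

    def pick(self) -> str:
        """Return the next writable node, advancing the round-robin cursor."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if node in self._writable:
                return node
        raise RuntimeError("no writable storage node available")
```

For example, with nodes `s1`, `s2`, `s3`, marking `s2` as failed makes subsequent picks alternate between `s3` and `s1` until `s2` is reported healthy again.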
In summary, the present invention receives the communication logs on the internet through the distribution module and issues them according to the load distribution algorithm; each storage node of the storage module receives the communication logs issued by the distribution module, preprocesses them, and stores them in a classified manner according to the IP address attributes; the retrieval module receives a retrieval command from the client, performs the retrieval in the storage module, arranges the retrieved communication logs, and sends them to the client; and the management module manages each storage node in the storage module, performs disk cleaning and fault detection according to preset time, updates the state information of the storage nodes in real time, and notifies the distribution module and the retrieval module of the storage node information so that the global state information of the storage nodes is consistent. With respect to distribution and storage, the communication logs can be flexibly stored using different distribution modes according to different service requirements, so the method has high storage efficiency and suits a variety of service scenarios. With respect to retrieval, multiple tasks can be retrieved simultaneously, and each task can start multiple threads that complete one retrieval task together, which greatly improves retrieval efficiency; the results returned by the threads for the same task are sorted and merged, which ensures the accuracy of the retrieval results.
The management module monitors each module and updates the state information of the storage nodes in real time, so that every module knows the state of each storage node, the modules can coordinate with one another, strategies can be adjusted promptly when an emergency occurs, and the stability of the system is maintained.
It should be noted that the method according to the embodiment of the present application may be implemented in a distributed scenario by cooperation of multiple devices, or may be implemented by a single device, for example, a computer or a server. In the case of such a single device scenario, this is accomplished by the interaction of a plurality of functional modules. In such a distributed scenario, one of the multiple devices may only perform one or more steps of the method of the embodiment, and the multiple devices interact with each other to complete the method.
It should be noted that the above describes some embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Referring to fig. 6, based on the same inventive concept and corresponding to the method of any of the above embodiments, the present application further provides a distributed network communication log storage and retrieval device, which uses the distributed network communication log storage and retrieval method of the above embodiments or any possible implementation thereof and includes:
the distribution module 1 is used for receiving the communication logs on the Internet and issuing the communication logs according to a load distribution algorithm;
the storage module 2, each storage node of which is used for receiving the communication logs issued by the distribution module, preprocessing the communication logs, and storing the communication logs in a classified manner according to the IP address attributes;
the retrieval module 3 is used for receiving a retrieval command from the client, performing the retrieval in the storage module, arranging the retrieved communication logs, and then sending them to the client;
and the management module 4 is used for managing each storage node in the storage module, performing disk cleaning and fault detection according to preset time, updating the state information of the storage node in real time, and notifying the distribution module and the retrieval module of the storage node information so as to enable the global state information of the storage node to be consistent.
The apparatus of the foregoing embodiment is used to implement the corresponding distributed network communication log storage and retrieval method of any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to the method of any embodiment described above, the present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the distributed network communication log storage and retrieval method according to any embodiment described above.
Fig. 7 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via a bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static memory device, a dynamic memory device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component within the device (not shown) or may be external to the device to provide corresponding functionality. The input devices may include a keyboard, mouse, touch screen, microphone, various sensors, etc., and the output devices may include a display, speaker, vibrator, indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present device and other devices. The communication module can communicate in a wired mode (for example, USB, network cable, etc.) or in a wireless mode (for example, mobile network, Wi-Fi, Bluetooth, etc.).
The bus 1050 includes a path to transfer information between various components of the device, such as the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding distributed network communication log storage and retrieval method of any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above embodiments, the present application further provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the distributed network communication log storage and retrieval method according to any of the above embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The computer instructions stored in the storage medium of the foregoing embodiment are used to enable the computer to execute the distributed network communication log storage and retrieval method according to any one of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the context of the present application, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the application. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the application, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the application are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the application, it should be apparent to one skilled in the art that the embodiments of the application can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present application has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The present embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present application are intended to be included within the scope of the present application.

Claims (8)

1. A distributed network communication log storage and retrieval method comprises the following steps:
the distribution module receives a communication log on the Internet and issues the communication log according to a load distribution algorithm;
each storage node of the storage module receives the communication logs issued by the distribution module, preprocesses the communication logs, and stores the communication logs in a classified manner according to the IP address attributes;
the retrieval module receives a retrieval command from the client, performs the retrieval in the storage module, arranges the retrieved communication logs, and then sends them to the client;
managing each storage node in the storage module through a management module, carrying out disk cleaning and fault detection according to preset time, updating the state information of the storage nodes in real time, and notifying the distribution module and the retrieval module of the storage node information so as to enable the global state information of the storage nodes to be consistent;
the distribution module carries out decapsulation operation on the received communication log and reads quintuple information of the communication log;
the distribution module sends the corresponding communication logs to the storage nodes corresponding to the storage module according to a load distribution algorithm;
for the service scenes that the contents of the communication logs do not need to be considered and the processing performance of each storage node is the same, the distribution module supports the adoption of a polling scheduling algorithm, so that the communication logs are completely and evenly distributed to different storage nodes;
for the service scenes that the contents of the communication logs do not need to be considered and the processing performance of each storage node is different, setting a weight for each storage node according to the processing performance of the storage node, and supporting the distribution of the communication logs to each storage node in a polling mode by utilizing a weighted polling scheduling algorithm according to the weight;
for a service scene needing to store the contents of the communication logs in consideration, the distribution module supports the adoption of a hash algorithm based on a source address and a destination address and a hash algorithm based on a quintuple and distributes the communication logs according to the requirements of users.
2. The distributed network communication log storage and retrieval method according to claim 1, wherein the storage nodes establish different storage directories on a disk according to different IP address attributes;
if the classification is carried out according to the geographic position attribute, the communication logs are respectively stored in different subdirectories named by cities;
if the user attributes are classified, the communication logs are stored in subdirectories named by specific IP addresses.
3. The method of claim 1, wherein the search key used by the search module comprises a start time, an end time, a source IP address, a destination IP address, a source port, and a destination port;
each retrieval command contains at least a source IP address and a start time and an end time.
4. The method as claimed in claim 3, wherein the retrieval module analyzes the retrieval command after receiving the retrieval command, and starts a retrieval task for each retrieval command, and each retrieval task starts a plurality of threads to complete together.
5. The distributed network communication log storage and retrieval method according to claim 4, wherein in a disk cleaning process of the storage nodes, the management module sets corresponding thresholds for disks of the storage nodes with different performances, and when the utilization rate of a disk exceeds the set threshold, the management module deletes the earliest communication logs in the storage node according to a preset time unit.
6. The method according to claim 5, wherein for the fault detection process of the storage nodes, the management module monitors each storage node in real time; when a storage node cannot provide a disk writing function but can provide a query function, the management module determines that a write fault has occurred and notifies the distribution module, so that the distribution module dynamically adjusts the distribution algorithm and sends the communication logs to normal storage nodes;
when the storage node can normally provide disk writing operation but cannot provide query function, the storage node is judged to be read fault, and the management module informs the distribution module and the retrieval module to enable the distribution module and the retrieval module to perform distribution and retrieval operation on the normal storage node.
7. The method according to claim 6, wherein in the process of updating the state of the storage node and pushing the global state, the storage module reports the state information of the storage node maintained by the storage module to the management module in real time, when the state information of the storage node changes, the management module pushes the latest state information to the distribution module and the retrieval module, and the distribution module and the retrieval module adjust according to the change.
8. A distributed network communication log storage and retrieval device, which adopts the distributed network communication log storage and retrieval method of any one of claims 1 to 7, wherein the device comprises:
the distribution module is used for receiving the communication logs on the Internet and issuing the communication logs according to a load distribution algorithm;
each storage node of the storage module receives the communication logs issued by the distribution module, preprocesses the communication logs and stores the communication logs in a classified manner according to the IP address attributes;
the retrieval module is used for receiving a retrieval command from the client, retrieving from the storage module, arranging the retrieved required communication logs and then sending the required communication logs to the client;
and the management module is used for managing each storage node in the storage module, performing disk cleaning and fault detection according to preset time, updating the state information of the storage node in real time, and notifying the distribution module and the retrieval module of the storage node information so as to ensure that the global state information of the storage node is consistent.
CN202211402538.6A 2022-11-10 2022-11-10 Distributed network communication log storage and retrieval method and device Active CN115460214B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211402538.6A CN115460214B (en) 2022-11-10 2022-11-10 Distributed network communication log storage and retrieval method and device


Publications (2)

Publication Number Publication Date
CN115460214A CN115460214A (en) 2022-12-09
CN115460214B true CN115460214B (en) 2023-02-07

Family

ID=84295808


Country Status (1)

Country Link
CN (1) CN115460214B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant