CN111966736B - High-throughput low-delay large-capacity Flume channel and transmission method thereof - Google Patents

High-throughput low-delay large-capacity Flume channel and transmission method thereof

Info

Publication number
CN111966736B
CN111966736B (application number CN202010728788.3A)
Authority
CN
China
Prior art keywords
storage interval
memory
cluster
storage
event packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010728788.3A
Other languages
Chinese (zh)
Other versions
CN111966736A (en)
Inventor
胡永泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010728788.3A
Publication of CN111966736A
Application granted
Publication of CN111966736B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5021 Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a high-throughput low-delay large-capacity Flume channel, which comprises a memory connected with a detection module; the memory is connected with a cluster through a data pulling module; the cluster and the memory are connected with a data source through a transit port; the transit port is provided with an arbitration module, and the arbitration module is connected with the detection module. A transmission method of the high-throughput low-delay large-capacity Flume channel comprises the steps of configuring a first storage interval and a second storage interval in the memory; configuring the remaining delay time of event packets and sequencing the event packets according to the remaining delay time; the data source sends an event packet to the transit port, and the transit port sends the event packet to the first storage interval, the cluster or the second storage interval; the cluster sends the event packet to the first storage interval or the second storage interval through the data pulling module; the first storage interval sends the event packet to the second storage interval; and the first storage interval and the second storage interval send event packets to the Sink.

Description

High-throughput low-delay large-capacity Flume channel and transmission method thereof
Technical Field
The invention relates to the technical field of Flume data transmission, and in particular to a high-throughput low-delay large-capacity Flume channel and a transmission method thereof.
Background
Flume is an excellent data collection tool. The Flume data-collection framework comprises a data source (Source), a channel and a Sink. The Source can collect data from a variety of sources such as log files, network ports and Kafka clusters, package the data into Events and write them into the channel. After the data is successfully written into the channel, the Sink actively pulls the data from the channel and writes it into big data components such as HDFS, HBase, Hive and ES. The channel is a passive buffer responsible for temporarily storing the data.
The channels in Flume include the File channel, the Memory channel and the Kafka channel, each of which buffers data in a different way. The Memory channel caches data in memory and has an extremely high throughput rate, but its space is constrained by RAM and the JVM (Java virtual machine), so it cannot buffer a large amount of data in a short time. The File channel caches data in local disk files; to guarantee a reliable storage process, every batch write involves taking read and write locks on the files, which causes lock contention, and because the data must be persisted to local disk, the overall throughput of the File channel is low and the local storage space is limited and hard to expand. The Kafka channel buffers data on a Kafka message queue; since Kafka has high throughput and low latency and stores data in a cluster whose capacity can be expanded continuously, its throughput is slightly lower than that of the Memory channel (network communication between cluster nodes is involved in the storage process) but far higher than that of the File channel. One channel can receive data from multiple data sources, and in many scenarios a very large data flow arises. When the data stream collected by Flume requires high throughput, low latency and a larger, expandable cache space at the same time, the existing Flume channels cannot meet the requirement.
Disclosure of Invention
The invention provides a Memory-Kafka channel for Flume, aiming to give Flume high throughput and low latency together with a larger, expandable cache space.
To achieve the above object, referring to FIG. 1, the present invention provides a high-throughput low-delay large-capacity Flume channel, comprising:
a memory, connected with a detection module;
the memory is connected with a cluster through a data pulling module;
the cluster and the memory are connected with a data source through a transit port;
the transit port is provided with an arbitration module, and the arbitration module is connected with the detection module.
Preferably, the detection module polls and detects the memory state, the arbitration module obtains the memory state from the detection module, and according to the memory state the arbitration module controls the transit port to connect the memory with the data source or to connect the data source with the cluster.
Preferably, the cluster is a Kafka cluster, and the cluster is deployed on a storage device.
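By way of non-limiting illustration, the structure just described (a memory holding two storage intervals, a Kafka cluster behind a data pulling module, and a transit port governed by an arbitration module) can be pictured with the following minimal Java sketch. All class names, field names, capacities and the topic name are hypothetical placeholders chosen for illustration; they are not taken from the patent.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical structural sketch of the channel of FIG. 1; names and capacities are illustrative only.
final class MemoryKafkaChannelLayout {
    // Memory: a larger first storage interval and a smaller second storage interval,
    // each drained towards the Sink through its own data interface.
    final BlockingQueue<byte[]> firstStorageInterval  = new ArrayBlockingQueue<>(100_000);
    final BlockingQueue<byte[]> secondStorageInterval = new ArrayBlockingQueue<>(10_000);

    // Kafka cluster behind the data pulling module, represented here only by an assumed topic name.
    final String clusterTopic = "flume-channel-events";

    // Transit port: the single entry point for the data source. The arbitration module
    // (sketched later, in the detailed description) decides whether an incoming event
    // packet goes to the first interval, the second interval, or the cluster.
}
```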
The invention further provides a transmission method of a high-throughput low-delay large-capacity Flume channel, which is applied to the high-throughput low-delay large-capacity Flume channel and comprises the following steps:
S100, configuring a first storage interval and a second storage interval in a memory, wherein the first storage interval is correspondingly configured with a first data interface, and the second storage interval is correspondingly configured with a second data interface;
S200, configuring the remaining delay time of the event packets, sequencing the event packets in the memory and the cluster according to the remaining delay time, and processing the event packets by the first data interface, the second data interface and the data pulling module according to the sequencing;
S300, the data source sends an event packet to the transit port, and the transit port sends the event packet to the first storage interval, the cluster or the second storage interval;
S400, the cluster sends the event packet to the first storage interval or the second storage interval through the data pulling module;
S500, the first storage interval sends the event packet to the second storage interval;
S600, the first storage interval sends the event packet to the Sink through the first data interface, and the second storage interval sends the event packet to the Sink through the second data interface.
Preferably, S300 comprises the following steps:
S301, configuring a second threshold related to the remaining delay time;
S302, if the remaining delay time of the event packet at the transit port is smaller than the second threshold, the arbitration module controls the transit port to preferentially send the event packet to the second storage interval;
S303, if the remaining delay time of the event packet at the transit port is greater than or equal to the second threshold, a first threshold related to memory occupancy is configured in the arbitration module, the arbitration module obtains the occupancy rate of the first storage interval measured by the detection module, and the arbitration module compares the first threshold with the occupancy rate of the first storage interval;
S304, if the occupancy rate of the first storage interval is smaller than the first threshold, the arbitration module controls the transit port to send the event packet to the first storage interval;
S305, if the occupancy rate of the first storage interval is greater than or equal to the first threshold, the arbitration module controls the transit port to send the event packet to the cluster.
Preferably, S400 comprises the following steps:
S401, configuring a third threshold related to the remaining delay time;
S402, if the remaining delay time of the event packet at the cluster is smaller than the third threshold, the data pulling module preferentially sends the event packet to the second storage interval;
S403, if the remaining delay time of the event packet at the cluster is greater than or equal to the third threshold, the data pulling module sends the event packet to the first storage interval.
Preferably, S500 comprises the following steps:
S501, configuring a fourth threshold related to the remaining delay time;
S502, if the remaining delay time of the event packet in the first storage interval is smaller than the fourth threshold, the first storage interval preferentially sends the event packet to the second storage interval;
S503, if the remaining delay time of the event packet in the first storage interval is greater than or equal to the fourth threshold, the event packet keeps queuing in the first storage interval.
Preferably, the detection module detects the occupancy rate of the second storage interval, and a fifth threshold related to memory occupancy is configured. If the occupancy rate of the second storage interval is smaller than the fifth threshold, the sending of event packets bound for the second storage interval is executed; if the occupancy rate of the second storage interval is greater than or equal to the fifth threshold, the event packets in the first storage interval and the cluster that are bound for the second storage interval queue in place, and an event packet at the transit port that is bound for the second storage interval is handled according to S304 or S305.
Preferably, the event packet is configured with a maximum delay time, the event packet is configured with a timing module, the timing module starts timing at the data source, and the remaining delay time is obtained by subtracting the time measured by the timing module from the maximum delay time.
The high-throughput low-delay large-capacity Flume channel and the transmission method thereof have the following beneficial effects:
the high-throughput low-delay large-capacity Flume channel is provided with the memory and the cluster, the Sink can quickly fetch data from the memory and has the advantage of high throughput, and the cluster expands the storage space and has the advantage of large capacity; the transmission method of the high-throughput low-delay large-capacity Flume channel orderly processes the event packets by configuring the residual delay time as the index of the event packet processing sequence; and a special second storage interval is opened in the memory for processing the event packet with the temporary residual delay time, so that a channel is opened for the event packet which is temporary but cannot be processed in time by the first storage interval, and the timely transmission of the temporary event packet is effectively ensured. Thereby ensuring a relatively low latency in the delivery of event packets.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from the structures shown in these drawings without creative effort.
FIG. 1 is a diagram of the architecture of a high throughput low latency large capacity Flume channel in an embodiment of the present invention;
FIG. 2 is a diagram illustrating a transmission of event packets from a data source in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a transmission of an imminent event packet from a data source according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another transmission of an event packet from a data source in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a transmission of an imminent event packet in a cluster according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating the transmission of an imminent event packet within a first storage interval according to an embodiment of the present invention;
FIG. 7 is a flow chart of controlling data transmission of a data source according to remaining delay time in an embodiment of the present invention;
FIG. 8 is a flow chart illustrating controlling data source data transmission according to a first storage interval status according to an embodiment of the present invention;
FIG. 9 is a flow chart illustrating controlling transmission of an imminent event packet in a first storage interval according to the remaining delay time according to an embodiment of the present invention;
FIG. 10 is a flow chart of controlling transmission of an imminent event packet within a cluster according to the remaining delay time in an embodiment of the present invention;
FIG. 11 is a flowchart illustrating the process of determining whether to transmit an imminent event packet according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, the present invention provides a high-throughput low-latency large-capacity Flume channel, comprising
a memory and a detection module, wherein a first storage interval with larger capacity and a second storage interval with smaller capacity are configured in the memory, the first storage interval is configured with a first data interface, the second storage interval is configured with a second data interface, and the memory is connected with the detection module. The detection module detects the state of the memory through instruction polling; specifically, the polling instructions detect the occupancy rate of the first storage interval according to the address and usage of the first storage interval, and detect the occupancy rate of the second storage interval according to the address and usage of the second storage interval.
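As a non-limiting illustration of the polling just described, the following Java sketch shows a detection module that periodically measures the occupancy rate of both storage intervals and caches the values for the arbitration module. The 10 ms poll period, the queue-based representation of the storage intervals and all names are assumptions made for this example, not values from the patent.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative detection module: polls the occupancy rate of both storage intervals
// on a fixed schedule and caches the results for later routing decisions.
final class DetectionModule {
    private final BlockingQueue<?> firstInterval;
    private final BlockingQueue<?> secondInterval;
    private final int firstCapacity;
    private final int secondCapacity;
    private volatile double firstOccupancy;
    private volatile double secondOccupancy;

    DetectionModule(BlockingQueue<?> firstInterval, int firstCapacity,
                    BlockingQueue<?> secondInterval, int secondCapacity) {
        this.firstInterval = firstInterval;
        this.firstCapacity = firstCapacity;
        this.secondInterval = secondInterval;
        this.secondCapacity = secondCapacity;
    }

    /** Starts the polling loop that keeps the cached occupancy rates up to date. */
    void startPolling() {
        ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
        poller.scheduleAtFixedRate(() -> {
            firstOccupancy  = (double) firstInterval.size()  / firstCapacity;
            secondOccupancy = (double) secondInterval.size() / secondCapacity;
        }, 0, 10, TimeUnit.MILLISECONDS);
    }

    double firstIntervalOccupancy()  { return firstOccupancy; }
    double secondIntervalOccupancy() { return secondOccupancy; }
}
```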
The memory is connected with the cluster through a data pulling module; the cluster is a Kafka cluster, and the Kafka cluster is deployed on three servers.
The cluster and the memory are connected with a data source through a transit port;
the transit port is provided with an arbitration module, the arbitration module is connected with the detection module, the arbitration module acquires the memory state from the detection module, and according to the memory state the arbitration module controls the transit port to connect the first storage interval with the data source, or to connect the second storage interval with the data source, or to connect the data source with the cluster.
The invention provides a transmission method of a high-throughput low-delay large-capacity Flume channel, which is applied to the high-throughput low-delay large-capacity Flume channel and mainly comprises the following steps:
s100, configuring a first storage interval and a second storage interval in a memory, where the first storage interval is correspondingly configured with a first data interface, the first data interface extracts an event packet from the first storage interval, the second storage interval is correspondingly configured with a second data interface, and the second data interface extracts an event packet from the second storage interval, where the data interface in fig. 1 includes a first data interface represented by a number and a second data interface represented by a letter.
S200, configuring the remaining delay time of the event packet. In a specific implementation, the maximum delay time of the event packet is configured in the event packet, a timing module is configured in the event packet, the timing module starts timing at the data source, and the remaining delay time is obtained by subtracting the time measured by the timing module from the maximum delay time. The event packets are sorted according to the remaining delay time in the memory and the cluster: the lower the remaining delay time of an event packet, the higher its processing priority. Specifically, the first data interface first forwards the event packet with the highest priority in the first storage interval to the Sink, the second data interface first forwards the event packet with the highest priority in the second storage interval to the Sink, and the data pulling module first pulls the event packet with the highest priority in the cluster into the memory.
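A minimal Java sketch of the bookkeeping in S200 is given below: each event packet records a deadline derived from its maximum delay time when it leaves the data source, and a priority queue hands out the packet with the smallest remaining delay time first (ordering by deadline is equivalent to ordering by remaining delay time). The class name, the millisecond unit and the initial queue capacity are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative event packet carrying the remaining-delay bookkeeping of S200; names are hypothetical.
final class PrioritizedEventPacket {
    final byte[] body;
    final long deadlineNanos;   // timing starts when the packet leaves the data source

    PrioritizedEventPacket(byte[] body, long maxDelayMillis) {
        this.body = body;
        this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxDelayMillis);
    }

    /** Remaining delay time = maximum delay time minus the time measured so far. */
    long remainingDelayNanos() {
        return deadlineNanos - System.nanoTime();
    }

    /** A storage interval that always releases the packet with the smallest remaining delay first. */
    static PriorityBlockingQueue<PrioritizedEventPacket> newOrderedInterval() {
        return new PriorityBlockingQueue<>(1024,
                Comparator.comparingLong((PrioritizedEventPacket p) -> p.deadlineNanos));
    }
}
```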
S300, the data source sends an event packet to the transit port, and the transit port sends the event packet to the first storage interval, the cluster or the second storage interval. In a specific implementation, referring to FIG. 7 and FIG. 8, S300 includes the following steps:
S301, configuring a second threshold related to the remaining delay time;
S302, when the event packet is at the transit port, its remaining delay time is compared with the second threshold; if the remaining delay time is less than the second threshold, the event packet is imminent and needs to be processed as soon as possible. The arbitration module controls the transit port to preferentially connect the data source with the second storage interval, the data source sends the event packet to the transit port, and the transit port preferentially sends the event packet to the second storage interval. Referring to FIG. 11, in a specific implementation the detection module detects the occupancy rate of the second storage interval, the arbitration module obtains this occupancy rate, and a fifth threshold related to memory occupancy is configured in the arbitration module in advance. If the occupancy rate of the second storage interval is smaller than the fifth threshold, the transit port is connected to the second storage interval and sends the event packet to the second storage interval; if the occupancy rate of the second storage interval is greater than or equal to the fifth threshold, the arbitration module compares the occupancy rate of the first storage interval with the first threshold: when the occupancy rate of the first storage interval is smaller than the first threshold, the transit port is connected to the first storage interval and sends the event packet there, and when the occupancy rate of the first storage interval is greater than or equal to the first threshold, the transit port is connected to the cluster and sends the event packet to the cluster.
S303, a first threshold related to memory occupancy is preconfigured in the arbitration module through a configuration file; when the event packet is at the transit port and its remaining delay time is greater than or equal to the second threshold, the arbitration module obtains the occupancy rate of the first storage interval measured by the detection module and compares the first threshold with that occupancy rate;
S304, if the occupancy rate of the first storage interval is smaller than the first threshold, the arbitration module controls the transit port to connect the data source with the first storage interval, the data source sends the event packet to the transit port, and the transit port sends the event packet to the first storage interval for storage;
S305, if the occupancy rate of the first storage interval is greater than or equal to the first threshold, the arbitration module controls the transit port to connect the data source with the cluster, the data source sends the event packet to the transit port, and the transit port sends the event packet to the cluster for storage.
S400, the cluster sends the event packet to the first storage interval or the second storage interval through the data pulling module. In a specific implementation, the data pulling module is provided with several groups of data pulling threads: part of the threads are connected to the first storage interval and the rest are connected to the second storage interval, and the number of event packets the data pulling module processes in parallel is equal to the sum of the numbers of event packets the first data interface and the second data interface process in parallel. Specifically, referring to FIG. 10, S400 includes the following steps:
S401, configuring a third threshold related to the remaining delay time; the third threshold is set in consideration of the network transmission delay from the cluster to the memory, and the third threshold is greater than or equal to the second threshold.
S402, if the remaining delay time of the event packet at the cluster is smaller than the third threshold, the data pulling threads connected to the second storage interval preferentially send the event packet to the second storage interval. Referring to FIG. 11, in a specific implementation the detection module detects the occupancy rate of the second storage interval, the arbitration module obtains this occupancy rate and sends it to the cluster through the transit port, and a fifth threshold related to memory occupancy is configured in the cluster in advance. If the occupancy rate of the second storage interval is smaller than the fifth threshold, the data pulling threads connected to the second storage interval send the event packet to the second storage interval; if the occupancy rate of the second storage interval is greater than or equal to the fifth threshold, the event packets in the cluster that are bound for the second storage interval queue in the cluster until either the data pulling threads connected to the first storage interval send them to the first storage interval in priority order, or the second storage interval has a vacancy and the data pulling threads connected to the second storage interval send them to the second storage interval.
S403, if the remaining delay time of the event packet at the cluster is greater than or equal to the third threshold, the data pulling threads connected to the first storage interval send the event packet to the first storage interval.
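For illustration, the following Java sketch shows one possible data pulling loop over the Kafka consumer API that applies the S402/S403 routing per record. It simplifies the scheme above in two ways: a single loop routes to either interval instead of dedicating separate thread groups to each interval, and the FIG. 11 back-pressure on the fifth threshold is omitted. The topic name, group id and the convention of carrying the absolute deadline (epoch milliseconds) in the record key are assumptions, not part of the patent.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Illustrative data pulling loop for S401-S403; names and conventions are assumptions.
final class DataPullLoop implements Runnable {
    private final KafkaConsumer<String, byte[]> consumer;
    private final BlockingQueue<byte[]> firstInterval;
    private final BlockingQueue<byte[]> secondInterval;
    private final long thirdThresholdMillis;   // remaining-delay limit inside the cluster

    DataPullLoop(String bootstrapServers, BlockingQueue<byte[]> firstInterval,
                 BlockingQueue<byte[]> secondInterval, long thirdThresholdMillis) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", "flume-memory-kafka-channel");   // assumed consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        this.consumer = new KafkaConsumer<>(props);
        this.consumer.subscribe(Collections.singletonList("flume-channel-events")); // assumed topic
        this.firstInterval = firstInterval;
        this.secondInterval = secondInterval;
        this.thirdThresholdMillis = thirdThresholdMillis;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> record : records) {
                    // Assumed convention: the producer stores the absolute deadline (epoch ms)
                    // in the record key, so remaining delay = deadline - now.
                    if (record.key() == null) {
                        continue;   // skip packets without deadline metadata
                    }
                    long remainingMillis = Long.parseLong(record.key()) - System.currentTimeMillis();
                    BlockingQueue<byte[]> target =
                            remainingMillis < thirdThresholdMillis ? secondInterval : firstInterval;
                    target.offer(record.value());   // S402 / S403 routing
                }
            }
        } finally {
            consumer.close();
        }
    }
}
```

Several such loops would normally run in a thread pool, in line with the statement above that the data pulling module processes as many event packets in parallel as the two data interfaces combined.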
S500, the first storage interval sends the event packet to the second storage interval. Specifically, referring to FIG. 9, S500 includes the following steps:
S501, configuring a fourth threshold related to the remaining delay time;
S502, if the remaining delay time of the event packet in the first storage interval is less than the fourth threshold but the first data interface is processing event packets at full load, the first storage interval preferentially sends the event packet to the second storage interval. Specifically, referring to FIG. 11, the memory obtains the occupancy rate of the second storage interval, and a fifth threshold related to memory occupancy is configured in advance. If the occupancy rate of the second storage interval is smaller than the fifth threshold, the first storage interval sends the event packets whose remaining delay time is smaller than the fourth threshold to the second storage interval; if the occupancy rate of the second storage interval is greater than or equal to the fifth threshold, the event packets in the first storage interval that are bound for the second storage interval queue in the first storage interval until either they are sent to the Sink through the first data interface in priority order, or the second storage interval has a vacancy and the first storage interval sends them to the second storage interval.
S503, if the remaining delay time of the event packet in the first storage interval is greater than or equal to the fourth threshold, the event packet keeps queuing in the first storage interval.
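The S501-S503 promotion from the first to the second storage interval, gated by the fourth and fifth thresholds, could look like the Java sketch below. The check that the first data interface is fully loaded is assumed to have been performed by the caller, and the nested packet type and parameter names are illustrative assumptions.

```java
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;

// Illustrative S501-S503 sketch: move imminent packets from the first storage interval to the
// second storage interval while the second interval has room; otherwise they keep queueing.
final class OverflowPromoter {
    static final class PacketWithDeadline {
        final byte[] body;
        final long deadlineNanos;
        PacketWithDeadline(byte[] body, long deadlineNanos) {
            this.body = body;
            this.deadlineNanos = deadlineNanos;
        }
        long remainingDelayNanos() { return deadlineNanos - System.nanoTime(); }
    }

    // Assumes queue implementations whose iterators support removal (e.g. ArrayBlockingQueue).
    static void promoteImminent(BlockingQueue<PacketWithDeadline> firstInterval,
                                BlockingQueue<PacketWithDeadline> secondInterval,
                                long fourthThresholdNanos,
                                int secondCapacity,
                                double fifthThreshold) {
        Iterator<PacketWithDeadline> it = firstInterval.iterator();
        while (it.hasNext()) {
            PacketWithDeadline packet = it.next();
            boolean imminent = packet.remainingDelayNanos() < fourthThresholdNanos;      // S502
            boolean secondHasRoom =
                    (double) secondInterval.size() / secondCapacity < fifthThreshold;    // FIG. 11
            if (imminent && secondHasRoom && secondInterval.offer(packet)) {
                it.remove();   // moved to the second storage interval
            }
            // Otherwise the packet keeps queueing in the first storage interval (S503).
        }
    }
}
```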
S600, the first storage interval sends the event packet to the Sink through the first data interface, and the second storage interval sends the event packet to the Sink through the second data interface.
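Finally, S600 can be pictured as two independent drain loops, one per data interface, each handing event packets from its own storage interval to the Sink. In the Java sketch below a Consumer callback stands in for the Sink, and the thread pool size and poll timeout are arbitrary illustrative choices.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative S600 sketch: the first and second data interfaces run as independent drain loops.
final class DataInterfaces {
    static void start(BlockingQueue<byte[]> firstInterval,
                      BlockingQueue<byte[]> secondInterval,
                      Consumer<byte[]> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(drainLoop(firstInterval, sink));   // first data interface
        pool.submit(drainLoop(secondInterval, sink));  // second data interface
    }

    private static Runnable drainLoop(BlockingQueue<byte[]> interval, Consumer<byte[]> sink) {
        return () -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    byte[] packet = interval.poll(100, TimeUnit.MILLISECONDS);
                    if (packet != null) {
                        sink.accept(packet);   // deliver the event packet to the Sink
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }
}
```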
In the high-throughput low-delay large-capacity Flume channel and its transmission method provided by the invention, the first storage interval is the main temporary storage location of event packets. The first threshold is an index of the occupancy rate of the first storage interval: an occupancy rate above the first threshold indicates that the first storage interval is heavily occupied and the data volume is large, while an occupancy rate below the first threshold indicates that it is lightly occupied and the data volume is small. The second threshold is one measure of the remaining delay time of event packets; it is set far smaller than the typical remaining delay time of event packets at the transit port, so that most event packets are filtered towards the first storage interval and the cluster, the second storage interval stays relatively idle, and spare capacity is kept available at any time for high-priority event packets. The third threshold is another measure of the remaining delay time; it is mainly used to identify imminent event packets in the cluster that would exceed their delay time if they travelled from the cluster to the first storage interval and then from the first storage interval to the Sink, or that the first storage interval, being full, cannot process, and the judgment against the third threshold makes such event packets go directly through the second storage interval. The fourth threshold is likewise a measure of the remaining delay time; it is mainly used to identify imminent event packets in the first storage interval that cannot be transmitted because the first data interface is occupied, and the judgment against the fourth threshold makes such event packets go directly through the second storage interval. The fifth threshold is a measure of the occupancy rate of the second storage interval. Since the event packets handled by the second storage interval have been screened layer by layer, their flow is usually small; the fifth threshold is introduced to prevent the second storage interval from becoming fully loaded during peaks in the event packet flow, and when the occupancy rate of the second storage interval exceeds the fifth threshold, the handling described above is applied to event packets that would otherwise be sent to the second storage interval. When the data flow is low, the capacity of the first storage interval is sufficient: as shown in FIG. 2, the data source (Source) transmits the event packet to the first storage interval, and the event packet is transmitted from the first storage interval to the Sink; when the remaining delay time of the event packet at the transit port is less than the second threshold, as shown in FIG. 3, the data source transmits the event packet to the second storage interval, and the event packet is transmitted from the second storage interval to the Sink.
When the data flow is heavy, the capacity of the first storage interval is insufficient, and free space often appears only after the Sink drains event packets from the first storage interval. In this case, referring to FIG. 4, the data source transmits the event packet to the cluster, and the data pulling module then extracts the event packet from the cluster into the first storage interval. If an event packet in the cluster is imminent, that is, its remaining delay time is less than the third threshold, then, referring to FIG. 5, the data pulling module extracts the event packet from the cluster into the second storage interval. If an event packet in the first storage interval is imminent, that is, its remaining delay time is less than the fourth threshold, and the first data interface is fully loaded and cannot process it, then, referring to FIG. 6, the event packet in the first storage interval is transferred to the second storage interval and forwarded to the Sink through the second storage interval.
The high-throughput low-delay large-capacity Flume channel is provided with the memory and the cluster: the Sink can fetch data from the memory quickly, which gives high throughput, and the cluster expands the storage space, which gives large capacity. The transmission method of the high-throughput low-delay large-capacity Flume channel processes event packets in order by using the remaining delay time as the index of the processing sequence, and a dedicated second storage interval is opened up in the memory for event packets whose remaining delay time is about to expire, which opens a path for imminent event packets that the first storage interval cannot process in time and effectively guarantees their timely transmission. The transfer of event packets is thereby kept at a relatively low latency.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to imply that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of the embodiments of the invention, the technical features in the above embodiments or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (6)

1. A high-throughput low-latency large-capacity Flume channel, characterized by comprising
a memory, in which a first storage interval with larger capacity and a second storage interval with smaller capacity are configured, the first storage interval being configured with a first data interface and the second storage interval being configured with a second data interface;
the memory is connected with a detection module; the detection module detects the memory state through instruction polling, wherein the memory state comprises the occupancy rates of the first storage interval and the second storage interval;
the memory is connected with a cluster through a data pulling module;
the cluster and the memory are connected with a data source through a transit port;
the transit port is provided with an arbitration module, and the arbitration module is connected with the detection module; the arbitration module acquires the memory state from the detection module, and according to the memory state the arbitration module controls the transit port to connect the first storage interval with the data source, or to connect the second storage interval with the data source, or to connect the data source with the cluster.
2. The high-throughput low-latency large-capacity Flume channel according to claim 1, wherein the detection module polls and detects the memory state, the arbitration module obtains the memory state from the detection module, and according to the memory state the arbitration module controls the transit port to connect the memory with the data source or to connect the data source with the cluster.
3. The high-throughput low-latency large-capacity Flume channel according to claim 1, wherein the cluster is a Kafka cluster, and the cluster is deployed on a storage device.
4. A transmission method of a high-throughput low-latency large-capacity Flume channel, applied to the high-throughput low-latency large-capacity Flume channel as claimed in claim 1, characterized by comprising the following steps:
S100, configuring a first storage interval and a second storage interval in a memory, wherein the first storage interval is correspondingly configured with a first data interface, and the second storage interval is correspondingly configured with a second data interface;
S200, configuring the remaining delay time of the event packets, sequencing the event packets in the memory and the cluster according to the remaining delay time, and processing the event packets by the first data interface, the second data interface and the data pulling module according to the sequencing;
S300, the data source sends an event packet to the transit port, and the transit port sends the event packet to the first storage interval, the cluster or the second storage interval;
S400, the cluster sends the event packet to the first storage interval or the second storage interval through the data pulling module;
S500, the first storage interval sends the event packet to the second storage interval;
S600, the first storage interval sends the event packet to the Sink through the first data interface, and the second storage interval sends the event packet to the Sink through the second data interface.
5. The transmission method of the high-throughput low-latency large-capacity Flume channel according to claim 4, wherein S300 comprises the following steps:
S301, configuring a second threshold related to the remaining delay time;
S302, if the remaining delay time of the event packet at the transit port is smaller than the second threshold, the arbitration module controls the transit port to preferentially send the event packet to the second storage interval;
S303, if the remaining delay time of the event packet at the transit port is greater than or equal to the second threshold, a first threshold related to memory occupancy is configured in the arbitration module, the arbitration module obtains the occupancy rate of the first storage interval measured by the detection module, and the arbitration module compares the first threshold with the occupancy rate of the first storage interval;
S304, if the occupancy rate of the first storage interval is smaller than the first threshold, the arbitration module controls the transit port to send the event packet to the first storage interval;
S305, if the occupancy rate of the first storage interval is greater than or equal to the first threshold, the arbitration module controls the transit port to send the event packet to the cluster.
6. The transmission method of the high-throughput low-latency large-capacity Flume channel according to claim 5, wherein S400 comprises the following steps:
S401, configuring a third threshold related to the remaining delay time;
S402, if the remaining delay time of the event packet at the cluster is smaller than the third threshold, the data pulling module preferentially sends the event packet to the second storage interval;
S403, if the remaining delay time of the event packet at the cluster is greater than or equal to the third threshold, the data pulling module sends the event packet to the first storage interval.
CN202010728788.3A 2020-07-27 2020-07-27 High-throughput low-delay large-capacity Flume channel and transmission method thereof Active CN111966736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010728788.3A CN111966736B (en) 2020-07-27 2020-07-27 High-throughput low-delay large-capacity Flume channel and transmission method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010728788.3A CN111966736B (en) 2020-07-27 2020-07-27 High-throughput low-delay large-capacity Flume channel and transmission method thereof

Publications (2)

Publication Number Publication Date
CN111966736A CN111966736A (en) 2020-11-20
CN111966736B 2022-12-09

Family

ID=73362996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010728788.3A Active CN111966736B (en) 2020-07-27 2020-07-27 High-throughput low-delay large-capacity Flume channel and transmission method thereof

Country Status (1)

Country Link
CN (1) CN111966736B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116366519A (en) * 2021-12-28 2023-06-30 北京灵汐科技有限公司 Route transmission method, route control method, event processing method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180329600A1 (en) * 2016-03-02 2018-11-15 Tencent Technology (Shenzhen) Company Limited Data processing method and apparatus
CN108073349A (en) * 2016-11-08 2018-05-25 北京国双科技有限公司 The transmission method and device of data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dynamic Load Balancing and Channel Strategy for Apache Flume Collecting Real-Time Data Stream; Buqing Shu et al.; 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC); 20171231; full text *
Flume-based automatic MySQL data collection system (基于Flume的MySQL数据自动收集系统); 于金良 et al.; Computer Technology and Development (计算机技术与发展); 20161231; full text *

Also Published As

Publication number Publication date
CN111966736A (en) 2020-11-20

Similar Documents

Publication Publication Date Title
US9444740B2 (en) Router, method for controlling router, and program
US9426099B2 (en) Router, method for controlling router, and program
US7227841B2 (en) Packet input thresholding for resource distribution in a network switch
CA2329542C (en) System and method for scheduling message transmission and processing in a digital data network
US8135004B2 (en) Multi-plane cell switch fabric system
US8601181B2 (en) System and method for read data buffering wherein an arbitration policy determines whether internal or external buffers are given preference
US7729258B2 (en) Switching device
US7406041B2 (en) System and method for late-dropping packets in a network switch
US20120072635A1 (en) Relay device
US7613849B2 (en) Integrated circuit and method for transaction abortion
US20230164078A1 (en) Congestion Control Method and Apparatus
JP2002512460A (en) System and method for regulating message flow in a digital data network
CN107770090B (en) Method and apparatus for controlling registers in a pipeline
CN114521253B (en) Dual-layer deterministic interprocess communication scheduler for input-output deterministic in solid-state drives
CN114257559B (en) Data message forwarding method and device
JP2007524917A (en) System and method for selectively influencing data flow to and from a memory device
CN111966736B (en) High-throughput low-delay large-capacity Flume channel and transmission method thereof
CN114531488A (en) High-efficiency cache management system facing Ethernet exchanger
CN106911740A (en) A kind of method and apparatus of cache management
CN107483405B (en) scheduling method and scheduling system for supporting variable length cells
EP3487132B1 (en) Packet processing method and router
CN114363269A (en) Message transmission method, system, equipment and medium
CN114401235B (en) Method, system, medium, equipment and application for processing heavy load in queue management
CN115955441A (en) Management scheduling method and device based on TSN queue
CN112559400B (en) Multi-stage scheduling device, method, network chip and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant