CN113672169A - Data reading and writing method of stream processing system and stream processing system - Google Patents


Info

Publication number
CN113672169A
Authority
CN
China
Prior art keywords
data
read
write
broker
cache
Prior art date
Legal status
Pending
Application number
CN202110813467.8A
Other languages
Chinese (zh)
Inventor
阮良
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202110813467.8A
Publication of CN113672169A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0626 Reducing size or complexity of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G06F3/0665 Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present application relates to a data read-write method for a stream processing system, and to a stream processing system. The method comprises the following steps: a Broker of the stream processing system receives a read-write data request; and, in response to the read-write data request, the Broker performs the read-write operation on the data through a read-write cache, wherein the read-write cache is a preset memory space that the Broker requests from off-heap memory. The method and the system address the problem of low stream processing speed and improve the stream processing speed.

Description

Data reading and writing method of stream processing system and stream processing system
Technical Field
The present application relates to the field of stream processing system technologies, and in particular, to a data read/write method for a stream processing system and a stream processing system.
Background
The stream processing system, such as Kafka or Pulsar, is an important link in modern big data systems and is responsible for storing and computing on data.
The stream processing system consists of a storage part and a computation part: a producer continuously pushes data to the stream processing system, the corresponding computation tasks run continuously on the data inside the stream processing system, and the results are output to a specified external system for aggregation and display.
In the current scheme of the stream processing system, when a Broker receives a request to write data, it writes the data directly into the operating system page cache (Page Cache) and waits for the periodic flush performed by the OS to write the data to disk. When the Broker receives a request to read data, it first looks for the data in the operating system page cache and, if the data has not been evicted, returns it directly, so that a disk read is avoided; if the data is not in the operating system page cache, the disk has to be read. Replica synchronization between the Brokers of a cluster follows the same logic. Consumption lag can occur in an actual production environment: when a lagging consumer requests data that was written long ago and is no longer in the operating system page cache, a disk read is triggered and some data in the operating system page cache is evicted. In addition, since the operating system page cache is managed by the OS, its behavior is not controlled by the Broker; when the operating system page cache is under heavy pressure, processes other than the Broker can aggravate the memory contention, or the stream data in the operating system page cache can be polluted by the data of other processes, so that the number of disk reads increases and the stream processing speed is seriously affected.
To mitigate the above problems, Kubernetes-based deployment schemes in the current cloud native environment let a Broker monopolize a physical machine in order to guarantee the performance of the stream processing system; that is, the Broker essentially monopolizes the operating system page cache, so that competition with other processes for memory is avoided, but the hardware cost of the infrastructure increases.
No effective solution has yet been proposed for the problem of low stream processing speed in the related art.
Disclosure of Invention
The embodiment provides a data read-write method of a stream processing system and the stream processing system, and aims to solve the problem of low stream processing speed in the related art in a Kubernetes-based deployment scheme.
In a first aspect, in this embodiment, a data reading and writing method for a stream processing system is provided, including:
a Broker of the stream processing system receives a read-write data request;
and, in response to the read-write data request, the Broker performs the read-write operation on the data through a read-write cache, wherein the read-write cache is a preset memory space that the Broker requests from off-heap memory.
In some embodiments, when the Broker receives a read data request, the Broker queries whether first data exists in the read-write cache, where the first data is data requested to be read by the read data request;
if so, the Broker reads the first data from the read-write cache;
otherwise, the Broker reads the first data from the disk.
In some embodiments, the Broker reading the first data from the disk comprises:
when the first data is cold data, the Broker writes the first data into the read-write cache from an HDD through an operating system page cache, and when the first data is hot data, the Broker writes the first data into the read-write cache from an SSD;
and the Broker reads the first data from the read-write cache.
In some of these embodiments, when the Broker receives a write data request, the Broker writes second data into the read-write cache, wherein the second data is the data requested to be written by the write data request.
In some embodiments, after the Broker writes the second data to the read-write cache, the method further comprises:
when the second data is cold data, the Broker writes the second data from the read-write cache to an HDD through an operating system page cache;
and when the second data is hot data, the Broker writes the second data from the read-write cache to an SSD.
In some of these embodiments, the method further comprises:
when the remaining capacity of the read-write cache is smaller than a first preset threshold, the Broker transfers third data to a disk, wherein the third data is the earliest-written data among the data currently stored in the read-write cache.
In some of these embodiments, transferring the third data to the disk includes:
when the third data is hot data, the Broker transfers the third data from the read-write cache to the SSD;
and when the third data is cold data, the Broker transfers the third data from the read-write cache to the HDD through an operating system page cache.
In some of these embodiments, the method further comprises:
when the Broker transfers the third data from the read-write cache to the SSD and the remaining capacity of the SSD is insufficient, the Broker deletes fourth data from the SSD, wherein the fourth data is the earliest-written data among the data currently stored in the SSD.
In some of these embodiments, the method further comprises:
the Broker acquires configuration information;
and the Broker judges whether the currently read and written data is hot data or cold data according to the configuration information.
In some of these embodiments, the method further comprises:
the Broker judges whether the data currently being read or written is cold data or hot data according to statistics on the read-write frequency of that data within a preset time period;
when the read-write frequency of the data within the preset time period exceeds a second preset threshold, the data is determined to be hot data; otherwise, the data is determined to be cold data.
In some embodiments, in response to the read-write data request, the Broker performs a read-write operation on data through a read-write cache, including:
the Broker judges whether the read-write data request is a read-write data request of an external service;
under the condition that the read-write data request is judged not to be a read-write data request of an external service, the Broker performs read-write operation of read-write data through an operating system page cache and a magnetic disk;
and under the condition that the read-write data request is judged to be a read-write data request of an external service, the Broker performs read-write operation of read-write data through the read-write cache.
In a second aspect, there is provided in this embodiment a stream processing system comprising a producer, a consumer, and a Broker; wherein the producer is configured to produce data and write the produced data into the Broker, the consumer is configured to consume the data stored in the Broker, and the Broker includes a memory in which a computer program is stored and a processor configured to execute the computer program to implement the data reading and writing method of the stream processing system according to the first aspect.
Compared with the related art, in the data reading and writing method of the stream processing system and the stream processing system provided in this embodiment, the Broker of the stream processing system receives a read-write data request and, in response to the request, performs the read-write operation on the data through a read-write cache, wherein the read-write cache is a preset memory space that the Broker requests from off-heap memory. This addresses the problem of low stream processing speed and improves the stream processing speed.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is a flowchart of a data read/write method of the stream processing system of the present embodiment.
Fig. 2 is a schematic diagram of a data read/write method of the stream processing system according to an embodiment.
Fig. 3 is a flowchart of a data read/write method of the stream processing system of one preferred embodiment.
Detailed Description
For a clearer understanding of the objects, aspects and advantages of the present application, reference is made to the following description and accompanying drawings.
Unless defined otherwise, technical or scientific terms used herein shall have the same general meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The use of the terms "a" and "an" and "the" and similar referents in the context of this application do not denote a limitation of quantity, either in the singular or the plural. The terms "comprises," "comprising," "has," "having," and any variations thereof, as referred to in this application, are intended to cover non-exclusive inclusions; for example, a process, method, and system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or modules, but may include other steps or modules (elements) not listed or inherent to such process, method, article, or apparatus. Reference throughout this application to "connected," "coupled," and the like is not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference to "a plurality" in this application means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. In general, the character "/" indicates a relationship in which the objects associated before and after are an "or". The terms "first," "second," "third," and the like in this application are used for distinguishing between similar items and not necessarily for describing a particular sequential or chronological order.
In this embodiment, a data read-write method is provided for a stream processing system such as the Kafka stream processing platform or the Pulsar stream processing platform. Kafka is a high-throughput distributed publish-subscribe messaging system that can process all of the action stream data generated by users of a website. Such actions (web browsing, searching and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, this data is usually handled through log processing and log aggregation. For systems such as Hadoop that handle log data and offline analysis but also require real-time processing, this is a viable solution. The purpose of Kafka is to unify online and offline message processing through the parallel loading mechanism of Hadoop, and also to provide real-time messages through a cluster.
Most stream systems in the related art directly use the operating system page cache as the data cache, and process requests as follows. When the Broker receives a request to write data, the data is written directly into the operating system page cache and is written to disk by the periodic flush of the operating system (OS). When the Broker receives a request to read data, the stream system first searches the operating system page cache; if the requested data has not been evicted from the operating system page cache, it is returned directly and a disk read is avoided. However, if the data to be read is not found in the operating system page cache, a disk read is triggered and some cached data is evicted. Because cache usage at the OS level is not controllable by the Broker, memory contention intensifies under load, which slows the stream system, lets data in the operating system page cache be polluted, and increases the number of disk reads; the stream system in the related art therefore has slower production and consumption speeds. Replica synchronization among the Brokers of a cluster follows the same logic as a Broker handling a read request.
Fig. 1 is a flowchart of a data read-write method of a stream processing system in this embodiment, and as shown in fig. 1, the data read-write method of the stream processing system includes the following steps:
Step S101: the Broker of the stream processing system receives the read-write data request.
A read data request is an instruction to read data that a consumer sends to the stream processing system during actual production and consumption. A write data request is an instruction to write data that a producer sends to the stream processing system during actual production and consumption. Reading data is the process in which the data requested by a consumer is read from the read-write cache or the disk and returned to the consumer; writing data is the process in which data from a producer is written to the read-write cache or the disk.
Step S102: in response to the read-write data request, the Broker performs the read-write operation on the data through the read-write cache, wherein the read-write cache is a preset memory space that the Broker requests from off-heap memory.
A Kafka cluster comprises one or more servers, which are called Brokers. When a Broker receives a read-write data request, it responds to the request by performing the read-write operation on the data through the read-write cache. The read-write cache is a preset memory space that the Broker requests from off-heap memory, where off-heap memory is memory allocated outside the heap of the Java virtual machine.
Through the above steps, the Broker requests a preset memory space in off-heap memory to serve as the read-write cache for data, and the read-write cache is managed by the Broker itself, so that the stream processing speed is improved without the Broker having to monopolize a physical machine. Compared with the prior art, the present application maintains its own read-write cache rather than using the operating system page cache as the data cache, which makes the caching behavior more controllable.
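As a minimal sketch of a Broker-managed read-write cache, the following Python example reserves a fixed-size region once and serves reads and writes from it. Python has no JVM heap, so an anonymous `mmap` region stands in for off-heap memory here; the class name `ReadWriteCache`, its methods and the append-only layout are illustrative assumptions and are not taken from the application.

```python
import mmap


class ReadWriteCache:
    """Fixed-size buffer reserved once and managed by the Broker itself."""

    def __init__(self, capacity_bytes: int):
        # Anonymous mapping: a stand-in (assumption) for off-heap memory.
        self.buf = mmap.mmap(-1, capacity_bytes)
        self.capacity = capacity_bytes
        self.write_pos = 0
        self.index = {}  # key -> (start offset, length)

    def remaining(self) -> int:
        return self.capacity - self.write_pos

    def put(self, key, payload: bytes) -> bool:
        """Append payload; return False when the region is too full."""
        if len(payload) > self.remaining():
            return False
        start = self.write_pos
        self.buf[start:start + len(payload)] = payload
        self.index[key] = (start, len(payload))
        self.write_pos += len(payload)
        return True

    def get(self, key):
        """Return the cached bytes, or None on a cache miss."""
        location = self.index.get(key)
        if location is None:
            return None
        start, length = location
        return bytes(self.buf[start:start + length])
```

Because the region is sized up front, its capacity is decided by the Broker rather than by the operating system, which is the controllability property described above.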
In some embodiments, when the Broker receives a read data request, the Broker queries whether first data exists in the read-write cache, where the first data is data requested to be read by the read data request; if so, reading first data from the read-write cache by the Broker; otherwise, the Broker reads the first data from the disk.
For example, when the Broker receives a read data request, the Broker first queries whether data requested to be read by the read data request exists in the read-write cache, where this embodiment refers to the data requested to be read by the read data request as first data, if the first data exists in the read-write cache, the Broker reads the first data from the read-write cache, and if the first data does not exist in the read-write cache, the Broker reads the first data from the disk.
In some of these embodiments, the Broker reading the first data from the disk includes: when the first data is cold data, the Broker writes the first data into the read-write cache from the HDD through the operating system page cache, and when the first data is hot data, the Broker writes the first data into the read-write cache from the SSD; the Broker then reads the first data from the read-write cache.
For example, when the Broker reads first data from the disk, the Broker first determines whether the first data is cold data or hot data. If the first data is cold data, the Broker writes the first data into the read-write cache from the HDD through the operating system page cache, and then reads the first data from the read-write cache; if the first data is hot data, the Broker writes the first data into the read-write cache from the SSD, and then reads the first data from the read-write cache.
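The read path described above can be sketched as follows. This is an illustration under stated assumptions: plain dictionaries stand in for the read-write cache, the SSD, the HDD and the operating system page cache, and `TieredReader` and the `is_hot` callback are hypothetical names.

```python
class TieredReader:
    """Read path: cache first, then SSD (hot) or HDD via page cache (cold)."""

    def __init__(self, cache, ssd, hdd, page_cache, is_hot):
        self.cache = cache            # Broker-managed read-write cache
        self.ssd = ssd                # stand-in for hot-data storage
        self.hdd = hdd                # stand-in for cold-data storage
        self.page_cache = page_cache  # stand-in for the OS page cache
        self.is_hot = is_hot          # callback: key -> bool

    def read(self, key):
        if key in self.cache:
            return self.cache[key]    # cache hit: no disk access
        if self.is_hot(key):
            data = self.ssd[key]      # hot data bypasses the page cache
        else:
            # Cold data travels HDD -> page cache -> read-write cache.
            if key not in self.page_cache:
                self.page_cache[key] = self.hdd[key]
            data = self.page_cache[key]
        self.cache[key] = data        # fill the read-write cache first
        return self.cache[key]        # then serve the read from it
```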
In some embodiments, when the Broker receives the write data request, the Broker writes second data into the read-write cache, where the second data is the data requested to be written by the write data request.
For example, when the Broker receives a write data request, the Broker writes the data requested to be written by the write data request into the read-write cache, and the data requested to be written by the write data request is referred to as second data in this embodiment.
In some embodiments, after the Broker writes the second data into the read-write cache, the data read-write method of the stream processing system further includes the following steps: when the second data is cold data, the Broker writes the second data from the read-write cache to the HDD through the operating system page cache; and when the second data is hot data, the Broker writes the second data from the read-write cache to the SSD.
For example, after the Broker writes the second data into the read-write cache, the Broker further needs to determine whether the second data is hot data or cold data, if the second data is cold data, the Broker writes the second data into the HDD from the read-write cache through the operating system page cache, and if the second data is hot data, the Broker writes the second data into the SSD from the read-write cache.
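A minimal sketch of this write path is given below. Dictionaries again stand in for the storage tiers, and the function name `handle_write`, its parameters and its return labels are assumptions made for illustration.

```python
def handle_write(key, payload, cache, ssd, page_cache, is_hot):
    """Write into the read-write cache, then persist according to tier."""
    cache[key] = payload              # every write lands in the cache first
    if is_hot(key):
        ssd[key] = cache[key]         # hot data: cache -> SSD directly
        return "ssd"
    page_cache[key] = cache[key]      # cold data: cache -> OS page cache
    return "hdd_via_page_cache"       # the OS later flushes it to the HDD
```

A call such as `handle_write("orders-0", b"payload", cache, ssd, page_cache, is_hot)` therefore leaves the payload in the read-write cache and in exactly one persistence path.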
In some embodiments, the data reading and writing method of the stream processing system further includes: when the remaining capacity of the read-write cache is smaller than a first preset threshold, the Broker transfers third data to the disk, wherein the third data is the earliest-written data among the data currently stored in the read-write cache.
For example, a preset threshold, referred to in this embodiment as the first preset threshold, is set for the remaining capacity of the read-write cache. When the remaining capacity of the read-write cache is smaller than the first preset threshold, the Broker transfers to the disk the earliest-written data among the data currently stored in the read-write cache; this data is referred to in this embodiment as the third data.
In other words, the transfer is triggered by an exceptional condition: when the capacity of the read-write cache is insufficient, the Broker transfers the third data to the disk.
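The threshold-triggered transfer can be sketched as a first-in, first-out cache. The class name `FifoCache`, the byte-based capacity accounting and the `spill` callback are assumptions for illustration; the application only specifies that the earliest-written data is transferred once the remaining capacity falls below the first preset threshold.

```python
from collections import OrderedDict


class FifoCache:
    """Transfers the earliest-written entries when free space runs low."""

    def __init__(self, capacity, min_free, spill):
        self.capacity = capacity       # total bytes available
        self.min_free = min_free       # the "first preset threshold"
        self.spill = spill             # callback(key, data): move to disk
        self.entries = OrderedDict()   # insertion order == write order
        self.used = 0

    def remaining(self):
        return self.capacity - self.used

    def put(self, key, data):
        if key in self.entries:        # overwriting counts as a new write
            self.used -= len(self.entries.pop(key))
        self.entries[key] = data
        self.used += len(data)
        # Transfer the oldest entries until enough space is free again.
        while self.entries and self.remaining() < self.min_free:
            old_key, old_data = self.entries.popitem(last=False)
            self.used -= len(old_data)
            self.spill(old_key, old_data)
```

With a capacity of 10 bytes and a threshold of 3, writing two 4-byte entries leaves only 2 bytes free, so the first entry is handed to `spill` and 6 bytes become free again.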
In some of these embodiments, the transferring the third data to the disk comprises:
when the third data is hot data, the Broker transfers the third data from the read-write cache to the SSD; and when the third data is cold data, the Broker transfers the third data from the read-write cache to the HDD through the operating system page cache.
For example, to transfer the third data to the disk, the Broker also needs to judge whether the third data is hot data or cold data. If the third data is hot data, the Broker transfers the third data from the read-write cache to the SSD; if the third data is cold data, the Broker transfers the third data from the read-write cache to the HDD through the operating system page cache.
In some embodiments, the data reading and writing method of the stream processing system further includes: when the Broker transfers the third data from the read-write cache to the SSD and the remaining capacity of the SSD is insufficient, the Broker deletes fourth data from the SSD, wherein the fourth data is the earliest-written data among the data currently stored in the SSD.
For example, when the Broker transfers the third data from the read-write cache to the SSD and the remaining capacity of the SSD is insufficient, the Broker deletes from the SSD the earliest-written data among the data currently stored in the SSD; this data is referred to in this embodiment as the fourth data.
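The deletion of the fourth data can be sketched as follows. `SsdTier` is a hypothetical in-memory stand-in for the SSD, and treating "insufficient remaining capacity" as "the incoming data does not fit" is an assumption.

```python
from collections import OrderedDict


class SsdTier:
    """Stand-in for the SSD: deletes earliest-written data to make room."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()   # insertion order == write order

    def store(self, key, data):
        """Store data and return the keys deleted to make room for it."""
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        deleted = []
        while self.entries and self.used + len(data) > self.capacity:
            old_key, old_data = self.entries.popitem(last=False)
            self.used -= len(old_data)
            deleted.append(old_key)
        self.entries[key] = data
        self.used += len(data)
        return deleted
```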
In some embodiments, the data reading and writing method of the stream processing system further includes: the Broker acquires configuration information; and the Broker judges whether the currently read and written data is hot data or cold data according to the configuration information.
For example, the Broker acquires configuration information; the Broker judges whether the currently read and written data is hot data or cold data according to the configuration information.
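The application does not state what the configuration information contains. As one possible reading, the sketch below assumes a JSON document that lists hot topics by name; the key `hot_topics`, the topic names and both function names are hypothetical.

```python
import json

# Hypothetical configuration: topics listed here are treated as hot data.
CONFIG_TEXT = '{"hot_topics": ["orders", "alerts"]}'


def load_hot_topics(config_text):
    """Parse the configuration and return the set of hot topic names."""
    return set(json.loads(config_text).get("hot_topics", []))


def classify(topic, hot_topics):
    """Return "hot" for configured topics and "cold" for everything else."""
    return "hot" if topic in hot_topics else "cold"
```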
In some embodiments, the data reading and writing method of the stream processing system further includes: judging whether the current read-write data is cold data or hot data by the Broker according to the statistical information of the read-write frequency of the read-write data in a preset time period; when the read-write frequency of the currently read-write data in a preset time period exceeds a second preset threshold, determining the currently read-write data as hot data; otherwise, determining the current read-write data as cold data.
For example, in a preset time period, the Broker performs statistics on the read-write frequency of the read-write data, and judges whether the currently read-write data is cold data or hot data according to the statistical information. In this embodiment, a preset threshold may be set for the read-write frequency of the data in the preset time period, where this preset threshold is referred to as a second preset threshold, and if the read-write frequency of the currently read-write data in the preset time period exceeds the second preset threshold, the currently read-write data is determined to be hot data; otherwise, determining the current read-write data as cold data.
For example, the read-write frequency of data is counted during the half hour after the Broker starts, each item is scored according to its read-write frequency over the latest 10 min, and the Top n items are determined to be hot data.
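A frequency-based classifier of this kind can be sketched with a sliding window. The class name `FrequencyClassifier`, the use of explicit timestamps in seconds and the use of a raw access count as the score are assumptions for illustration.

```python
from collections import Counter, deque


class FrequencyClassifier:
    """Counts accesses per key inside a sliding time window."""

    def __init__(self, window_seconds, threshold):
        self.window = window_seconds   # the "preset time period"
        self.threshold = threshold     # the "second preset threshold"
        self.events = deque()          # (timestamp, key), oldest first
        self.counts = Counter()

    def record(self, key, now):
        """Register one read or write of key at time now."""
        self.events.append((now, key))
        self.counts[key] += 1
        self._expire(now)

    def _expire(self, now):
        # Drop accesses that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def is_hot(self, key, now):
        """Hot when the in-window access count exceeds the threshold."""
        self._expire(now)
        return self.counts[key] > self.threshold

    def top_n(self, n, now):
        """Alternative rule: the n most frequently accessed keys are hot."""
        self._expire(now)
        return [key for key, _ in self.counts.most_common(n)]
```

With a 600-second window and a threshold of 2, a key accessed three times in the last ten minutes is hot, and it becomes cold again once enough of those accesses age out of the window.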
In some embodiments, in response to the read-write data request, the Broker performs a read-write operation on the data through the read-write cache, including: the Broker judges whether the read-write data request is a read-write data request of an external service; under the condition that the read-write data request is judged not to be the read-write data request of the external service, the Broker performs read-write operation of the read-write data through the page cache of the operating system and the disk; and under the condition that the read-write data request is judged to be the read-write data request of the external service, the Broker performs the read-write operation of the read-write data through the read-write cache.
For example, when a Broker of the stream processing system receives a read-write data request, the Broker responds to the read-write data request, and before the Broker performs a read-write operation on data through a read-write cache, the Broker needs to determine whether the read-write data request is a read-write data request of an external service; external traffic here refers to production and consumption traffic.
If the read-write data request is a read-write data request of an external service, the Broker performs the read-write operation on the data through the read-write cache. If the read-write data request is not a read-write data request of an external service, the Broker performs the read-write operation on the data through the operating system page cache and the disk. That is, the Broker prevents data that does not belong to an external service from entering the read-write cache and reserves all of the space of the read-write cache for external services, so that the read-write cache is used entirely for production and consumption traffic.
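The routing decision can be sketched as follows. The text defines external traffic as production and consumption traffic; using replica synchronization as the example of non-external traffic is an assumption, as are the names `Source` and `choose_path`.

```python
from enum import Enum


class Source(Enum):
    PRODUCER = "producer"          # external: production traffic
    CONSUMER = "consumer"          # external: consumption traffic
    REPLICA_SYNC = "replica_sync"  # assumed example of internal traffic


EXTERNAL_SOURCES = {Source.PRODUCER, Source.CONSUMER}


def choose_path(source):
    """Only external traffic is allowed into the read-write cache."""
    if source in EXTERNAL_SOURCES:
        return "read_write_cache"
    return "page_cache_and_disk"
```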
Also provided in this embodiment is a stream processing system comprising a producer, a consumer, and a Broker; wherein the producer is configured to produce data and write the produced data to the Broker and the consumer is configured to consume data stored by the Broker, the Broker comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Fig. 2 is a flow chart of a data reading and writing method of the stream processing system according to an embodiment. As shown in Fig. 2, taking Kafka as an example, the stream processing system may include a single Broker or a stream processing cluster formed by multiple Brokers. Each Broker is deployed in a Java virtual machine (JVM) and applies for off-heap memory of a preset size as its read-write cache.
In some embodiments, each Broker may be configured with an SSD. Brokers in different virtual machines can share the SSD and HDD of the same physical machine.
Fig. 3 is a flow chart of a data read-write method of a stream processing system according to a preferred embodiment. As shown in Fig. 3, after a Broker receives a data write request from a producer, the Broker writes the data into the read-write cache. A background thread periodically performs a disk-flushing action: cold data is written into the operating system page cache and finally written to the HDD by the OS, whereas hot data is persisted directly to the SSD without passing through the operating system page cache.
In the related art, data is uniformly written into the operating system page cache, the operating system decides when to flush, and the data is then persisted to disk.
Compared with the related art, the data is written into a read-write cache maintained by the Broker, the Broker's background program decides the disk-flushing action, and the Broker is able to distinguish cold data from hot data. Cold data and hot data are persisted to disks of different media, which ensures that subsequent hot data can be accessed quickly and improves the speed of the stream processing system.
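A minimal sketch of this write path is given below. It models the read-write cache, the SSD, the operating system page cache, and the HDD as in-memory dictionaries; the class `TieredWriter`, the `is_hot` classifier, and the explicit `os_writeback` step (standing in for the OS flushing its page cache) are all assumptions made for illustration, not the patented implementation.

```python
class TieredWriter:
    """Toy model of the write path: cache first, then tiered persistence."""

    def __init__(self, is_hot):
        self.is_hot = is_hot    # callable: key -> bool (assumed classifier)
        self.cache = {}         # stands in for the Broker's read-write cache
        self.dirty = []         # keys written since the last flush
        self.ssd = {}           # hot tier
        self.page_cache = {}    # stands in for the OS page cache
        self.hdd = {}           # cold tier

    def write(self, key, value):
        """Producer write: data lands in the read-write cache only."""
        self.cache[key] = value
        self.dirty.append(key)

    def flush(self):
        """One round of the periodic background flush."""
        for key in self.dirty:
            value = self.cache[key]
            if self.is_hot(key):
                self.ssd[key] = value          # hot data bypasses the page cache
            else:
                self.page_cache[key] = value   # cold data goes through the page cache
        self.dirty.clear()

    def os_writeback(self):
        """Simulates the OS eventually writing page-cache contents to the HDD."""
        self.hdd.update(self.page_cache)
        self.page_cache.clear()
```

In a real Broker, `flush` would run on a timer in a background thread, and the write-back to the HDD would be performed by the operating system rather than by application code.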
After receiving a data read request from a consumer, the Broker first searches the read-write cache. If the data requested by the consumer is stored in the read-write cache, the Broker returns it to the consumer directly. If the read-write cache does not hold the requested data, the Broker triggers the disk-read mechanism: for cold data, the Broker first writes the requested data into the read-write cache and then returns it to the consumer from the read-write cache; for hot data, the Broker reads the requested data from the SSD according to the index information, writes it into the read-write cache, and then returns it to the consumer from the read-write cache.
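The read path can be sketched as a cache lookup followed by a tier-specific load. The function below is a simplified illustration: storage tiers are plain dictionaries, `is_hot` is an assumed classifier, and the index lookup on the SSD and the page-cache hop for cold data are not modeled.

```python
def read(key, cache, ssd, hdd, is_hot):
    """Return (value, source), where source names where the data was found."""
    if key in cache:
        return cache[key], "cache"   # cache hit: no disk access

    # Cache miss: trigger the disk-read mechanism.
    if is_hot(key):
        value, source = ssd[key], "ssd"   # hot data is loaded from the SSD
    else:
        value, source = hdd[key], "hdd"   # cold data is loaded from the HDD

    cache[key] = value                    # populate the read-write cache first
    return cache[key], source             # then answer from the cache
```

A second read of the same key is served from the cache, which is the behavior the embodiment relies on to reduce the number of disk reads.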
In the related art, when the Broker receives a data read request, it first looks for the data in the operating system page cache. If the data has not been evicted from the page cache, the Broker returns it directly to the consumer and no disk read is needed. If the page cache does not hold the data, the disk must be read. Because the operating system may evict page-cache data on its own initiative and the behavior of the OS cannot be controlled by the Broker, the data in the page cache can be polluted and the number of disk reads increases; the data is returned to the consumer only after it has been found on disk, which seriously reduces the speed of production and consumption.
Compared with the related art, the Broker searches the read-write cache that it maintains itself, the Broker's background program decides the disk-flushing action, and the Broker is able to distinguish cold data from hot data, so that subsequent hot data can be accessed quickly and the speed of the stream processing system is improved.
After the Broker receives a request to synchronize data from the main Broker to a replica Broker, the data of the main Broker is written into the read-write cache of the replica Broker. A background thread periodically performs the disk-flushing action: cold data is written into the operating system page cache and finally written to the HDD by the OS, whereas hot data is persisted directly to the SSD without passing through the operating system page cache.
The Broker maintains the read-write cache itself instead of using the operating system page cache, which makes caching more controllable than in the related art.
In the related art, when a Broker processes a data synchronization request from a Follower, the data read from disk is loaded into the operating system page cache again. This pollutes the page cache, so that the data actually being consumed by consumers is replaced and the number of disk reads increases; under high read-write pressure, the speed of the production and consumption services is greatly affected.
The present invention increases the speed of production and consumption in a stream processing system compared to the related art.
The Broker maintains the read-write cache in the off-heap memory space strictly in time order. When the read-write cache meets the eviction condition, the Broker triggers an asynchronous LRU eviction policy: hot data is stored on the SSD, and cold data is persisted to the HDD through the operating system page cache.
In addition, the invention stores the read-write cache in off-heap memory, which effectively reduces garbage collection (GC) pressure. Because the Broker is configured with an SSD, when the read-write cache meets the eviction condition the Broker triggers the asynchronous LRU eviction policy, stores hot data directly on the SSD without passing through the operating system page cache, and persists cold data to the HDD through the operating system page cache. This ensures that hot data is read and persisted quickly and that the production and consumption services respond quickly. Because the dependence on the operating system page cache is reduced, hybrid deployment based on Kubernetes becomes possible and server hardware cost is reduced.
In some of these embodiments, the eviction condition is that the read-write cache reaches a threshold.
The read-write cache maintained by the application program adopts the LRU cache eviction algorithm, so that the least recently accessed data is replaced first. The cache does not compete with other programs on the operating system, and the performance of the stream processing system can be effectively guaranteed.
LRU is an abbreviation of Least Recently Used and is a commonly used data replacement algorithm that selects for eviction the data that has gone unused for the longest time. The algorithm gives each data item an access field that records the time t elapsed since the item was last read or written; when an item must be evicted, the item with the largest t among the existing data, namely the least recently used item, is selected.
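The definition above can be illustrated with a small sketch that stores, for each item, a logical timestamp of its last access and evicts the item with the largest elapsed time t. The class `LruCache`, the entry-count capacity, and the logical clock are assumptions for illustration; the linear scan in `evict` favors clarity over speed.

```python
class LruCache:
    """Minimal LRU cache keyed on the time elapsed since each item's last access."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of entries (assumed unit)
        self.data = {}
        self.last_access = {}      # key -> logical time of last read or write
        self.clock = 0             # logical clock, advanced on every access

    def _touch(self, key):
        self.clock += 1
        self.last_access[key] = self.clock

    def get(self, key):
        if key not in self.data:
            return None
        self._touch(key)
        return self.data[key]

    def put(self, key, value):
        """Store a value; return the evicted (key, value) pair, or None."""
        evicted = None
        if key not in self.data and len(self.data) >= self.capacity:
            evicted = self.evict()
        self.data[key] = value
        self._touch(key)
        return evicted

    def evict(self):
        # t = clock - last_access; the item with the largest t is least recently used.
        victim = max(self.last_access, key=lambda k: self.clock - self.last_access[k])
        del self.last_access[victim]
        return victim, self.data.pop(victim)
```

A production implementation would typically keep the items in access order (for example with `collections.OrderedDict`) so that the victim can be found in constant time.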
In the related art, the eviction policy of the operating system cannot be controlled by the application program. The operating system cache is used directly, all processes on the physical machine share the memory, contention is intense, and the performance of the stream processing system cannot be effectively guaranteed.
In some embodiments, the eviction condition is that the read-write cache is insufficient when the Broker processes a production request. When the read-write cache is insufficient, the eviction policy is actively triggered, that is, the oldest data is removed from the cache and persisted to the corresponding disk.
In some of these embodiments, the eviction condition is that the SSD space is insufficient when the Broker processes a production request. When the SSD space is insufficient, an eviction mechanism is actively triggered and the oldest data on the SSD is likewise evicted, which ensures that new hot data can be persisted to the SSD.
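The two eviction conditions above can be sketched together. In this illustration both the read-write cache and the SSD are bounded stores that evict their earliest-written entry first; the names `TimeOrderedStore` and `produce`, the entry-count capacities, and the `is_hot` classifier are hypothetical, and the page-cache hop on the cold path is not modeled.

```python
from collections import OrderedDict


class TimeOrderedStore:
    """Bounded store that evicts the earliest-written entry first (capacity >= 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # insertion order equals write order

    def put(self, key, value):
        """Store an entry; return the list of (key, value) pairs evicted to make room."""
        self.entries.pop(key, None)    # a rewrite makes the key the newest entry
        evicted = []
        while len(self.entries) >= self.capacity:
            evicted.append(self.entries.popitem(last=False))   # oldest first
        self.entries[key] = value
        return evicted


def produce(key, value, cache, ssd, hdd, is_hot):
    """Handle a production request, persisting whatever the cache has to evict."""
    for old_key, old_value in cache.put(key, value):
        if is_hot(old_key):
            # If the SSD is full, its own oldest entries are dropped here.
            ssd.put(old_key, old_value)
        else:
            hdd[old_key] = old_value
```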
In some of these embodiments, data may be designated as hot data before the Broker process is started. Cold data and hot data are distinguished and stored in separate tiers, which effectively guarantees the reading speed of hot data. If the service does not designate hot data before startup, the Broker process goes through an adaptive phase after it starts, in which hot data is determined according to the read-write frequency of the service over a period of time. The related art, by contrast, does not support distinguishing between cold data and hot data.
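The two ways of determining hot data can be combined in one small function. This is a sketch under assumptions: the parameter names are hypothetical, `configured_hot` stands for the designation made before startup (or `None` when nothing was configured), and the adaptive branch uses the threshold comparison stated in the claims.

```python
def classify(key, configured_hot, access_counts, threshold):
    """Return "hot" or "cold" for a key.

    configured_hot: set of keys designated as hot before startup, or None.
    access_counts:  key -> read-write count in the preset time period.
    threshold:      count that must be exceeded for data to be hot.
    """
    if configured_hot is not None:
        # Explicit configuration takes precedence over statistics.
        return "hot" if key in configured_hot else "cold"
    # Adaptive mode: decide from the observed read-write frequency.
    return "hot" if access_counts.get(key, 0) > threshold else "cold"
```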
The related art places requirements on the operating system page cache of the physical machine. To guarantee quality of service, the Broker generally either monopolizes a physical machine or is deployed only alongside services that do not occupy the operating system page cache. Compared with the related art, this embodiment solves the problems of consumption lag, operating system page cache pollution, and the resulting impact on production and consumption speed, and improves the stability of the system. By introducing the SSD, hybrid deployment based on Kubernetes becomes possible and server hardware cost is saved. Because this embodiment does not rely on the operating system page cache and the application program maintains the memory itself, quality of service can be guaranteed while the Broker is deployed together with other services, which saves hardware cost.
In addition, the Broker uses off-heap memory as the read-write cache, which significantly reduces garbage collection (GC) pressure. Controlling the entry of data pulled by a Follower for synchronization into the read-write cache increases the effective space of the cache and reduces the number of disk reads. Because data is evicted strictly in time order, consumers with different degrees of lag can consume at the same time, the number of disk reads is reduced as far as possible, and the consumption speed is improved without greatly affecting the speed of the producer. Introducing the SSD to persist cold data and hot data separately ensures that hot data is read and persisted quickly and that the production and consumption services respond quickly. Because this embodiment reduces the dependence on the operating system page cache, hybrid deployment based on Kubernetes becomes possible.
In the related art, the heap memory requested by a conventional JVM-based system is managed by the JVM, which involves garbage collection (GC); GC causes service pauses and affects quality of service. Compared with the related art, the invention uses off-heap memory as the cache, so the JVM is no longer responsible for collecting that memory and service pauses are effectively avoided.
There is also provided in this embodiment an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, the processor may be configured to execute the following steps by means of a computer program:
Step 1: the Broker receives a read-write data request;
Step 2: in response to the read-write data request, the Broker performs the read-write operation on the data through a read-write cache, wherein the read-write cache is a preset memory space that the Broker has applied for from off-heap memory.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, and details are not described again in this embodiment.
In addition, in combination with the data reading and writing method of the stream processing system provided in the foregoing embodiments, a storage medium may also be provided in this embodiment to implement the method. The storage medium has a computer program stored thereon; when executed by a processor, the computer program implements the data reading and writing method of the stream processing system of any of the above embodiments.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be derived by a person skilled in the art from the examples provided herein without any inventive step, shall fall within the scope of protection of the present application.
It is obvious that the drawings are only examples or embodiments of the present application, and it is obvious to those skilled in the art that the present application can be applied to other similar cases according to the drawings without creative efforts. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
The term "embodiment" is used herein to mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly or implicitly understood by one of ordinary skill in the art that the embodiments described in this application may be combined with other embodiments without conflict.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the patent protection. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (12)

1. A data read-write method of a stream processing system is characterized by comprising the following steps:
a Broker of the stream processing system receives a read-write data request;
and in response to the read-write data request, the Broker performs a read-write operation on data through a read-write cache, wherein the read-write cache is a preset memory space applied for by the Broker from off-heap memory.
2. The data read/write method of the stream processing system according to claim 1,
when the Broker receives a read data request, the Broker inquires whether first data exists in the read-write cache, wherein the first data is the data requested to be read by the read data request;
if so, the Broker reads the first data from the read-write cache;
otherwise, the Broker reads the first data from the disk.
3. The data reading/writing method of the stream processing system according to claim 2, wherein the Broker reads the first data from the disk, including:
when the first data is cold data, the Broker writes the first data into the read-write cache from an HDD through an operating system page cache, and when the first data is hot data, the Broker writes the first data into the read-write cache from an SSD;
and the Broker reads the first data from the read-write cache.
4. The data read/write method of the stream processing system according to claim 1,
when the Broker receives a write data request, the Broker writes second data into the read-write cache, wherein the second data is the data requested to be written by the write data request.
5. The data reading/writing method of the stream processing system according to claim 4, wherein after the Broker writes the second data into the read/write cache, the method further comprises:
when the second data is cold data, the Broker writes the second data into an HDD from the read-write cache through an operating system page cache;
and when the second data is hot data, the Broker writes the second data into the SSD from the read-write cache.
6. The method for reading and writing data of the stream processing system according to claim 1, further comprising:
and under the condition that the residual capacity of the read-write cache is smaller than a first preset threshold value, the Broker transfers third data to a magnetic disk, wherein the third data is the earliest written data in the data currently stored in the read-write cache.
7. The data read-write method of the stream processing system according to claim 6, wherein the Broker transferring the third data to a disk includes:
when the third data is hot data, the Broker transfers the third data from the read-write cache to the SSD;
and when the third data is cold data, the Broker transfers the third data from the read-write cache to the HDD through an operating system page cache.
8. The method for reading and writing data of the stream processing system according to claim 7, further comprising:
when the Broker transfers the third data from the read-write cache to the SSD and the SSD has insufficient storage space, the Broker deletes fourth data from the SSD, wherein the fourth data is the earliest-written data among the data currently stored on the SSD.
9. The method for reading and writing data of the stream processing system according to claim 1, further comprising:
the Broker acquires configuration information;
and the Broker judges whether the currently read and written data is hot data or cold data according to the configuration information.
10. The method for reading and writing data of the stream processing system according to claim 1, further comprising:
the Broker judges whether the current read-write data is cold data or hot data according to the statistical information of the read-write frequency of the read-write data in a preset time period;
when the read-write frequency of the currently read-write data in the preset time period exceeds a second preset threshold, determining that the currently read-write data is hot data; otherwise, determining the current read-write data as cold data.
11. The data reading and writing method of the stream processing system according to claim 1, wherein in response to the read and write data request, the Broker performs a data reading and writing operation through a read and write cache, including:
the Broker judges whether the read-write data request is a read-write data request of an external service;
under the condition that the read-write data request is judged not to be a read-write data request of an external service, the Broker performs read-write operation of read-write data through an operating system page cache and a magnetic disk;
and under the condition that the read-write data request is judged to be a read-write data request of an external service, the Broker performs read-write operation of read-write data through the read-write cache.
12. A stream processing system comprising a producer, a consumer, and a Broker; wherein the producer is configured to produce data and write the produced data to the Broker, and the consumer is configured to consume the data stored in the Broker, and the Broker comprises a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the data reading and writing method of the stream processing system according to any one of claims 1 to 11.
CN202110813467.8A 2021-07-19 2021-07-19 Data reading and writing method of stream processing system and stream processing system Pending CN113672169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110813467.8A CN113672169A (en) 2021-07-19 2021-07-19 Data reading and writing method of stream processing system and stream processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110813467.8A CN113672169A (en) 2021-07-19 2021-07-19 Data reading and writing method of stream processing system and stream processing system

Publications (1)

Publication Number Publication Date
CN113672169A true CN113672169A (en) 2021-11-19

Family

ID=78539491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110813467.8A Pending CN113672169A (en) 2021-07-19 2021-07-19 Data reading and writing method of stream processing system and stream processing system

Country Status (1)

Country Link
CN (1) CN113672169A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438236A (en) * 2022-09-28 2022-12-06 中国兵器工业计算机应用技术研究所 Unified hybrid search method and system
CN115586869A (en) * 2022-09-28 2023-01-10 中国兵器工业计算机应用技术研究所 Ad hoc network system and stream data processing method thereof
CN115438236B (en) * 2022-09-28 2023-08-29 中国兵器工业计算机应用技术研究所 Unified hybrid search method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination