CN116954944A - Distributed data stream processing method, device and equipment based on memory grid


Info

Publication number: CN116954944A
Application number: CN202310864300.3A
Authority: CN (China)
Prior art keywords: server, data, processing, operator, data stream
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 石志林
Current assignee: Tencent Technology Shenzhen Co Ltd
Original assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/54: Interprogram communication
    • G06F 9/545: Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space
    • G06F 9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching

Abstract

The application relates to the technical field of big data, and provides a method, a device and equipment for processing distributed data streams based on a memory grid, which are used for reducing the percentile latency of distributed stream processing. In the method, multiple operators for data stream processing are deployed on each core, and each core carries one thread; a dedicated non-cooperative thread is started for the input operator and for the output operator, which depend on third-party APIs and cannot be invoked cooperatively, so that the fluency of stream processing is ensured and the stream processing speed is improved. Meanwhile, the different types of intermediate operators share at least one cooperative thread, so that the cooperative thread can stay on the same core for a longer time, context switching of the operating system is reduced, CPU utilization is improved, and the latency of distributed data stream processing is further reduced. In addition, the data stream and the processing results are stored in a local memory grid, which provides higher availability and fault tolerance than disk storage.

Description

Distributed data stream processing method, device and equipment based on memory grid
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method, an apparatus, and a device for processing a distributed data stream based on a memory grid.
Background
With the popularization of cloud computing technology, the volume of generated data keeps growing, and distributed stream processing technology is becoming increasingly important for meeting the requirement of rapidly processing massive data.
In the prior art, the data stream is computed by a primary real-time computing engine, while out-of-order data is identified from the data stream and asynchronously sent to a secondary real-time computing engine for recomputation, so as to update the data stream processing result of the primary real-time computing engine. By introducing the secondary real-time computing engine to asynchronously process the out-of-order data in the data stream, this technique prevents the out-of-order data from blocking the real-time computation of the data stream, ensures the completeness of the data stream computation, and reduces the latency of data stream processing.
However, the above technique relies on correctly identifying out-of-order data; once the out-of-order data in the data stream is misidentified, the processing of the entire data stream is affected.
Therefore, reducing the delay of distributed data stream processing becomes a technical problem to be solved in the big data field.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for processing distributed data streams based on a memory grid, which are used for reducing the time delay of the distributed data stream processing.
In one aspect, an embodiment of the present application provides a distributed data stream processing method based on a memory grid, which is applied to a distributed system, where the distributed system includes at least two servers having multiple cores, each core carries a thread, multiple operators for performing data stream processing on a target task are deployed on each core, and different operators are used to perform different data stream processing operations; the method comprises the following steps:
the server uses a first thread to call an input operator, reads the data stream of the target task, and stores the read data stream into a memory grid of the server;
the server uses a second thread to sequentially call intermediate operators of different types to process the data stream stored in the memory grid according to the preset data stream processing logic of the target task, and takes the intermediate processing result obtained by the last called intermediate operator processing the data stream in the memory grid as the processing result of the second thread; wherein each processing result is obtained based on the intermediate processing results of this server processing the data stream of the target task and the intermediate processing results of other servers processing the data stream of the target task;

each time an intermediate operator is called, the intermediate processing result obtained after the intermediate operator processes the data stream in the memory grid is stored into the memory grid and transmitted to the next intermediate operator for processing;
and the server adopts a third thread to call an output operator, and obtains a target processing result of the target task based on the processing result corresponding to each second thread.
On the other hand, the embodiment of the application provides a memory grid-based stream processing apparatus, which is a device having a plurality of cores and is applied in a distributed system; each core carries one thread and is deployed with a plurality of operators for performing data stream processing on a target task, and different operators are used to perform different data stream processing operations. The apparatus includes:
an input module, configured to use a first thread to call an input operator, read the data stream of the target task, and store the read data stream into a memory grid of the device;

a data processing module, configured to use a second thread to sequentially call intermediate operators of different types to process the data stream stored in the memory grid according to the preset data stream processing logic of the target task, and to take the intermediate processing result obtained by the last called intermediate operator processing the data stream in the memory grid as the processing result of the second thread; wherein each processing result is obtained based on the intermediate processing results of this device processing the data stream of the target task and the intermediate processing results of other devices processing the data stream of the target task;

each time an intermediate operator is called, the intermediate processing result obtained after the intermediate operator processes the data stream in the memory grid is stored into the memory grid and transmitted to the next intermediate operator for processing;

and an output module, configured to use a third thread to call an output operator and obtain a target processing result of the target task based on the processing result corresponding to each second thread.
Optionally, the different types of intermediate operators include a send operator and a receive operator, which are used for exchanging intermediate processing results with other servers;
the data processing module is specifically configured to: when the intermediate operator called by the second thread is the send operator, obtain the intermediate processing results of some intermediate operators from a shared queue in the local memory, and send the obtained intermediate processing results to shared queues in the memories of other devices;

and when the intermediate operator called by the second thread is the receive operator, receive the intermediate processing results of some intermediate operators from the shared queues in the memories of other devices, and store the received intermediate processing results into the shared queue in the local memory.
Optionally, the data processing module is specifically configured to:
when the called intermediate operator is not the receive operator, use the second thread to process the partial intermediate processing results that were received from other servers and stored in the shared queue of the local memory, and remove the processed intermediate processing results from the shared queue of the local memory;

and when the called intermediate operator is the receive operator, use the second thread to again receive partial intermediate processing results from the shared queues in the memories of the other devices, so as to refill the shared queue of the local memory, until the data stream processing of the target task is completed.
Optionally, the data processing module is specifically configured to:
when the called intermediate operator is not the send operator, use the second thread to process the data stream stored in the memory grid, and store the intermediate processing results of the processed partial intermediate operators into the shared queue of the local memory;

and when the called intermediate operator is the send operator, use the second thread to calculate, based on the keys of the intermediate processing results of the partial intermediate operators, the partition IDs corresponding to those intermediate processing results, and send the partial intermediate processing results to the shared queues in the memories of the other devices corresponding to those partition IDs.
Optionally, when the shared queue in the local memory of the server is full, the data processing module is specifically configured to: store the intermediate processing results corresponding to some intermediate operators into a waiting queue in the local memory;

the output module is specifically configured to: control the amount of data that the input operator reads into the memory grid each time.
Optionally, the data processing module is specifically configured to:
after receiving, from the shared queues in the memories of other devices, the intermediate processing results that can be processed by the current window, send a confirmation message to the other devices, where the confirmation message is used to instruct the other devices to send the data of the next window and to control the amount of data that the input operator reads into the memory grid each time;

receive, from the shared queues in the memories of the other devices, the intermediate processing results that can be processed by the next window; wherein the size of the data volume processed by each window is dynamically adjusted according to the processing rate.
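A minimal sketch of this window-by-window exchange is given below, with the shared queue and the confirmation message modeled as in-memory blocking queues; the roles, class and method names are assumptions made only for illustration.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: the receiver takes one window of intermediate results,
// then acknowledges it so the sender may emit the next window.
public class WindowedExchangeSketch {
    private final BlockingQueue<List<Object>> sharedQueue = new ArrayBlockingQueue<>(4); // receiver-side shared queue
    private final BlockingQueue<Boolean> ackQueue = new ArrayBlockingQueue<>(4);         // confirmation messages

    // Runs on the sending server (hypothetical role).
    public void send(List<Object> window) throws InterruptedException {
        sharedQueue.put(window);   // deliver the data of the current window
        ackQueue.take();           // wait for the confirmation before sending the next window
    }

    // Runs on the receiving server (hypothetical role).
    public List<Object> receiveAndAck() throws InterruptedException {
        List<Object> window = sharedQueue.take(); // intermediate results the current window can process
        ackQueue.put(Boolean.TRUE);               // confirmation: the sender may send the next window
        return window;
    }
}
```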
Optionally, the apparatus further includes a storage module; for each piece of data among the data stream, the intermediate processing results and the processing results, the storage module is specifically configured to:

calculate a partition ID according to the key of the data;

store the data on the corresponding partition of the memory grid according to the partition ID; each partition on the memory grid corresponds to a primary copy locally and to at least one backup copy on other devices, and the primary copy and the backup copies are used for storing state snapshots of the corresponding partition.
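For illustration, the mapping from a key to a partition ID and the placement of primary and backup copies might look like the following sketch, which assumes a fixed partition count and a simple modulo policy; the actual partitioning policy of the memory grid may differ.

```java
// Illustrative partitioning sketch: a key is hashed to a partition ID, the partition ID
// is mapped to the node holding the primary copy, and the backup copy goes to another node.
public class PartitionSketch {
    private final int partitionCount;
    private final int nodeCount;

    public PartitionSketch(int partitionCount, int nodeCount) {
        this.partitionCount = partitionCount;
        this.nodeCount = nodeCount;
    }

    public int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), partitionCount);   // partition ID calculated from the key
    }

    public int primaryNode(int partitionId) {
        return partitionId % nodeCount;                         // node holding the primary copy
    }

    public int backupNode(int partitionId) {
        return (primaryNode(partitionId) + 1) % nodeCount;      // backup copy placed on a different node
    }
}
```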
Optionally, when a device fails, the partitions on the device are reassigned to other devices, the backup copies of those partitions on the other devices are upgraded to primary copies, and the backup copies on the other devices are updated.
Optionally, when a device is newly added in the distributed system, partitions on other devices are allocated to the newly added device, and a primary copy and a backup copy are added to the newly allocated partition on a memory grid of the newly added device, where the partitions on the memory grid of each device after allocation are balanced.
Optionally, the data processing module is specifically configured to:
generate, according to a preset snapshot period, a state snapshot based on the data of the various operators used for processing the data stream of the target task on each partition of the memory grid, and store the data on each partition to a disk, where the state snapshot is used to record the processing results of the data stream stored in each partition of the memory grid at the current moment;

and after a computing task fails, stop computation, restore the latest state snapshot obtained before the failure into the memory grid, use the first thread to call the input operator, and read the data stream after the latest state snapshot into the memory grid.
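A minimal sketch of periodic snapshots and recovery is given below; the in-memory "disk" map, the offset handling and the class and method names are assumptions made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of periodic state snapshots and recovery. The "disk" is a plain map
// here; a real implementation would persist the snapshot together with the source offset.
public class SnapshotSketch {
    private final Map<Integer, Map<String, Object>> partitions = new HashMap<>(); // memory grid partitions
    private final Map<Long, Map<Integer, Map<String, Object>>> disk = new HashMap<>();
    private long latestSnapshotId = -1;
    private long sourceOffset = 0;          // offset recorded together with the snapshot

    public void startSnapshots(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::takeSnapshot, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void takeSnapshot() {
        long id = latestSnapshotId + 1;
        Map<Integer, Map<String, Object>> copy = new HashMap<>();
        partitions.forEach((p, state) -> copy.put(p, new HashMap<>(state))); // state of every partition
        disk.put(id, copy);
        latestSnapshotId = id;
    }

    synchronized void recordOffset(long offset) { sourceOffset = offset; }   // advanced by the input operator

    // On failure: restore the latest snapshot into the memory grid and return the recorded
    // offset so the input operator can resume reading the data stream after that point.
    synchronized long restoreLatest() {
        partitions.clear();
        disk.getOrDefault(latestSnapshotId, Map.of())
            .forEach((p, state) -> partitions.put(p, new HashMap<>(state)));
        return sourceOffset;
    }
}
```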
Optionally, when the input operator identifies a data offset, the data offset is recorded in the state snapshot;
the input module is specifically configured to: use the first thread to call the input operator, and read the data stream after the offset into the memory grid, without reading the data stream before the offset.
Optionally, when data of multiple operators used for processing the data stream of the target task on each partition of the memory grid is stored to a disk, and a corresponding state snapshot is generated, the input module is specifically configured to:
send a response message to the first thread to indicate that the state snapshot obtained based on the data stream read by the input operator can be safely deleted.
Optionally, the response message carries an identifier of the state snapshot, and after the response message is sent, the input module is specifically configured to:

when the identifier carried by the response message is already recorded in the global state snapshot, not call the input operator to load the data stream recorded by the state snapshot corresponding to the identifier, where the global state snapshot is used to record the identifiers of all state snapshots.
Optionally, the data processing module is specifically configured to:
stop sending processed data to a downstream application when a state snapshot is generated based on the data of the output operator on each partition of the memory grid;

and after the state snapshot is generated, continue sending the processed data to the downstream application.
Optionally, after generating the state snapshot, the data processing module is specifically configured to:

generate an identifier for the state snapshot;

when the identifier is not in the global snapshot, store the data recorded by the state snapshot corresponding to the identifier to a disk; where the identifiers in the global snapshot are cleaned up periodically.
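The identifier check can be sketched as follows, with the global snapshot represented by a simple concurrent set; this representation and the names used are assumptions made only for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: a snapshot's data is persisted only if its identifier is not already
// present in the global snapshot, so replays after a failure are not written twice.
public class SnapshotDeduplicationSketch {
    private final Set<Long> globalSnapshotIds = ConcurrentHashMap.newKeySet(); // stand-in for the global snapshot
    private final Map<Long, Object> disk = new ConcurrentHashMap<>();          // stand-in for disk storage

    public void persistIfNew(long snapshotId, Object snapshotData) {
        if (globalSnapshotIds.add(snapshotId)) {   // true only for identifiers not seen before
            disk.put(snapshotId, snapshotData);    // store the data recorded by this state snapshot
        }
        // identifiers in the global snapshot would be cleaned up periodically elsewhere
    }
}
```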
Optionally, the data flow processing logic of the target task and the data flow relationships among the multiple operators are predefined in a pipeline API facing the developer, and the processing operations of the multiple operators are fine-tuned in a core API facing the distributed flow processing engine.
Optionally, when the number of cores is greater than three, the apparatus further includes a garbage collection module configured to:
use a fourth thread as a garbage collector to reclaim the already-processed data in the memory grid during data stream processing.
In another aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the steps of the method for processing a distributed data stream based on a memory grid.
In another aspect, embodiments of the present application provide a computer readable storage medium having stored thereon computer executable instructions that when executed by an electronic device implement the steps of the memory grid-based distributed data stream processing method described above.
In another aspect, an embodiment of the present application provides a computer program product, including a computer program, where the computer program when executed by an electronic device implements the steps of the method for processing a distributed data stream based on a memory grid.
The embodiment of the application has the following beneficial effects:
In the memory grid-based distributed data stream processing method provided by the embodiment of the application, when the data stream of the target task is processed, the input operator and the output operator among the multiple operators used depend on third-party APIs and cannot be invoked cooperatively, which may cause blocking. Therefore, one thread is started on each of the multiple cores of every server in the distributed system, so that a dedicated thread can be allocated separately to the input operator and the output operator, thereby ensuring the fluency of data stream processing and improving the processing speed of the data stream. Meanwhile, each thread started on the other cores can be shared by different types of intermediate operators, so that the shared threads can stay on the same core for a longer time, context switching of the operating system is reduced, core utilization is improved, and the latency of distributed data stream processing is further reduced. On the other hand, multiple operators for performing data stream processing on the target task are deployed on each core, and different operators are used to perform different data stream processing operations: the input operator is used to read the data stream of the target task into the memory grid of the server, the different types of intermediate operators are used to process the data stream stored in the memory grid, and the output operator is used to output the target processing result of completing the data stream processing of the target task. The intermediate processing result of each intermediate operator is stored into the memory grid and transmitted to the next intermediate operator for subsequent processing, and intermediate processing results can be exchanged between different servers through the intermediate operators, thereby realizing the fusion of distributed data stream processing results. Because the multiple operators deployed on the core where each thread resides and the data stream processing logic of the target task are preset, each thread can call the corresponding operators in order according to the data stream processing logic to perform data stream processing, without having to care about the acquisition order of the data stream; this realizes unordered data stream processing and further improves data stream processing efficiency.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it will be apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of an application scenario in which an embodiment of the present application is applicable;
FIG. 2 is a schematic diagram of core components of a memory grid-based distributed stream processing method according to an embodiment of the present application;
FIG. 3 is a block diagram of the processing logic in the pipeline API and the core API according to an embodiment of the present application;
FIG. 4 is a flowchart of a distributed stream processing method based on a memory grid according to an embodiment of the present application;
FIG. 5A is a schematic diagram of a stateful flow processing process deployed on a server according to an embodiment of the present application;
FIG. 5B is a schematic diagram of thread scheduling according to an embodiment of the present application;
FIG. 6A is a schematic diagram of another stateful flow processing process deployed on a server provided by an embodiment of the present application;
FIG. 6B is a schematic diagram of another thread scheduling according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a memory grid partition according to an embodiment of the present application;
FIG. 8 is a schematic diagram of fault tolerance provided by an embodiment of the present application;
FIG. 9 is a block diagram of a distributed data stream processing device based on a memory grid according to an embodiment of the present application;
FIG. 10 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
Kafka: a distributed messaging system used to transmit and process data streams between multiple applications by means of message passing; it is a high-performance, low-latency, highly fault-tolerant data stream processing platform widely used in fields such as big data and stream processing.
Flink: a distributed stream processing engine that can process unbounded data streams, with low-latency, highly reliable and high-throughput data stream processing capabilities.
Percentile latency: the maximum event latency at a given percentile within a certain time interval. For example, if a system is required to handle 99.99% of events with a latency of less than 10 milliseconds, then within the specified time range the 99.99th-percentile latency of the events handled by the system must not exceed 10 milliseconds.
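For illustration, a percentile latency over a set of recorded event latencies can be estimated with a simple nearest-rank computation, as in the sketch below (illustrative only, not part of the claimed method).

```java
import java.util.Arrays;

// Illustrative nearest-rank percentile computation over recorded event latencies (ms).
public class PercentileLatencySketch {
    static long percentile(long[] latenciesMs, double percentile) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length); // nearest-rank method
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] latencies = {2, 3, 3, 4, 5, 7, 9, 12, 40, 3};
        System.out.println(percentile(latencies, 99.0)); // prints 40 for this small sample
    }
}
```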
State (State): in a distributed stream processing engine, the current data stream processing may need to rely on previous processing results; those previous results are the state.
Snapshot checkpoints: a mechanism for ensuring the consistency of the system state; it generates a globally consistent state snapshot by setting periodic checkpoints in the system.
State backend: used for storing state data.
Memory grid: a store for distributed objects in memory that provides functionality similar to the standard Java library; the stored objects can be a variety of data structures and interfaces, including but not limited to Map, Queue, RingBuffer, etc. The Map provides distributed in-memory key-value storage, and is used for storing the state snapshots in the embodiments of the present application.
Map state: the state is stored in a Map data structure in the system; through the Map data structure, the state is divided into disjoint partitions, the partitions are distributed to individual servers (i.e., single physical nodes) in the distributed system, and each physical node stores the state corresponding to specific key-value pairs.
Coroutine: can be regarded as a lightweight thread (i.e., a green thread); it is a small computing unit that can be suspended and resumed at the programming-language level, avoids expensive operating system context switches, and allows an execution thread to stay on the same CPU core for a longer time, thereby retaining CPU cache data.
Task unit: the basic unit of data stream processing, responsible for executing short computing tasks and sharing execution threads on CPU cores. Similar to a coroutine, a task unit can be suspended and resumed at the programming-language level and depends only on internal state, so meaningful computing tasks (e.g., aggregation, joins, etc.) can be performed within a short execution period. Since task units have no dependencies on one another, thousands of task units can be hosted on a single physical node of the distributed system, thereby increasing task throughput and enabling lightweight multiplexing.
Cooperative threads: a scheduling mode at the developer level, meaning that multiple task units of different types can share the same thread; the task units take turns using the core to execute computing tasks, thereby achieving high CPU utilization and optimized thread scheduling. When sharing the same thread, each task unit is an independent small computing unit that can pause and resume execution when needed.
Non-cooperative threads: a thread scheduling mode in which task units of the same type share the same thread by taking turns using the core (i.e., CPU core); similar to cooperative threads and green threads, these task units can also be suspended and resumed at the programming-language level.
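As an illustration of the two scheduling modes above, the following Java sketch assumes a hypothetical task-unit interface whose runSlice method performs a small slice of work and returns whether more work remains; the names are illustrative and not part of the claimed method.

```java
import java.util.List;

// Illustrative sketch: a cooperative worker thread cycles over task units of different types,
// each doing a small slice of work before yielding, while a non-cooperative worker keeps
// calling task units of a single type.
public class TaskUnitSchedulingSketch {

    interface TaskUnit { boolean runSlice(); } // returns true while it still has work to do

    // Cooperative thread: many task units of different types share one core-bound thread.
    static Thread cooperativeWorker(List<TaskUnit> taskUnits) {
        return new Thread(() -> {
            boolean anyProgress = true;
            while (anyProgress) {
                anyProgress = false;
                for (TaskUnit unit : taskUnits) {
                    anyProgress |= unit.runSlice();   // suspend/resume at the language level, no OS switch
                }
            }
        }, "cooperative-worker");
    }

    // Non-cooperative thread: task units of the same (possibly blocking) type share one thread.
    static Thread nonCooperativeWorker(TaskUnit unit) {
        return new Thread(() -> { while (unit.runSlice()) { } }, "non-cooperative-worker");
    }
}
```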
Primary copy: describes the storage of data in the memory grid; each partition of the memory grid has a primary copy, which is stored on the local node and holds the most current data and state.
Backup copies: copies of the memory grid data backed up to other nodes in the distributed server cluster in order to improve fault tolerance and availability.
Exactly-once: means that each input data item is guaranteed to have exactly one effect when the data stream is processed, i.e., the results produced in the distributed system are accurate and are not repeated or lost due to system failures or other reasons.
Cloud technology (Cloud technology) refers to a hosting technology for integrating hardware, software, network and other series resources in a wide area network or a local area network to realize calculation, storage, processing and sharing of data.
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied in the cloud computing business model; it can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support. Background services of technical network systems require a large amount of computing and storage resources, such as video websites, picture websites and portal websites. With the continued development and application of the internet industry, every item may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong backing system support, which can only be realized through cloud computing.
Cloud computing is a computing model that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space and information services as needed. The network that provides the resources is referred to as the "cloud". From the user's point of view, the resources in the cloud can be expanded infinitely, and can be acquired at any time, used on demand, expanded at any time and paid for according to use. As a basic capability provider of cloud computing, a cloud computing resource pool (cloud platform for short, generally referred to as an IaaS (Infrastructure as a Service) platform) is established, in which multiple types of virtual resources are deployed for external clients to select and use.
With the development of the internet, real-time data flow and diversification of connected devices, and the promotion of demands of search services, social networks, mobile commerce, open collaboration and the like, cloud computing is rapidly developed. Unlike the previous parallel distributed computing, the generation of cloud computing will promote the revolutionary transformation of the whole internet mode and enterprise management mode in concept.
The following outlines the design ideas of the embodiments of the present application.
With the popularization of cloud computing technology, the data volume generated by various application programs is larger and larger, and the real-time requirement on data processing is higher, so that a distributed stream processing system gradually becomes a research focus in the field of big data. While different applications may require different system designs, such as internet applications and financial applications having different infrastructure and system design requirements.
In order to meet the deployment requirements of different applications, multiple streaming systems need to be developed, each optimized specifically for different business functions; for example, Kafka focuses on storage and integration flows, while Flink implements a generic programming model with out-of-order and exactly-once processing capabilities. However, the architectures of these streaming systems are not designed for low-latency deployments, and the percentile latency of existing streaming systems can easily reach several seconds when computing resources are scarce, because existing streaming systems abstract away the hardware and rely too heavily on host-language conveniences (such as the JVM default garbage collector) in order to provide a simple programming model and out-of-order processing.
In view of this, the embodiment of the application provides a memory grid-based distributed stream processing method, which constructs complex data stream processing logic based on a memory grid, can process unordered data streams, and provides an exactly-once processing guarantee. In the method, one thread is carried on each of the multiple cores of every server in the distributed system, multiple operators for performing data stream processing on a target task are deployed on each core, and different operators are used to perform different data stream processing operations, so that distributed processing of the data stream is completed. Considering that the input operator and the output operator among the multiple operators depend on third-party APIs, cannot be invoked cooperatively and may block, the method allocates a separate thread to each of the input operator and the output operator, preventing blocking from affecting the progress of data stream processing and improving the processing speed of the data stream. Meanwhile, different types of intermediate operators can share one thread, which reduces operating system context switching, allows the thread to stay on a core for a long time, improves CPU utilization, and reduces the latency of distributed stream processing. Because one thread is carried on each core and multiple operators for performing data stream processing on the target task are deployed there, each core sequentially calls different types of intermediate operators to perform data stream processing according to the preset data stream processing logic of the target task, which improves the throughput of each core, keeps the percentile latency of the data stream at the millisecond level, and effectively improves data stream processing efficiency.
It should be noted that when the above embodiments of the present application are applied to specific products or technologies, the data stream used needs to be licensed or agreed upon by the subject, and the collection, use and processing of the relevant data need to comply with relevant laws and regulations and standards of the relevant country and region. Wherein the data includes, but is not limited to, user descriptive information, advertisements, multimedia assets, electronic assets, and the like.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and that the embodiments of the present application and the features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present application. The application scenario diagram includes a terminal device 110 and a server cluster 120.
In an alternative embodiment, communication between terminal device 110 and server cluster 120 may be via a wired network or a wireless network.
In the embodiment of the present application, the terminal device 110 includes, but is not limited to, a personal computer, a mobile phone, a tablet computer, a notebook, an electronic book reader, an intelligent medical device, a vehicle-mounted terminal, and the like, and various application programs including, but not limited to, a resource recommendation application, a financial application, a navigation application, a shopping application, and the like are installed on the terminal device. The server cluster 120 is a background server of an application program or a server specially used for performing data stream processing, and the present application is not limited in particular. The server cluster 120 is a distributed system formed by a plurality of physical servers, and may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), and basic cloud computing services such as big data and artificial intelligence platforms.
In the embodiment of the present application, the memory grid-based distributed data stream processing method is executed by the server cluster 120. Each server in the server cluster 120 includes a plurality of cores, each core is deployed with a plurality of operators for performing data stream processing on the target task, and different operators are used to perform different data stream processing operations. The input operator is used to read the data stream of the target task into the memory grid of the server, the different types of intermediate operators are used to process the data stream stored in the memory grid, and the output operator is used to output the target processing result of completing the data stream processing of the target task. The intermediate processing result of each intermediate operator is stored into the memory grid and transmitted to the next intermediate operator for subsequent processing, and intermediate processing results can be exchanged between different servers through the intermediate operators, thereby realizing the fusion of distributed data stream processing results. Each core of each server in the server cluster 120 carries one thread, which may be a cooperative thread or a non-cooperative thread; a non-cooperative thread is used to call operators of the same type, and a cooperative thread is used to call operators of different types. Because the input operator and the output operator depend on third-party APIs and cannot be invoked cooperatively, blocking may occur; therefore, the input operator and the output operator each occupy one non-cooperative thread, and the intermediate operators share at least one cooperative thread, so that a thread can stay on the same core for a longer time, operating system context switching is reduced, CPU utilization is improved, and the latency of distributed data stream processing is further reduced.
It should be noted that, the embodiment shown in fig. 1 is only illustrative, and the number of terminal devices and server clusters is not limited in practice, and is not particularly limited in the embodiment of the present application.
In the embodiment of the application, a plurality of servers can form a distributed system, and each server is a node in the distributed system; the embodiment of the application discloses a distributed data stream processing method based on a memory grid, wherein the related data streams, processing results and the like can be stored on nodes.
In the following, the method for processing a distributed data stream based on a memory grid according to an exemplary embodiment of the present application will be described with reference to the accompanying drawings in conjunction with the above-described application scenario, and it should be noted that the above-described application scenario is only shown for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect.
Referring to fig. 2, the core components of the memory grid-based distributed stream processing method according to the embodiment of the present application include a management center, a pipeline API, a core API, an execution engine, a connector, a state backend, cluster management, and operators. Among them:
management center: a Web User Interface (UI) and a Web communication protocol (REST API) are provided for managing and observing the processing of data streams.
Pipeline API: a developer-oriented high-level API for creating pipeline logic for data streams. The pipeline API provides operators such as mapping (Map), filtering (Filter) and aggregation (Aggregate), through which complex data stream processing procedures can be implemented. The input and output types of operators are checked at compile time, so the pipeline API is type-safe. The pipeline API divides the processing of a data stream into a plurality of stages, each stage represented by an operator that performs a computational task. Each stage may be either stream processing or batch processing; the difference is that the input of each operator in stream processing is unbounded, whereas the input of each operator in batch processing is finite. The pipeline API may also mix stream processing with batch processing to build hybrid stream-batch processing, such as performing a hash join between a batch "build" stage and a stream processing "probe" stage: the batch side pulls all of its input at pipeline initialization and performs the complex batch operations, while the stream processing side simply probes the hash table on each streamed event. The pipeline API simplifies and extends the stream processing process, and the embodiment of the application uses the pipeline API to implement data stream processing.
Core API: provides an infrastructure describing the high-level domain-specific semantics (Domain Specific Language, DSL) of data flows and an API oriented to the distributed stream processing engine; it includes all functions of the execution engine and the data stream processing logic that can be used to construct operators, and is a lower-level API that can be used to specify low-level computing policies. The core API may be used to fine-tune the data stream computation process, for example: defining the queue size between two servers, creating custom code in response to incoming signals, customizing the partition policy of the memory grid, etc. The core API can implement finer-grained data stream processing logic than the pipeline API, but adds computational complexity and lacks type-safety checks.
Execution engine: includes efficient operator implementations for partitioning, window aggregation, joins, basic computing tasks, state management, fault tolerance and the like. The embodiment of the application provides an execution engine that can uniquely share computing resources, also referred to as task unit scheduling; distributed stream processing engines include, but are not limited to, Flink, Storm, Samza, Spark Streaming and Dataflow.
State backend: in the embodiment of the application, unlike most streaming systems that store state snapshots in stable object storage, the state snapshots are stored in the memory grid by means of partitioning and replication, so that when a node fails the data can be quickly recovered from a node holding a copy to achieve fault tolerance, and the state snapshots can be elastically scaled out when the load increases. The partitions of the memory grid match the partitions of the execution engine, so that expensive and slow data repartitioning is avoided and low-latency data stream processing is maintained.
Operators: used to implement different processing operations on the data stream; each operator can be seen as a task unit.
Stateless operators: operators that do not depend on previous computation results.
Stateful operators: operators that depend on previous computation results.
Connector: used for data exchange between different servers in the distributed system.
Cluster management: for providing a communication protocol between different servers.
In an embodiment of the application, the pipeline API and the core API are used to create distributed, stateful and fault-tolerant data stream processing logic. As shown in fig. 3, which depicts the data stream processing logic constructed with the pipeline API and the core API respectively, both APIs use the standard streaming dataflow execution model to form a directed acyclic logic graph of data stream processing through an input operator (i.e., Source operator), intermediate operators (including but not limited to a mapping operator (i.e., FlatMap operator), a filtering operator (i.e., Filter operator), a group accumulation operator (i.e., Group Accumulate operator) and an aggregation operator (i.e., Combine operator)), and an output operator (i.e., Sink operator). The core API may fuse consecutive stateless operators together for computation, thereby simplifying data stream processing.
As shown in FIG. 3, the stateless operators FlatMap and Filter are treated as two task units in the pipeline API, while the core API fuses the FlatMap operator and the Filter operator into one task unit; the computing task completed by the group aggregation operator in the pipeline API is realized in the core API by two finer-grained task units, Group Accumulate and Combine. A pipeline built with the pipeline API can thus be regarded as a parallel distributed operator computation graph in the core API.
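As an illustration of fusing consecutive stateless operators, the following sketch composes a FlatMap step and a Filter step into a single computation step using plain Java function composition; it is not the pipeline API or core API itself, and all names are illustrative.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch: a FlatMap stage and a Filter stage fused into one computation step,
// so the two stateless operators run as a single task unit with no intermediate hand-off.
public class OperatorFusionSketch {

    static <T, R> Function<T, Stream<R>> fuse(Function<T, Stream<R>> flatMap, Predicate<R> filter) {
        return item -> flatMap.apply(item).filter(filter);   // one pass over each input item
    }

    public static void main(String[] args) {
        Function<String, Stream<String>> flatMap = line -> Stream.of(line.split("\\s+")); // split into words
        Predicate<String> filter = word -> word.length() > 3;                             // keep long words

        Function<String, Stream<String>> fused = fuse(flatMap, filter);

        List<String> out = Stream.of("the memory grid stores state", "low latency stream processing")
                .flatMap(fused)
                .collect(Collectors.toList());
        System.out.println(out);   // [memory, grid, stores, state, latency, stream, processing]
    }
}
```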
The memory grid-based data stream processing method provided by the embodiment of the application can be applied to a distributed system including at least two servers, where each server includes a plurality of cores (the number of cores is greater than three) and each core carries one thread; the correspondence between each core and its operators differs from that in the related art.
The data flow processing logic of the target task and the data flow relation among operators are predefined in the pipeline API facing the developer, and the processing operations of various operators are fine-tuned in the core API facing the distributed flow processing engine, as shown in fig. 3.
In the embodiment of the application, multiple operators for performing data stream processing on the target task are deployed on each individual core, so that the various operators are called in sequence according to the data stream processing logic preset for the target task in the core API; this allows each operator's processing operation on the data stream to transmit data on the local server as much as possible, avoiding expensive network data transmission, and allows coroutine-based execution, ensuring CPU utilization.
Taking one server in a distributed system as an example, the data stream processing process implemented by the server is shown in fig. 4, and mainly includes the following steps:
S401: the server uses the first thread to call the input operator to read the data stream of the target task, and stores the read data stream in the memory grid of the server.
In one example, the servers perform data stream processing according to the data stream processing logic preset for the target task in the core API; as shown in fig. 5A, each server performs data stream processing using multiple operators. During data stream processing, a server can exchange data with other servers to realize distributed stream processing, where the other servers are the one or more servers in the distributed system that process the data stream of the target task together with this server.
For example, if the server 1 and the server 2 in the distributed system together process the data stream of the target task, the server 2 is another server with respect to the server 1, and the server 1 is another server with respect to the server 2.
In one example, the processing of the data stream by the server is implemented based on a plurality of task units with short execution times (typically less than 1 millisecond). Each operator is a task unit that performs a computational task, the data stream processing operations performed by different operators are different, and the task units are able to maintain high load and extremely high task volumes on limited resources. Unlike the related art, which allocates one thread to each task unit, the embodiment of the application carries one thread on each core. As shown in fig. 5B, the threads are divided into cooperative threads and non-cooperative threads according to the types of task units they execute: a non-cooperative thread is used to execute at least one task unit of the same type, while a cooperative thread is used to execute a plurality of task units of different types, that is, one thread can execute the plurality of task units in a cyclic manner, and the plurality of task units share one set of computing resources by taking turns using the thread carried on the core. This avoids expensive operating system context switching and allows one thread to remain on one core for a longer time, thereby preserving CPU cache lines and improving CPU utilization.
To further increase the efficiency of data stream processing, the data stream processing logic of the core API may be parallelized in the server. Taking a parallelism of 2 as an example, fig. 6A shows the data stream processing process of the server, in which two branches process the data stream in parallel and the two branches process the data stream in the same way.
The number of threads started for the parallel data stream processing procedure is shown in fig. 6B, where the number of deployed Java Virtual Machine (JVM) threads equals the number of cores. In the process of processing the data stream of the target task, the server can receive results of processing the data stream of the target task from other servers in the distributed system, and can also send its processing results to the other servers in the distributed system that process the data stream of the target task. During data stream processing, the input operator among the multiple operators is given its own non-cooperative thread (denoted the first thread), the intermediate operators among the multiple operators can share a cooperative thread (denoted the second thread), and the output operator is also given its own non-cooperative thread (denoted the third thread).
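The thread assignment just described can be sketched as follows, assuming a hypothetical Operator interface and using a ConcurrentHashMap as a stand-in for the local memory grid; the class and method names are illustrative only and not part of the claimed method.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: one dedicated (non-cooperative) thread for the input operator,
// one cooperative thread shared by all intermediate operators, and one dedicated thread
// for the output operator.
public class StreamWorkerSketch {

    interface Operator { boolean runOnce(Map<String, Object> grid); } // returns false when finished

    public static void start(Operator source, List<Operator> intermediates, Operator sink) {
        Map<String, Object> memoryGrid = new ConcurrentHashMap<>();   // stand-in for the local memory grid

        // First thread: the blocking input operator gets its own non-cooperative thread.
        Thread first = new Thread(() -> { while (source.runOnce(memoryGrid)) { } }, "source-thread");

        // Second thread: all intermediate operators share one cooperative thread,
        // called in the preset order of the data stream processing logic.
        Thread second = new Thread(() -> {
            boolean active = true;
            while (active) {
                active = false;
                for (Operator op : intermediates) {
                    active |= op.runOnce(memoryGrid);                 // each call is a short task unit
                }
            }
        }, "intermediate-thread");

        // Third thread: the blocking output operator also gets its own non-cooperative thread.
        Thread third = new Thread(() -> { while (sink.runOnce(memoryGrid)) { } }, "sink-thread");

        first.start();
        second.start();
        third.start();
    }
}
```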
The embodiment of the application optimizes the data transmission path from the input (Source) operator to the output (Sink) operator: for each server, the Source operator is only connected to operators that complete local computing tasks, and the data stream read by the Source operator is received by each intermediate operator in parallel and subjected to the corresponding computing processing. Since the computing tasks of the concurrently invoked operators do not depend on each other, the concurrent tasks can be performed in a very short time. When the window for executing a task expires, control is returned to the server, and the server can use the same thread to call the next task unit to be executed instead of relying on an operating-system-level scheduler, so that operating system context switching is reduced and CPU utilization is improved.
In the embodiment of the application, the data stream processing logic of the target task optimizes the communication cost between operators that complete different computing tasks in the following three ways: 1) consecutive stateless operators are fused together, simplifying the data stream processing process; 2) cooperative threads are used so that data stream processing is kept executing locally as much as possible, and a two-stage processing mode is adopted for locally circulating data and for data exchanged across different partitions, that is, partial results are computed locally and then globally aggregated with the processing results of other servers; 3) the various operators deployed on the cores of each server include a send operator (to the Combine operator) and a receive operator (from the Accumulate operator) for data exchange, thereby realizing the exchange of the intermediate processing results of some intermediate operators between different servers.
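The two-stage mode in item 2) can be illustrated as follows: each server accumulates a partial per-key result locally, and the partial results are then combined globally. The data types and the max aggregation are assumptions chosen only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of two-stage aggregation: a local Accumulate stage builds partial
// per-key maxima on each server, and a Combine stage merges the partial results globally.
public class TwoStageAggregationSketch {

    // Stage 1 (local): accumulate the maximum value per key over this server's share of the stream.
    static Map<String, Long> accumulate(List<Map.Entry<String, Long>> localEvents) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> e : localEvents) {
            partial.merge(e.getKey(), e.getValue(), Math::max);
        }
        return partial;
    }

    // Stage 2 (global): combine the partial results received from all servers.
    static Map<String, Long> combine(List<Map<String, Long>> partials) {
        Map<String, Long> global = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((k, v) -> global.merge(k, v, Math::max));
        }
        return global;
    }
}
```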
Considering that the input operator relies on a third-party API and cannot be invoked cooperatively with other operators, it may block, and therefore a dedicated first thread (i.e., a non-cooperative thread) must be started for it.
In one example, the types of input operators include, but are not limited to, collection-based Source operators, file-based Source operators, socket-based Source operators, and custom Source operators (e.g., Kafka, streams).
For example, when the Source operator is a collection-based Source operator, the server uses the first thread to call the input operator, reads the data stream of the target task from the collection and stores it in the memory grid; when the Source operator is a file-based Source operator, the server uses the first thread to call the input operator, reads the data stream of the target task from the file, and stores it in the memory grid.
Experiments prove that at most two Source operators can be deployed on each server of the distributed system. The two Source operators may share the same first thread, which executes the two Source operators in a round-robin fashion. Specifically, each Source operator acts as a task unit that executes for a short time; after one Source operator finishes executing, it automatically relinquishes its occupancy, and the execution engine schedules the first thread to execute the next Source operator. This cyclic execution mode achieves efficient CPU utilization, avoids operating system context switching, and allows the first thread to remain on one CPU core for a longer time, thereby preserving CPU cache lines.
S402: the server uses each second thread to sequentially call the different types of intermediate operators to process the data stream stored in the memory grid according to the preset data stream processing logic of the target task, and takes the intermediate processing result obtained by the last called intermediate operator processing the data stream in the memory grid as the processing result of that second thread.
Each second thread is a cooperative thread, and tens of thousands of task units can be hosted on each second thread. In the embodiment of the application, the intermediate operators that complete different computing tasks are multiplexed on a single second thread, so that thousands of concurrent tasks run on the same CPU core resource; by having different types of intermediate operators share one second thread, high load is maintained on limited resources while various processing operations are executed, and additional resource overhead is reduced.
Taking one second thread as an example: because the pipeline API and the core API have preset the data stream processing logic for the data stream of the target task, and this logic can be implemented by the multiple operators deployed on each core, during implementation the server can, based on the multiple operators deployed on the core carrying the second thread, sequentially call the different types of intermediate operators to process the data stream stored in the memory grid according to the data stream processing logic of the target task, until all intermediate operators have been called, and then take the intermediate processing result obtained by the last called intermediate operator processing the data stream in the memory grid as the processing result of the second thread.
Each time the second thread calls an intermediate operator, it stores the intermediate processing result of that intermediate operator into the memory grid and transmits the intermediate processing result to the next intermediate operator for processing according to the data stream processing logic.
For example, as shown in fig. 5A, the second thread first invokes the fusion operator of the continuous stateless intermediate operators (a FlatMap operator and a Filter operator), reads the data stream that the Source operator placed into the memory grid, maps it into multi-dimensional data, then filters the multi-dimensional data to obtain an intermediate processing result, and stores the intermediate processing result produced by the fusion operator into the memory grid for delivery to the next intermediate operator. When the calculation task of the fusion operator is completed, the fusion operator voluntarily yields the second thread; the second thread determines, according to the data stream processing logic, that the next calculation task after the fusion operator is handled by the stateful Group Aggregate operator, then calls the Group Aggregate operator to process the intermediate processing result of the fusion operator stored in the memory grid, and stores the highest value selected from each group of data into the memory grid for delivery to the next intermediate operator. After the calculation task of the Group Aggregate operator is completed, it voluntarily yields the second thread; the second thread determines, according to the data stream processing logic, that the next calculation task after the Group Aggregate operator is handled by the stateful Combine operator, and then calls the Combine operator to execute the next calculation task, thereby obtaining the processing result of the second thread.
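The sequential, cooperative invocation of intermediate operators on one second thread, with each intermediate processing result written back to the memory grid for the next operator, could be sketched as follows in Java; all names (CooperativePipeline, IntermediateOperator, the "lastResult" key) are illustrative assumptions rather than the actual engine.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cooperative pipeline: each intermediate operator processes the
// records left in the memory grid by its predecessor, stores its own result,
// and then yields so the next operator can run on the same thread.
final class CooperativePipeline {

    interface IntermediateOperator {
        // Reads the upstream result from the grid, writes its own result back.
        void process(Map<String, Object> memoryGrid);
    }

    private final List<IntermediateOperator> operators; // e.g. FlatMap+Filter fusion, Group Aggregate, Combine
    private final Map<String, Object> memoryGrid = new ConcurrentHashMap<>();

    CooperativePipeline(List<IntermediateOperator> operators) {
        this.operators = operators;
    }

    // One pass of the second thread: invoke the operators in pipeline order;
    // the result of the last operator is the processing result of this thread.
    Object runOnce() {
        for (IntermediateOperator op : operators) {
            op.process(memoryGrid); // operator yields implicitly when process() returns
        }
        return memoryGrid.get("lastResult"); // hypothetical key written by the last operator
    }
}
```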
When the server processes the data streams in parallel, the intermediate processing results produced on different second threads can be used by one another.
For example, in fig. 6A, the intermediate processing result obtained when the second thread 1 calls the fusion operator of the continuous stateless FlatMap operator and Filter operator may be used when the second thread 1 calls the Group Aggregate operator, and may also be used when the second thread 2 calls the Group Aggregate operator. Similarly, the intermediate processing result obtained when the second thread 2 calls the fusion operator of the continuous stateless FlatMap operator and Filter operator may be used when the second thread 2 calls the Group Aggregate operator, and may also be used when the second thread 1 calls the Group Aggregate operator.
For another example, in fig. 6A, the intermediate processing result obtained by the second thread 1 calling the Group Aggregate operator may be used when the second thread 1 calls the Combine operator, or may be used when the second thread 2 calls the Combine operator. Similarly, the second thread 2 calls the intermediate processing result obtained by the Group Aggregate operator, and the intermediate processing result can be used when the second thread 2 calls the Combine operator, and also can be used when the second thread 1 calls the Combine operator.
In the embodiment of the application, the different types of intermediate operators include a sending operator and a receiving operator, so that when multiple servers in the distributed system process data streams, the intermediate processing results of part of the intermediate operators can be exchanged between different servers through the sending operator and the receiving operator; that is, the processing result of each second thread is obtained based on the intermediate processing results of the server processing the data stream of the target task and the intermediate processing results of other servers processing the data stream of the target task.
In one example, each server in the distributed system can create, in its local memory, a shared queue for intermediate operators, which can be accessed by other servers to exchange intermediate processing results. The way intermediate processing results are exchanged between different servers includes: the server transmits the intermediate processing results of part of the intermediate operators to other servers, and the server receives the intermediate processing results of part of the intermediate operators from other servers. The intermediate processing results transmitted and received are those of stateful intermediate operators.
As shown in fig. 5A and 6A, the intermediate operator that completes the intermediate-processing-result exchange task as the sending operator is the to Combine operator, and the receiving operator is the from Accumulate operator. The to Combine operator is used to receive processed data, and the from Accumulate operator is used to distribute data downstream; one to Combine operator and one from Accumulate operator form a connector between each server and the other servers, and the connector carries the custom processing logic of the data stream processing (for example, user-defined scalar functions (User Defined Scalar Function, UDF), operator logic, joining two pieces of data, and the like).
In the data stream processing process, when the intermediate operator invoked by the second thread is the sending operator, the server obtains an intermediate processing result of part of the intermediate operators (i.e., a stateful intermediate operator, such as the Group Aggregate operator) from the shared queue of the local memory, and sends the obtained intermediate processing result to the shared queue of the other server's memory.
In implementation, in the process that the server sequentially calls intermediate operators of different types with a second thread to process the data stream: when the called intermediate operator is the sending operator, the server calculates, based on the key of the intermediate processing result of part of the intermediate operators, the partition ID corresponding to that intermediate processing result, and sends the intermediate processing result to the shared queue of the other server corresponding to that partition ID; when the called intermediate operator is a non-sending operator, the server processes the data stream stored in the memory grid and stores the resulting intermediate processing result of part of the intermediate operators into the shared queue of the local memory.
For example, as shown in fig. 6A, the second thread 1 acquires the intermediate processing result of the Group Aggregate operator from the shared queue of the local memory of server 1 and, based on the key of the intermediate processing result, calculates a partition ID of 2, which corresponds to a partition on the memory grid of server 2; therefore, the second thread 1 calls the to Combine operator to store the intermediate processing result into the shared queue of server 2.
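A minimal Java sketch of the sending side of the connector is given below, assuming hash-based partitioning of keys; the class and field names (SendOperator, KeyedResult, remoteSharedQueues) are hypothetical.

```java
import java.util.Map;
import java.util.Queue;

// Sending side of the connector: the partition ID is derived from the key of the
// intermediate result, and the result is placed into the shared queue of the
// server that owns that partition.
final class SendOperator {

    static final class KeyedResult {
        final String key;
        final Object value;
        KeyedResult(String key, Object value) { this.key = key; this.value = value; }
    }

    private final int partitionCount;
    private final Map<Integer, Queue<KeyedResult>> remoteSharedQueues; // partition ID -> owning server's queue

    SendOperator(int partitionCount, Map<Integer, Queue<KeyedResult>> remoteSharedQueues) {
        this.partitionCount = partitionCount;
        this.remoteSharedQueues = remoteSharedQueues;
    }

    void send(KeyedResult result) {
        int partitionId = Math.floorMod(result.key.hashCode(), partitionCount); // assumed hash partitioning
        remoteSharedQueues.get(partitionId).offer(result);
    }
}
```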
When the intermediate operator called by the second thread is the receiving operator, the server receives intermediate processing results of part of the intermediate operators from the shared queue of the other server's memory, and stores the received intermediate processing results into the shared queue of the local memory.
In implementation, in the process that the server sequentially calls intermediate operators of different types with the second thread to process the data stream: when the called intermediate operator is a non-receiving operator, the server processes the intermediate processing results received from other servers and stored in the shared queue of the local memory, and removes the processed intermediate processing results from the shared queue of the local memory; when the called intermediate operator is the receiving operator, the server again receives intermediate processing results from the shared queues of the other servers' memories to refill the shared queue of the local memory, until the data stream processing of the target task is completed.
For example, as shown in fig. 6A, the second thread 2 invokes the from Accumulate operator to pull intermediate processing results from the shared queues of other servers' memories to fill the shared queue in the local memory; when the from Accumulate operator completes this calculation task, it pulls an available intermediate processing result from the shared queue of the local memory, that is, the data is dequeued. After the from Accumulate operator has been invoked repeatedly, the intermediate processing results in the shared queue of the local memory are consumed, and the second thread 2 then invokes the from Accumulate operator again to pull intermediate processing results from the shared queues of other servers' memories and refill the shared queue of the local memory.
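The receiving side can be sketched in the same spirit: results are dequeued from the local shared queue, and the queue is refilled from other servers only when it runs dry. The names (ReceiveOperator, remotePull) are illustrative assumptions, not the embodiment's actual interfaces.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Receiving side of the connector: intermediate results are consumed from the
// local shared queue, which is refilled from other servers' queues on demand.
final class ReceiveOperator<T> {
    private final Queue<T> localSharedQueue = new ConcurrentLinkedQueue<>();
    private final Supplier<Iterable<T>> remotePull; // hypothetical callback pulling a batch from remote queues

    ReceiveOperator(Supplier<Iterable<T>> remotePull) {
        this.remotePull = remotePull;
    }

    T next() {
        T item = localSharedQueue.poll();
        if (item == null) {
            // local queue consumed: refill from the other servers' shared queues
            for (T pulled : remotePull.get()) {
                localSharedQueue.offer(pulled);
            }
            item = localSharedQueue.poll();
        }
        return item; // may be null if the remote queues are also empty
    }
}
```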
In the embodiment of the application, the shared queue in the local memory of each server in the distributed system can be regarded as a producer-consumer queue, where producer and consumer are defined from the perspective of the server: the producer is the server that generates the data stored in the shared queue, and the consumer is the server that pulls and consumes the data from the shared queue. In essence, a shared queue is a connection between a pair of servers serving a data stream processing task. Different servers exchange data efficiently and without waiting through queues in shared memory, which improves the processing efficiency of data streams.
In a distributed system, each server has limited processing power, and if the amount of data received by a server exceeds that limit, system faults may result. To this end, the embodiment of the application employs a backpressure mechanism to limit the amount of data received by the server, so that the entire pipeline remains balanced and runs at the pace of its slowest stage.
In one example, the backpressure mechanism may be applied between computing tasks completed by the same server. Specifically, when the shared queue of the local memory is full, the server stores the intermediate processing results corresponding to part of the intermediate operators into a waiting queue of the local memory, and controls the amount of data the input operator reads into the memory grid each time, so that the entire pipeline remains balanced and runs at the pace of its slowest stage.
In one example, the backpressure mechanism may be applied between different servers: the server producing the data must wait for an acknowledgement message from the server consuming the data to determine how much data it may send. Specifically, after receiving from the shared queue in the other server's memory the intermediate processing results that the current window can process, the server sends an acknowledgement message to the other server. According to the acknowledgement message, the other server controls the amount of data the input operator reads into the memory grid each time and transmits the data of the next window to the server. The server then calls the from Accumulate operator and receives, from the shared queue in the other server's memory, the intermediate processing results that the next window can process.
Tests show that the server consuming the data is always ready for data processing and sends an acknowledgement message to the server producing the data every 100 milliseconds; with a sufficiently large window, the amount of data received within 100 milliseconds (including the network link delay time) can be accommodated.
In one example, the size of the data amount processed per window is dynamically adjusted based on the processing rate. Specifically, the server calculates the size of the windows based on the event processing rate in a given task and adaptively reduces and enlarges the windows as the traffic changes, such that each window contains approximately 300 milliseconds of data in steady state.
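A minimal sketch of the acknowledgement-driven backpressure with an adaptively sized window might look as follows in Java, assuming the roughly 300-millisecond target window described above; the class and method names are hypothetical and the 100-millisecond acknowledgement cadence is handled outside this sketch.

```java
// Acknowledgement-driven window sizing: one window is sized to hold roughly
// 300 ms of data at the observed event rate, shrinking or growing with traffic.
final class BackpressureWindow {
    private static final long TARGET_WINDOW_MILLIS = 300;
    private long eventsPerSecond = 1_000; // updated from the observed processing rate

    // Called by the consuming server after it has taken a full window from the queue.
    AckMessage acknowledge() {
        return new AckMessage(currentWindowSize());
    }

    // Number of records the producing server may send in the next window.
    long currentWindowSize() {
        return Math.max(1, eventsPerSecond * TARGET_WINDOW_MILLIS / 1000);
    }

    void updateRate(long observedEventsPerSecond) {
        // adaptively shrink or grow the window as traffic changes
        this.eventsPerSecond = observedEventsPerSecond;
    }

    record AckMessage(long nextWindowSize) {}
}
```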
S403: and the server adopts a third thread to call an output operator, and obtains a target processing result of the target task based on the processing results corresponding to the second threads.
In the embodiment of the application, different types of intermediate operators share the second thread, which keeps the second thread on one kernel for a longer time, avoids expensive operating system context switching, preserves CPU cache lines, and achieves high CPU utilization. However, the premise of efficiently executing data stream processing tasks in this way is that task units do not block, because once a task blocks, the execution progress of all cooperating threads is compromised. Therefore, similar to the input Source operator, the embodiment of the application also opens a dedicated third thread for the output Sink operator, which relies on a third-party API.
Specifically, the server adopts a third thread to call the Sink operator, and based on the processing results corresponding to the second threads, the server outputs a final target processing result after processing the data stream of the target task.
Similarly, at most two Sink operators can be deployed on each server of the distributed system. The two Sink operators may share the same third thread, which executes them in a round-robin fashion. Specifically, each Sink operator is treated as a task unit executed for a short time; after one Sink operator finishes executing, it voluntarily yields the thread, and the execution engine schedules the third thread to execute the next Sink operator. This cyclic execution mode achieves high CPU utilization, avoids operating system context switching, and allows the third thread to remain on one kernel for a longer time, thereby preserving CPU cache lines.
In one example, for non-cooperative threads that are individually turned on for the Source operator and Sink operator, control is forced to be returned to the execution engine frequently (at least once per second) to achieve reasonable scheduling of each thread.
The embodiment of the application dispatches the operators that may block computing tasks to separate, dedicated non-cooperative threads, thereby reducing the interference of blocked task units with other task units and improving the processing efficiency of data streams.
In the data stream parallel processing method designed around core contents such as the data stream processing logic, the execution engine, and the state back end, in order to support computation over streaming data (such as real-time global query services), the memory grid is used as the state back end for state storage, so that the target service can execute custom service logic and query data with SQL in the memory grid.
Data stream processing generally includes stateless operators (such as the Map operator and the Filter operator) and stateful operators (such as the Group Aggregate operator, the Combine operator, and the Join operator). Stateless operators can process any part of the currently received data stream and allow large-scale parallel operation, while stateful operators need to rely on previous calculation results and involve data transfer, which requires knowing which downstream operator the data must be transferred to, so distributed stream processing is generally performed in a partitioned manner.
For example, as shown in fig. 6A, the intermediate processing result of the Group Aggregate operator in server 1 needs to be aggregated with the intermediate processing result of the Group Aggregate operator in the downstream server 2, that is, with the data in the partition of server 2, and the Combine operator of server 1 is a stateful computation.
In the data stream processing process, the execution engine in the server needs to control the scheduling of all threads so as to trigger a state snapshot at regular intervals according to a preset snapshot period, thereby achieving fault tolerance.
In one example, for each of the data stream read by the input operator, the intermediate processing results of the intermediate operators processing the data stream, and the target processing result output by the output operator, the process of storing the data in the memory grid includes: the server calculates a partition ID according to the key of the data, and stores the data on the corresponding partition of the memory grid according to the partition ID.
The memory grid supports various data structures and interfaces, and the states in the system are stored in the memory grid in the form of key-value pairs. To manage these states, the embodiment of the application uses a Map data structure to divide the key space into disjoint partitions and distribute them to the server instances that perform stateful computation, so that the connector can run directly on the memory grid where the stateful Map data structure is stored, giving high scalability and usability. To keep the states stored in memory as locally as possible, each partition of a server runs on that server, and each partition of the server corresponds to the key partition of one state, so that partitions and states are bound more closely together and the resource overhead of transmitting data across nodes is reduced.
In one example, after the server stores data to the corresponding partition of the memory grid based on the key of the data, the server sets snapshot checkpoints at regular intervals to generate snapshots with a consistent global state. After the server stores the states used by the connectors, the snapshot signal is propagated downstream along the data flow relationships among operators; when the snapshot signal reaches the receiving operator of the server that outputs the data, all connectors store their states in a stable storage device (such as HDFS), and the state snapshot is completed. Specifically, the server generates, according to a preset snapshot period, a state snapshot based on the data of the various operators used for processing the data stream of the target task on each partition of the memory grid, and stores the data on each partition onto a stable disk; the state snapshot is used to record the processing results of the data stream stored on each partition of the memory grid at the current moment.
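The periodic state snapshot of the local partitions could be sketched as follows, assuming the partition contents are copied and handed to a stable store on every snapshot period; the names (SnapshotScheduler, StableStore) are illustrative, and the propagation of the snapshot signal through the operator graph is omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodic state snapshots: at every snapshot period the current content of each
// local partition is copied and handed to a stable store (e.g. a distributed file system).
final class SnapshotScheduler {

    interface StableStore {
        void write(int partitionId, Map<String, Object> partitionState);
    }

    private final Map<Integer, Map<String, Object>> localPartitions;
    private final StableStore stableStore;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    SnapshotScheduler(Map<Integer, Map<String, Object>> localPartitions, StableStore stableStore) {
        this.localPartitions = localPartitions;
        this.stableStore = stableStore;
    }

    void start(long snapshotPeriodMillis) {
        scheduler.scheduleAtFixedRate(this::snapshotOnce, snapshotPeriodMillis,
                snapshotPeriodMillis, TimeUnit.MILLISECONDS);
    }

    private void snapshotOnce() {
        for (Map.Entry<Integer, Map<String, Object>> e : localPartitions.entrySet()) {
            stableStore.write(e.getKey(), new HashMap<>(e.getValue())); // copy-on-snapshot
        }
    }
}
```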
To recover data after a failure, when the computing task of one server fails, all servers stop computing, the latest state snapshot obtained before the failure is restored to the memory grid to replace the failed computing instance, and the input operator is called with the first thread to read the data after the latest state snapshot into the memory grid.
In one example, the states in the system are stored in memory in the form of key-value pairs, the keys being stored in partitions that replicate across multiple servers in a server cluster. Specifically, each partition on the memory grid corresponds to a primary copy locally on a server, and corresponds to at least one backup copy on other servers, where the primary copy and the backup copy are used to store state snapshots of the corresponding partition, so as to implement fault tolerance and parallel processing.
The states are key-partitioned, and each server stores states corresponding to a particular key space. By equally distributing disjoint partitions for storing states to server instances of the distributed system, each server is capable of transmitting states through the server where the partition matched with the key is located, so that interaction with the server corresponding to the state is performed, and therefore all servers in the server cluster are fully scheduled for stream processing. Meanwhile, in order to solve the percentile delay problem, the partitions of the memory grid are aligned with the partitions of the execution engine, so that expensive and slow data repartitioning is avoided, and the data stream processing efficiency is improved.
For each partition, a primary copy is stored in the memory grid of one server, and one or more backup copies are stored on the memory grid on the other servers, with the primary copy and backup copies being used to store a state snapshot of the corresponding partition.
As shown in fig. 7, for a memory grid partition schematic diagram provided in an embodiment of the present application, it is assumed that 12 partitions are divided, where partitions PART1, PART4, PART7, and PART10 are allocated to server 1 in a distributed system, partitions PART2, PART5, PART8, and PART11 are allocated to server 2 in the distributed system, and partitions PART3, PART6, PART9, and PART12 are allocated to server 3 in the distributed system. The state snapshot of the connector in the server 1 is stored in main copies PART1, PART4, PART7 and PART10 of the local memory grid and is matched with a partition storing the state snapshot; meanwhile, the state snapshot of the connector in the server 1 is stored in the backup copies PART1 and PART4 of the memory grid of the server 2 and the backup copies PART7 and PART10 of the memory grid of the server 3.
Similarly, the state snapshot of the connector in server 2 is stored in primary copies PART2, PART5, PART8 and PART11 of the local memory grid, and in backup copies PART2, PART5 of the memory grid of server 1 and backup copies PART8, PART11 of the memory grid of server 3; the state snapshot of the connector in server 3 is stored in primary copies PART3, PART6, PART9 and PART12 of the local memory grid, and in backup copies PART3, PART6 of the memory grid of server 1 and backup copies PART9, PART12 of the memory grid of server 2.
In the embodiment of the application, the state snapshot is stored in the local memory grid of the server regularly and copied into the memory of other servers to realize high availability, so that the data transmission path among operators is kept to be executed locally as far as possible through the data reserved in the memory in the data stream processing process, and the data stream processing efficiency is improved.
It should be noted that the server on which each partition's backup copy is stored is configured according to the cluster formed by the memory grid.
In the embodiment of the application, the backup copies of the partitions are stored on the memory grids of other servers, so that when the server fails, the data and the state of the partitions can be recovered from the backup copies, the fault tolerance and the availability of the system are improved, and the data consistency is maintained after the failure.
In one example, when one server in the server cluster fails, the primary and backup copies on its memory grid become unavailable; the partitions corresponding to the memory grid of that server are reassigned to other servers, the backup copies of those partitions on the other servers are upgraded to primary copies, and the backup copies on the other servers are updated, thereby maintaining state integrity and availability.
For example, when server 1 in fig. 7 fails, the primary and backup copies of the state snapshot stored on its memory grid fail. The backup copies of partitions PART1 and PART4 of server 1 are on the memory grid of server 2, and the backup copies of partitions PART7 and PART10 are on the memory grid of server 3. Therefore, partitions PART1 and PART4 of server 1 are reassigned to server 2, which upgrades its backup copies of PART1 and PART4 to primary copies, and partitions PART7 and PART10 of server 1 are reassigned to server 3, which upgrades its backup copies of PART7 and PART10 to primary copies. At the same time, server 2 and server 3 generate backup copies for each other's newly upgraded primary copies, so that the state snapshots stored in partitions PART1, PART4, PART7 and PART10 of server 1 at the time of the failure can be restored by server 2 and server 3, as shown in fig. 8.
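The promotion of backup copies after a server failure can be sketched as a simple placement table, under the assumption that each partition records one primary server and a set of backup servers; the names (PartitionFailover, PartitionPlacement) are hypothetical, and the re-creation of replacement backup copies is omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Failover step: every partition whose primary copy lived on the failed server
// is handed to a server that already holds a backup copy, which is promoted.
final class PartitionFailover {

    static final class PartitionPlacement {
        String primaryServer;
        final Set<String> backupServers = new HashSet<>();
    }

    private final Map<Integer, PartitionPlacement> placements = new HashMap<>();

    void registerPartition(int partitionId, String primary, Set<String> backups) {
        PartitionPlacement p = new PartitionPlacement();
        p.primaryServer = primary;
        p.backupServers.addAll(backups);
        placements.put(partitionId, p);
    }

    void onServerFailure(String failedServer) {
        for (PartitionPlacement p : placements.values()) {
            p.backupServers.remove(failedServer);
            if (failedServer.equals(p.primaryServer)) {
                // promote one surviving backup copy to primary
                String promoted = p.backupServers.stream().findFirst()
                        .orElseThrow(() -> new IllegalStateException("no backup copy left"));
                p.backupServers.remove(promoted);
                p.primaryServer = promoted;
            }
        }
    }
}
```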
In the embodiment of the application, the state in the distributed stream processing is divided into the key value pair sets, and the key space is divided into the disjoint partitions by using the memory grid of the Map data structure, so that the state management and the fault recovery based on the distributed, observable and queriable mapping data structure are realized. By using the memory grid to store the state in the memory, the method has higher availability and fault tolerance compared with other distributed streaming systems, and is independent of disk storage.
In addition to fault tolerance, the memory grid is also one of the keys to the elasticity and reconfiguration functions of the distributed streaming system. Specifically, when a server is newly added, a consistent hashing algorithm can be adopted to allocate partitions from other servers to the newly added server so as to minimize data migration, and a primary copy and backup copies are added on the memory grid of the newly added server for the newly allocated partitions, so that the partitions on the memory grids of the servers are balanced after allocation. After partition reassignment, the data stream processing process is restarted, and the connector state is initialized from the local primary copy of the latest state snapshot.
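A minimal sketch of hash-consistent partition assignment is shown below: servers are placed on a hash ring and each partition is owned by the first server at or after its hash position, so that adding a server migrates only the partitions falling between the new server and its predecessor. The ring implementation and names are assumptions (no virtual nodes), not the embodiment's actual algorithm.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Consistent hash ring: adding a server only moves the partitions that now fall
// between the new server and its predecessor on the ring.
final class ConsistentHashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    void addServer(String serverId) {
        ring.put(hash(serverId), serverId);
    }

    // Assumes at least one server has been added to the ring.
    String serverForPartition(int partitionId) {
        int h = hash("partition-" + partitionId);
        SortedMap<Integer, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private int hash(String s) {
        return s.hashCode() & 0x7fffffff; // non-negative hash for ring positions
    }
}
```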
The distributed data stream processing method provided by the embodiment of the application can also guarantee accurate one-time (exactly-once) processing of the data stream. Accurate one-time processing requires specific mechanisms to be designed into the data stream processing, and the source end and destination end must provide corresponding support. Since each data entry has a single impact on state, accurate one-time processing conventionally means that no computing tasks are executed while the snapshot checkpoint is being taken; in the case where a server needs to obtain multiple intermediate processing results from other servers, once a snapshot signal reaches the operator of one intermediate processing result, that operator would need to block its computing task and wait for the snapshot signal to reach the other operators of the server. The embodiment of the application guarantees the accurate one-time processing effect by setting snapshot checkpoints and a fault-tolerance mechanism based on distributed snapshots, and operators do not need to be blocked, thereby reducing the delay of data stream processing.
In one example, when the input operator can identify a data offset, the data offset may be recorded in the state snapshot, i.e., the Source operator has a data replay capability. Thus, in the failure recovery process, the server uses the first thread to invoke the input operator, and the process of reading the data stream after the latest state snapshot into the memory grid includes: the server invokes the Source operator with the first thread, does not read the data stream before the offset, and reads the data stream after the offset into the memory grid, thereby providing an accurate one-time processing guarantee.
In one example, when the Source operator does not have a data replay capability but can receive a response message, then after the entire pipeline has processed the data successfully and the state snapshot has been stored, the server sends a response message to the Source operator to indicate that the state snapshot obtained based on the data stream read by the Source operator can be safely deleted, so as to ensure accurate one-time processing of the data.
Since generating the state snapshot and sending the response message is not atomic, the computing task may fail before all Source operators in the distributed system have received the response message, and the server may resend unacknowledged response messages. Therefore, to avoid repeated processing of data, the response message carries the identifier of the state snapshot, and the server records the identifiers of all state snapshots with a global state snapshot; when the identifier carried by a response message has already been recorded in the global state snapshot, the server does not call the input operator to load the data stream recorded by the state snapshot corresponding to that identifier, so that messages are deduplicated through the identifiers of the state snapshots.
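The de-duplication of resent response messages by state snapshot identifier could be as simple as the following Java sketch, where the global record of identifiers is modelled as a set; the names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Every response message carries the identifier of the state snapshot it confirms;
// an identifier already present in the global record means the corresponding data
// stream has been processed and must not be loaded again.
final class SnapshotAckDeduplicator {
    private final Set<String> globalSnapshotIds = new HashSet<>();

    /** @return true if the snapshot's data stream still has to be loaded. */
    boolean accept(String snapshotId) {
        // add() returns false when the identifier was already recorded,
        // i.e. when the response message is a resend that must be ignored.
        return globalSnapshotIds.add(snapshotId);
    }
}
```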
In the embodiment of the application, an accurate one-time processing guarantee can also be realized by the output Sink operator, which supports transactional and idempotent writing when the output result of the Sink operator is delivered to a downstream application.
Transactional writing means that the Sink operator pauses its output and provides the result to the downstream application only when the snapshot checkpoint is completed; the output result is essentially committed in two phases to achieve exactly-once delivery, where the first phase is the prepare-to-commit phase executed at the beginning of the snapshot checkpoint, and the second, commit phase occurs after the snapshot checkpoint is completed, at which point the corresponding state has been persisted and the commit succeeds. Specifically, when a state snapshot is generated based on the data of the Sink operator on each partition of the memory grid, the server stops sending the processed data to the downstream application, and after the state snapshot has been generated, the server continues sending the processed data to the downstream application.
Idempotent writing means that the Sink operator outputs using the recorded state identifier; because the server periodically takes state snapshots to cut off old data and clear the memory grid, the state identifier remains unique within a short time, thereby ensuring accurate one-time processing. Specifically, after the server generates a state snapshot, it generates an identifier for the state snapshot; when the identifier is not in the global snapshot, the server stores the data recorded by the state snapshot corresponding to the identifier on a disk, and periodically cleans up the identifiers in the global snapshot.
Considering that garbage collection affects the computation delay of data stream processing, when the number of CPU cores of the server is greater than three, a thread pool can be used to carry out data stream processing and garbage collection separately, where the number of threads in the thread pool equals the number of CPU cores. Specifically, the server uses at least one fourth thread as a garbage collector to collect processed data in the memory grid during the data stream processing. By dedicating a few fourth threads to garbage collection while the remaining threads are used for data stream processing, garbage collection can run as a background service concurrently with data stream processing without interfering with the scheduling of threads on the CPU cores, ensuring that each thread stays on its CPU core for a longer time, improving throughput, and reducing the computation delay of data stream processing.
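Dedicating a thread to reclaiming processed entries from the memory grid, while the remaining threads keep processing the stream, could be sketched as follows; the eviction predicate, period and names (GridCleanup) are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A dedicated thread evicts already-processed entries from the memory grid while
// the other threads of the pool keep running the stream-processing tasks.
final class GridCleanup {
    private final Map<String, Object> memoryGrid = new ConcurrentHashMap<>();
    private final ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();

    void start(long periodMillis) {
        cleaner.scheduleAtFixedRate(this::evictProcessed, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    private void evictProcessed() {
        // remove entries that the pipeline has marked as fully processed
        memoryGrid.entrySet().removeIf(e -> isProcessed(e.getValue()));
    }

    private boolean isProcessed(Object value) {
        return Boolean.TRUE.equals(value); // placeholder predicate
    }
}
```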
The memory grid-based distributed data stream processing method provided by the embodiment of the application achieves a stream processing mode with high throughput and low latency. On the one hand, scheduling and execution are handled at the programming-language level on top of JVM threads, using small computing units similar to cooperative threads and green threads that can be suspended and resumed, which avoids expensive operating system context switching, allows execution threads to stay on the same kernel for a longer time, and improves CPU utilization. On the other hand, a dedicated thread is started for garbage collection, which minimizes the interference of garbage collection with data stream processing and improves processing efficiency. In yet another aspect, a complete data stream processing procedure is deployed on the kernels of each server of the server cluster, thereby minimizing the network connections needed for data transfer and reducing data transfer delays. In addition, data and states are stored using a distributed memory grid, thereby providing scalable and highly fault-tolerant state management.
Through experimental measurement, a single server in the embodiment of the application can aggregate tens of millions of data per second, and can simultaneously support hundreds of concurrent tasks to run on the same JVM thread, so that the percentile delay of data stream processing is reduced to less than 10 milliseconds.
The data flow distributed processing method based on the memory grid provided by the embodiment of the application can be applied to various business scenes such as multimedia resource recommendation, finance and the like, such as advertising dynamic bidding, abnormal transaction detection, object behavior real-time analysis, maintenance materialized view and the like, and can effectively accelerate the processing of big data.
Based on the same technical conception, the embodiment of the application provides a schematic structural diagram of a distributed data stream processing apparatus based on a memory grid. The apparatus is provided on a device with a plurality of cores and applied to a distributed system; each core bears a thread, a plurality of operators for performing data stream processing on a target task are deployed on each core, and different operators are used for executing different data stream processing operations.
Referring to fig. 9, the processing apparatus includes an input module 901, a data processing module 902, and an output module 903;
the input module 901 is configured to invoke an input operator by using a first thread, read a data stream of the target task, and store the read data stream in a memory grid of the server;
the data processing module 902 is configured to use a second thread, call different types of intermediate operators in sequence according to preset data stream processing logic of the target task to process a data stream stored in the memory grid, and use an intermediate processing result of processing the data stream in the memory grid by using the last called intermediate operator as a processing result of the second thread; wherein each processing result is obtained based on an intermediate processing result of the device processing the data stream of the target task and an intermediate processing result of the other devices processing the data stream of the target task;
when one intermediate operator is called each time, storing an intermediate processing result obtained after the intermediate operator processes the data flow in the memory grid into the memory grid, and transmitting the intermediate processing result to the next intermediate operator for processing;
The output module 903 is configured to invoke an output operator by using a third thread, and obtain a target processing result of the target task based on a processing result corresponding to each second thread.
Optionally, the different types of intermediate operators include a send operator and a receive operator for exchanging intermediate processing results with other servers;
the data processing module 902 is specifically configured to: when the intermediate operator invoked by the second thread is the sending operator, obtaining an intermediate processing result of part of the intermediate operators from a shared queue of a local memory, and sending the obtained intermediate processing result to the shared queue of the memory of other devices;
and when the intermediate operator called by the second thread is the receiving operator, receiving an intermediate processing result of part of the intermediate operators from a shared queue of other devices' memory, and storing the received intermediate processing result into the shared queue of the local memory.
Optionally, the data processing module 902 is specifically configured to:
when the called intermediate operator is a non-receiving operator, the second thread is adopted to process part of intermediate processing results received from other servers and stored in a shared queue of the local memory, and the processed intermediate processing results are removed from the shared queue of the local memory;
And when the called intermediate operator is the receiving operator, adopting the second thread to receive part of intermediate processing results from the shared queue of the other device memory again so as to fill the shared queue of the local memory again until the data stream processing of the target task is completed.
Optionally, the data processing module 902 is specifically configured to:
when the called intermediate operator is a non-sending operator, the second thread is adopted to process the data stream stored in the memory grid, and the intermediate processing result of the processed part of the intermediate operators is stored in a shared queue of the local memory;
and when the called intermediate operator is the sending operator, calculating a partition ID corresponding to the corresponding intermediate processing result by adopting the second thread based on keys of intermediate processing results of part of the intermediate operators, and sending part of the intermediate processing results to shared queues of other device memories corresponding to the corresponding partition ID.
Optionally, when the shared queue of the server in the local memory is full, the data processing module 902 is specifically configured to: storing intermediate processing results corresponding to part of the intermediate operators into a waiting queue of a local memory;
the input module 901 is specifically configured to: control the amount of data that the input operator reads into the memory grid each time.
Optionally, the data processing module 902 is specifically configured to:
after receiving the intermediate processing result which can be processed by the current window from the shared queue in the memory of the other device, sending a confirmation message to the other device, wherein the confirmation message is used for indicating the other device to send the data of the next window and controlling the amount of data that the input operator reads into the memory grid each time;
receiving an intermediate processing result which can be processed by a next window from a shared queue of the memory of other devices; wherein the size of the data volume processed by each window is dynamically adjusted according to the processing rate.
Optionally, the apparatus further includes a storage module 904, for each of the data stream, the intermediate processing result, and the processing result, the storage module being specifically configured to:
calculating a partition ID according to the key of the data;
according to the partition ID, the data are stored on the corresponding partition of the memory grid; each partition on the memory grid corresponds to a primary copy locally, and corresponds to at least one backup copy on other devices, and the primary copy and the backup copy are used for storing state snapshots of the corresponding partition.
Alternatively, when the device fails, the partitions on the device may be reassigned to other devices, and the backup copies of the partitions on the other devices may be upgraded to primary copies, and the backup copies on the other devices may be updated.
Optionally, when a device is newly added in the distributed system, partitions on other devices are allocated to the newly added device, and a primary copy and a backup copy are added to the newly allocated partition on a memory grid of the newly added device, where the partitions on the memory grid of each device after allocation are balanced.
Optionally, the data processing module 902 is specifically configured to:
generating a state snapshot based on data of various operators used for processing the data flow of the target task on each partition of the memory grid according to a preset snapshot period, and storing the data on each partition onto a disk, wherein the state snapshot is used for recording the processing result of the data flow stored in each partition on the memory grid at the current moment;
and stopping calculation after the calculation task fails, recovering the latest state snapshot obtained before the failure into the memory grid, calling the input operator by adopting the first thread, and reading the data stream after the latest state snapshot into the memory grid.
Optionally, when the input operator identifies a data offset, the data offset is recorded in the state snapshot;
the input module 901 is specifically configured to: and calling the input operator by adopting the first thread, and reading the data stream after the offset into the memory grid without reading the data stream before the offset.
Optionally, when data of multiple operators used for processing the data stream of the target task on each partition of the memory grid is stored to a disk, and a corresponding state snapshot is generated, the input module 901 is specifically configured to:
a response message is sent to the first thread to indicate that the state snapshot obtained based on the data stream read by the input operator can be safely deleted.
Optionally, the response message carries an identifier of the status snapshot, and after the response message is sent, the input module 901 is specifically configured to:
when the identifier carried by the response message is already recorded in the global state snapshot, the input operator is not called to load the data stream recorded by the state snapshot corresponding to the identifier, wherein the global state snapshot is used for recording the identifiers of all the state snapshots.
Optionally, the data processing module 902 is specifically configured to:
stopping sending the processed data to a downstream application when generating a state snapshot based on the data of the output operator on each partition of the memory grid;
and after the state snapshot is generated, continuously sending the processed data to the downstream application.
Optionally, after generating the status snapshot, the data processing module 902 is specifically configured to:
generating an identification for the state snapshot;
when the identifier is not in the global snapshot, storing the data of the state snapshot record corresponding to the identifier to a disk; wherein the identities in the global snapshot are cleaned up periodically.
Optionally, the data flow processing logic of the target task and the data flow relationships among the multiple operators are predefined in a pipeline API facing the developer, and the processing operations of the multiple operators are fine-tuned in a core API facing the distributed flow processing engine.
Optionally, when the number of cores is greater than three, the apparatus further includes a garbage collection module 905 configured to:
and taking the fourth thread as a garbage recoverer to recover the processed data in the memory grid in the data stream processing process.
In the memory grid-based distributed data stream processing apparatus provided by the embodiment of the application, when the data stream of the target task is processed, the input operator and the output operator among the multiple operators used depend on third-party APIs, cannot be invoked cooperatively, and may therefore block; accordingly, one thread is started on each of multiple cores of every server in the distributed system, so that a dedicated thread can be allocated separately to the input operator and the output operator, ensuring the fluency of data stream processing and improving the processing speed of the data stream. Meanwhile, each thread opened on the other cores can be shared by intermediate operators of different types, so that the shared threads can stay on the same core for a longer time, operating system context switching is reduced, core utilization is improved, and the latency of distributed data stream processing is further reduced. On the other hand, multiple operators for processing the data stream of the target task are deployed on each kernel, and different operators execute different data stream processing operations: the input operator reads the data stream of the target task into the memory grid of the server, the different types of intermediate operators process the data stream stored in the memory grid, and the output operator outputs the target processing result of completing the data stream processing for the target service; the intermediate processing result of each intermediate operator is stored in the memory grid and delivered to the next intermediate operator for subsequent processing, and intermediate processing results can be exchanged between different servers through the intermediate operators, so that the fusion of distributed data stream processing results is realized. Because the multiple operators deployed on the kernel where each thread is located and the data stream processing logic of the target task are preset, each thread can orderly call the corresponding operators to process the data stream according to the data stream processing logic without caring about the acquisition order of the data stream, which realizes an order-independent data stream processing process and further improves data stream processing efficiency.
The embodiment of the application also provides electronic equipment based on the same conception as the embodiment of the method. In one embodiment, the electronic device may be a server in a distributed system. In this embodiment, the electronic device may be configured as shown in fig. 10, including a memory 1001, a communication module 1003, and one or more processors 1002.
Memory 1001 for storing computer programs for execution by processor 1002. The memory 1001 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an operating instruction set, and the like.
The memory 1001 may be a volatile memory (RAM) such as a random-access memory (RAM); the memory 1001 may also be a nonvolatile memory (non-volatile memory), such as a read-only memory (rom), a flash memory (flash memory), a hard disk (HDD) or a Solid State Drive (SSD); or memory 1001 is any other medium that can be used to carry or store a desired computer program in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. Memory 1001 may be a combination of the above.
The processor 1002 may include one or more central processing units (central processing unit, CPU) or digital processing units, or the like. Processor 1002 is configured to implement the above-described memory grid-based distributed data stream processing method when calling a computer program stored in memory 1001.
The communication module 1003 is used for communicating with a terminal device and other servers.
The specific connection medium between the memory 1001, the communication module 1003, and the processor 1002 is not limited in the embodiment of the present application. In fig. 10, the memory 1001 and the processor 1002 are connected by a bus 1004, which is drawn with a thick line in fig. 10; the connections between the other components are merely illustrative and not limiting. The bus 1004 may be divided into an address bus, a data bus, a control bus, and the like. For ease of description, only one thick line is depicted in fig. 10, but this does not mean that there is only one bus or only one type of bus.
The memory 1001 stores a computer storage medium, and the computer storage medium stores computer executable instructions for implementing the memory grid-based distributed data stream processing method according to the embodiment of the present application. The processor 1002 is configured to perform the steps of the memory grid-based distributed data stream processing method described above.
In some possible embodiments, aspects of the distributed data stream processing method provided by the present application may also be implemented in the form of a program product, which comprises a computer program for causing an electronic device to perform the steps of the memory grid-based distributed data stream processing method according to the various exemplary embodiments of the application described herein above when the program product is run on the electronic device; for example, the electronic device may perform the steps shown in fig. 4.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical fiber, a portable compact disk read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read-only memory and comprise a computer program and may run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer programs for performing the operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer program may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a local area network or a wide area network, or may be connected to an external computing device.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having a computer-usable computer program embodied therein.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (20)

1. The data stream processing method based on the memory grid is characterized by being applied to a distributed system, wherein the distributed system comprises at least two servers with a plurality of cores, each core bears a thread, a plurality of operators for performing data stream processing on a target task are deployed on each core, and different operators are used for executing different data stream processing operations; the method comprises the following steps:
the server adopts a first thread to call an input operator, reads the data stream of the target task, and stores the read data stream into a memory grid of the server;
The server adopts a second thread, sequentially calls intermediate operators of different types to process the data flow stored in the memory grid according to the preset data flow processing logic of the target task, and uses the intermediate processing result of the last called intermediate operator to process the data flow in the memory grid as the processing result of the second thread; wherein each processing result is obtained based on an intermediate processing result of the server processing the data stream of the target task and an intermediate processing result of the other servers processing the data stream of the target task;
when one intermediate operator is called each time, storing an intermediate processing result obtained after the intermediate operator processes the data flow in the memory grid into the memory grid, and transmitting the intermediate processing result to the next intermediate operator for processing;
and the server adopts a third thread to call an output operator, and obtains a target processing result of the target task based on the processing result corresponding to each second thread.
2. The method of claim 1, wherein the different types of intermediate operators include a send operator and a receive operator that exchange intermediate processing results with other servers;
When the intermediate operator called by the second thread is the sending operator, the server acquires an intermediate processing result of part of the intermediate operator from a shared queue of a local memory, and sends the acquired intermediate processing result to the shared queue of other server memories;
when the intermediate operator invoked by the second thread is the receiving operator, the server receives intermediate processing results of part of the intermediate operators from the shared queue of the other server memory, and stores the received intermediate processing results in the shared queue of the local memory.
3. The method of claim 2, wherein the server using each second thread to sequentially invoke intermediate operators of different types to process the data stream stored in the memory grid according to the preset data stream processing logic of the target task comprises:
when the invoked intermediate operator is not the receive operator, the server uses the second thread to process the intermediate processing results that were received from other servers and stored in the shared queue in local memory, and removes the processed intermediate processing results from the shared queue in local memory; and
when the invoked intermediate operator is the receive operator, the server uses the second thread to again receive intermediate processing results from the shared queues in the memory of other servers so as to refill the shared queue in local memory, until the data stream processing of the target task is completed.
4. The method of claim 2, wherein the server using each second thread to sequentially invoke intermediate operators of different types to process the data stream stored in the memory grid according to the preset data stream processing logic of the target task comprises:
when the invoked intermediate operator is not the send operator, the server uses the second thread to process the data stream stored in the memory grid, and stores the intermediate processing results of the processed intermediate operators into the shared queue in local memory; and
when the invoked intermediate operator is the send operator, the server uses the second thread to calculate, based on the key of an intermediate processing result of an intermediate operator, the partition ID corresponding to that intermediate processing result, and sends that intermediate processing result to the shared queue in the memory of the other server corresponding to that partition ID.
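
For illustration only (not part of the claims or the original application): one common way to derive a partition ID from a key, as the send operator in claim 4 would need to do before routing an intermediate result; the fixed partition count and the hash-modulo rule are assumptions.

```java
// Illustrative sketch only: hash-modulo routing is an assumed scheme, not taken from the application.
public class KeyPartitioner {
    static final int PARTITION_COUNT = 271;   // assumed, fixed partition count

    // Partition ID derived from the key's hash, kept non-negative.
    static int partitionId(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // The send operator would use the partition ID to pick the owning server's shared queue.
    static String ownerServer(int partitionId, String[] servers) {
        return servers[partitionId % servers.length];
    }

    public static void main(String[] args) {
        String[] servers = {"server-A", "server-B", "server-C"};
        int pid = partitionId("user-42");
        System.out.println("key user-42 -> partition " + pid + " on " + ownerServer(pid, servers));
    }
}
```
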
5. The method of claim 2, wherein, when the shared queue in the local memory of the server is full, processing the data stream stored in the memory grid comprises:
the server stores the intermediate processing results corresponding to some of the intermediate operators into a waiting queue in local memory; and
the server controls the amount of data that the input operator reads into the memory grid each time.
6. The method of claim 2, wherein the server receiving intermediate processing results of some of the intermediate operators from the shared queues in the memory of other servers comprises:
after receiving, from the shared queues in the memory of the other servers, the intermediate processing results that can be processed in the current window, the server sends an acknowledgement message to the other servers, wherein the acknowledgement message is used to instruct the other servers to send the data of the next window and to control the amount of data that the input operator reads into the memory grid each time; and
the server receives, from the shared queues in the memory of the other servers, the intermediate processing results that can be processed in the next window; wherein the amount of data processed in each window is dynamically adjusted according to the processing rate.
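
For illustration only (not part of the claims or the original application): a single-process simulation of the window-and-acknowledgement exchange described in claims 5 and 6, where the sender emits one window and waits for an acknowledgement before emitting the next; the fixed window size, item count, and message format are assumptions (a real system would adjust the window from the measured processing rate).

```java
import java.util.concurrent.*;

// Illustrative sketch only: window-based flow control simulated in one process.
public class WindowedFlowControl {
    static final int TOTAL_ITEMS = 20;
    static final int WINDOW_SIZE = 4;   // a real system would adjust this from the processing rate

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> dataQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> ackQueue = new LinkedBlockingQueue<>();

        // Sender: emits one window of data, then blocks until the receiver acknowledges it.
        Thread sender = new Thread(() -> {
            try {
                for (int sent = 0; sent < TOTAL_ITEMS; ) {
                    for (int i = 0; i < WINDOW_SIZE && sent < TOTAL_ITEMS; i++, sent++) {
                        dataQueue.put(sent);
                    }
                    ackQueue.take();   // wait for "send the next window"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Receiver: processes a window's worth of items, then acknowledges the window.
        Thread receiver = new Thread(() -> {
            try {
                for (int processed = 0; processed < TOTAL_ITEMS; ) {
                    dataQueue.take();
                    processed++;
                    if (processed % WINDOW_SIZE == 0 || processed == TOTAL_ITEMS) {
                        ackQueue.put("ack");   // tells the sender the current window is done
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        sender.start();
        receiver.start();
        sender.join();
        receiver.join();
        System.out.println("all windows acknowledged");
    }
}
```
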
7. The method of claim 1, wherein storing each of the data stream, the intermediate processing results, and the processing results into the memory grid comprises:
the server calculates a partition ID according to the key of the data; and
the server stores the data into the corresponding partition of the memory grid according to the partition ID; wherein each partition on the memory grid corresponds to a primary copy local to the server and at least one backup copy on other servers, and the primary copy and the backup copies are used for storing state snapshots of the corresponding partition.
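
For illustration only (not part of the claims or the original application): a toy store in which each write lands on a partition's primary copy and is replicated to a backup copy on another simulated server, in the spirit of claim 7; the two-server layout and the hash-modulo placement are assumptions.

```java
import java.util.*;

// Illustrative sketch only: per-"server" maps stand in for memory grids; names are assumptions.
public class ReplicatedPartitionStore {
    static final int PARTITION_COUNT = 4;
    // One map per server, standing in for that server's memory grid.
    static final Map<String, Map<Integer, Map<String, String>>> grids = new HashMap<>();
    static final String[] servers = {"server-A", "server-B"};

    static void put(String key, String value) {
        int partitionId = Math.floorMod(key.hashCode(), PARTITION_COUNT);
        String primary = servers[partitionId % servers.length];
        String backup = servers[(partitionId + 1) % servers.length];
        // Write to the primary copy, then replicate to the backup copy on another server.
        grids.get(primary).computeIfAbsent(partitionId, p -> new HashMap<>()).put(key, value);
        grids.get(backup).computeIfAbsent(partitionId, p -> new HashMap<>()).put(key, value);
    }

    public static void main(String[] args) {
        for (String s : servers) grids.put(s, new HashMap<>());
        put("user-42", "clicked");
        put("user-7", "purchased");
        grids.forEach((server, partitions) ->
                System.out.println(server + " holds partitions " + partitions.keySet()));
    }
}
```
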
8. The method of claim 7, wherein, when the server fails, the partitions of the server are reassigned to other servers, the backup copies of those partitions on the other servers are promoted to primary copies, and the backup copies on the other servers are updated.
9. The method of claim 7, wherein, when a server is newly added to the distributed system, partitions on other servers are allocated to the newly added server, and a primary copy and a backup copy are created for each newly allocated partition on the memory grid of the newly added server, wherein the partitions on the memory grids of the servers are balanced after the allocation.
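
For illustration only (not part of the claims or the original application): a toy reassignment routine sketching how partition ownership could be recomputed when a server fails (claim 8) or joins (claim 9); the round-robin placement rule and the server names are assumptions.

```java
import java.util.*;

// Illustrative sketch only: ownership is recomputed from scratch; a real system would migrate data.
public class PartitionRebalancer {
    static final int PARTITION_COUNT = 8;

    // Round-robin assignment: primary on servers[p % n], backup on the next server.
    static Map<Integer, String[]> assign(List<String> servers) {
        Map<Integer, String[]> table = new TreeMap<>();
        for (int p = 0; p < PARTITION_COUNT; p++) {
            String primary = servers.get(p % servers.size());
            String backup = servers.get((p + 1) % servers.size());
            table.put(p, new String[] {primary, backup});
        }
        return table;
    }

    static String describe(Map<Integer, String[]> table) {
        StringBuilder sb = new StringBuilder();
        table.forEach((p, copies) -> sb.append(p).append("->").append(Arrays.toString(copies)).append(" "));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>(List.of("server-A", "server-B", "server-C"));
        System.out.println("initial:       " + describe(assign(servers)));

        servers.remove("server-B");        // server failure: its partitions move to the survivors
        System.out.println("after failure: " + describe(assign(servers)));

        servers.add("server-D");           // new server: partitions are rebalanced onto it
        System.out.println("after join:    " + describe(assign(servers)));
    }
}
```
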
10. The method of claim 7, wherein, after the server stores the data into the corresponding partition on the memory grid based on the partition ID corresponding to the key of the data, the method further comprises:
the server generates, according to a preset snapshot period, a state snapshot based on the data, on each partition of the memory grid, of the operators used for processing the data stream of the target task, and stores the data on each partition to disk, wherein the state snapshot is used to record the processing results of the data stream stored in each partition of the memory grid at the current moment; and
after the computation task fails, the server stops the computation, restores to the memory grid the latest state snapshot obtained before the failure, uses the first thread to invoke the input operator, and reads into the memory grid the data stream following the latest state snapshot.
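
For illustration only (not part of the claims or the original application): a minimal snapshot-and-restore cycle in the spirit of claim 10, with snapshots kept in an in-memory list rather than on disk and the failure simulated; the snapshot contents and all names are assumptions.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch only: snapshots are in-memory copies; a real system would persist them to disk.
public class SnapshotAndRestore {
    static final ConcurrentMap<String, String> memoryGrid = new ConcurrentHashMap<>();
    static final List<Map<String, String>> snapshots = new ArrayList<>();

    // Periodic snapshot: copy the current partition state (here, the whole grid).
    static void takeSnapshot() {
        snapshots.add(new HashMap<>(memoryGrid));
    }

    // Restore: clear the grid and load the latest snapshot taken before the failure.
    static void restoreLatest() {
        memoryGrid.clear();
        memoryGrid.putAll(snapshots.get(snapshots.size() - 1));
    }

    public static void main(String[] args) {
        memoryGrid.put("offset", "100");
        memoryGrid.put("count", "42");
        takeSnapshot();                               // state snapshot covering the stream up to offset 100

        memoryGrid.put("count", "corrupted");         // work after the snapshot is lost on failure
        restoreLatest();                              // roll back to the last snapshot

        // The input operator would now resume reading the stream after offset 100.
        System.out.println("restored state: " + memoryGrid);
    }
}
```
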
11. The method of claim 10, wherein, when the input operator identifies a data offset, the data offset is recorded in the state snapshot; and
using the first thread to invoke the input operator and read into the memory grid the data stream following the latest state snapshot comprises:
the server invokes the input operator with the first thread, skips the data stream before the offset, and reads the data stream after the offset into the memory grid.
12. The method of claim 10, wherein, after the data of the operators used to process the data stream of the target task on each partition of the memory grid is stored to disk and the corresponding state snapshot is generated, the method further comprises:
the second thread sends a response message to the first thread to indicate that the state snapshot obtained based on the data stream read by the input operator can be safely deleted.
13. The method of claim 12, wherein the response message carries an identifier of a state snapshot, and wherein, after the response message is sent, the method further comprises:
when the identifier carried by the response message is already recorded in the global state snapshot, the server does not invoke the input operator to load the data stream recorded by the state snapshot corresponding to the identifier, wherein the global state snapshot is used to record the identifiers of all state snapshots.
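
For illustration only (not part of the claims or the original application): the identifier check of claims 13 and 15 reduces to membership in a global set of snapshot identifiers; the set-based model and method names below are assumptions.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: the "global state snapshot" is modelled as a shared set of identifiers.
public class SnapshotIdRegistry {
    // Identifiers of all state snapshots already recorded cluster-wide.
    static final Set<String> globalSnapshotIds = ConcurrentHashMap.newKeySet();

    // Returns true if the snapshot is new and its data stream should be loaded;
    // returns false if the id is already recorded, so the input operator skips reloading it.
    static boolean shouldLoad(String snapshotId) {
        return globalSnapshotIds.add(snapshotId);   // add() is false when the id was already present
    }

    public static void main(String[] args) {
        System.out.println(shouldLoad("snapshot-17"));  // true: first time seen, load it
        System.out.println(shouldLoad("snapshot-17"));  // false: already recorded, do not reload
    }
}
```
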
14. The method of claim 10, wherein, when the server generates a state snapshot, the method further comprises:
when a state snapshot is generated based on the data of the output operator on each partition of the memory grid, the server stops sending processed data to downstream applications; and
after the state snapshot is generated, the server resumes sending the processed data to the downstream applications.
15. The method of claim 10, wherein, after the server generates the state snapshot, the method further comprises:
the server generates an identifier for the state snapshot; and
when the identifier is not in the global snapshot, the server stores to disk the data recorded by the state snapshot corresponding to the identifier; wherein the identifiers in the global snapshot are cleaned up periodically.
16. The method of any one of claims 1-15, wherein the data stream processing logic of the target task and the data flow relationships among the plurality of operators are predefined in a developer-oriented pipeline API, and the processing operations of the plurality of operators are fine-tuned in a core API of the distributed stream processing engine.
17. The method of any one of claims 1-15, wherein, when the number of cores of the server is greater than three, the method further comprises:
the server uses a fourth thread as a garbage collector to reclaim the processed data in the memory grid during data stream processing.
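
For illustration only (not part of the claims or the original application): a sketch of a fourth thread acting as a garbage collector that evicts already-processed entries from the memory grid, as in claim 17; the processed-key marking convention and all names are assumptions.

```java
import java.util.Set;
import java.util.concurrent.*;

// Illustrative sketch only: processed entries are tracked in a key set and evicted by a dedicated thread.
public class GridGarbageCollector {
    static final ConcurrentMap<String, String> memoryGrid = new ConcurrentHashMap<>();
    static final Set<String> processedKeys = ConcurrentHashMap.newKeySet();

    public static void main(String[] args) throws InterruptedException {
        memoryGrid.put("event-1", "done");
        memoryGrid.put("event-2", "in-flight");
        processedKeys.add("event-1");

        // Fourth thread: reclaims memory occupied by data that downstream operators have finished with.
        Thread collector = new Thread(() -> {
            for (String key : processedKeys) {
                memoryGrid.remove(key);
            }
            processedKeys.clear();
        });
        collector.start();
        collector.join();
        System.out.println("grid after collection: " + memoryGrid);
    }
}
```
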
18. A memory-grid-based stream processing device, characterized in that the device is applied to a distributed system and has a plurality of cores, each core carries one thread, a plurality of operators for performing data stream processing on a target task are deployed on each core, and different operators are used for executing different data stream processing operations; the device comprises:
an input module, configured to use a first thread to invoke an input operator, read the data stream of the target task, and store the read data stream into a memory grid of the device;
a data processing module, configured to use each second thread to sequentially invoke intermediate operators of different types to process the data stream stored in the memory grid according to the preset data stream processing logic of the target task, and take the intermediate processing result obtained by the last invoked intermediate operator processing the data stream in the memory grid as the processing result of the second thread; wherein each processing result is obtained based on an intermediate processing result of the device processing the data stream of the target task and intermediate processing results of other devices processing the data stream of the target task;
wherein, each time an intermediate operator is invoked, the intermediate processing result obtained after the intermediate operator processes the data stream in the memory grid is stored into the memory grid, and the intermediate processing result is passed to the next intermediate operator for processing; and
an output module, configured to use a third thread to invoke an output operator, and obtain a target processing result of the target task based on the processing results corresponding to the second threads.
19. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1-17.
20. A computer-readable storage medium, characterized in that it comprises a computer program which, when run on an electronic device, causes the electronic device to perform the steps of the method of any one of claims 1-17.
CN202310864300.3A 2023-07-14 2023-07-14 Distributed data stream processing method, device and equipment based on memory grid Pending CN116954944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310864300.3A CN116954944A (en) 2023-07-14 2023-07-14 Distributed data stream processing method, device and equipment based on memory grid

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310864300.3A CN116954944A (en) 2023-07-14 2023-07-14 Distributed data stream processing method, device and equipment based on memory grid

Publications (1)

Publication Number Publication Date
CN116954944A true CN116954944A (en) 2023-10-27

Family

ID=88447123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310864300.3A Pending CN116954944A (en) 2023-07-14 2023-07-14 Distributed data stream processing method, device and equipment based on memory grid

Country Status (1)

Country Link
CN (1) CN116954944A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472517A (en) * 2023-12-28 2024-01-30 广州睿帆科技有限公司 Method for distributed processing of FTP files based on Flink
CN117472517B (en) * 2023-12-28 2024-03-08 广州睿帆科技有限公司 Method for distributed processing of FTP files based on Flink

Similar Documents

Publication Publication Date Title
CN107479990B (en) Distributed software service system
US9792060B2 (en) Optimized write performance at block-based storage during volume snapshot operations
US10373247B2 (en) Lifecycle transitions in log-coordinated data stores
Dahiphale et al. An advanced mapreduce: cloud mapreduce, enhancements and applications
EP3195117B1 (en) Automated configuration of log-coordinated storage groups
CN110019514B (en) Data synchronization method and device and electronic equipment
CN109117252B (en) Method and system for task processing based on container and container cluster management system
CN103140842A (en) System and method for providing flexible storage and retrieval of snapshot archives
US11604682B2 (en) Pre-emptive container load-balancing, auto-scaling and placement
Chen et al. Scalable service-oriented replication with flexible consistency guarantee in the cloud
CN112463290A (en) Method, system, apparatus and storage medium for dynamically adjusting the number of computing containers
CN116954944A (en) Distributed data stream processing method, device and equipment based on memory grid
CN115878301A (en) Acceleration framework, acceleration method and equipment for database network load performance
Shrivastava et al. Supporting transaction predictability in replicated DRTDBS
Dreher et al. Manala: a flexible flow control library for asynchronous task communication
Afonso Mechanisms for providing causal consistency on edge computing
Gankevich et al. Novel approaches for distributing workload on commodity computer systems
Gu et al. Arana: A cross-domain workflow scheduling system
Kougkas et al. Bridging Storage Semantics Using Data Labels and Asynchronous I/O
Holenko et al. The impact of service semantics on the consistent recovery in SOA
Andler et al. DeeDS NG: Architecture, design, and sample application scenario
US20240134698A1 (en) Serverless computing using resource multiplexing
Ayyalasomayajula et al. Experiences running mixed workloads on cray analytics platforms
Luu et al. Spark Streaming
Chen et al. Design and Implementation of Dynamic Expansion of Vehicle Real-Time Data Transmission Architecture Based on Kubernetes

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication