CN106708865B - Method and device for accessing window data in stream processing system - Google Patents

Info

Publication number
CN106708865B
Authority
CN
China
Prior art keywords
data structure
window
memory partition
data
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510783099.1A
Other languages
Chinese (zh)
Other versions
CN106708865A (en)
Inventor
单卫华
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201510783099.1A priority Critical patent/CN106708865B/en
Publication of CN106708865A publication Critical patent/CN106708865A/en
Application granted granted Critical
Publication of CN106708865B publication Critical patent/CN106708865B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for accessing window data in a stream processing system, wherein the method comprises the following steps: receiving an access request of window data sent by a client, wherein the access request carries window indication information which indicates a distributed sliding window for storing the window data; determining a distributed data structure of the window data in the distributed sliding window according to the window indication information, wherein the distributed data structure comprises a plurality of data structure fragments; determining host information for storing each data structure fragment according to the feature identifier of each data structure fragment in the plurality of data structure fragments; and accessing each data structure fragment according to the host information for storing the each data structure fragment. The method and the device for accessing the window data in the stream processing system can realize the distributed storage of the window data, break through the bottleneck problem of limited single-machine memory capacity, and simultaneously improve the reliability of the window data.

Description

Method and device for accessing window data in stream processing system
Technical Field
The present invention relates to the field of information technology, and more particularly, to a method and apparatus for accessing window data in a stream processing system.
Background
The sliding window is a basic concept in the field of stream processing: it is a container that caches the historical data of a data stream over a certain time length. In the prior art, the sliding window is implemented in the memory of a single host; a sliding window implemented on a single machine is referred to as a single-machine sliding window for short. For a single-machine sliding window, the window data is stored in the task context, so a task fault, an execution unit (executor) fault, a process fault, or a host fault causes the window data to be lost without any possibility of recovery; that is, the reliability of the window data cannot be guaranteed. In addition, the total capacity of a single-machine sliding window is limited by the single-machine memory capacity, so for massive data processing, especially for scenarios that depend on massive historical data for calculation, a single-machine sliding window cannot meet the requirements. A method for accessing window data in a stream processing system that can solve these problems is therefore needed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for accessing window data in a stream processing system, which can realize distributed storage of the window data, thereby breaking through the bottleneck problem of limited single-machine memory capacity.
In a first aspect, a method for accessing window data in a stream processing system is provided, the method comprising: receiving an access request of window data sent by a client, wherein the access request carries window indication information, and the window indication information indicates a distributed sliding window for storing the window data; determining a distributed data structure of the window data in the distributed sliding window according to the window indication information, wherein the distributed data structure comprises a plurality of data structure fragments, and the plurality of data structure fragments are located on at least two hosts; acquiring first memory partition information for storing each data structure fragment according to the feature identifier of each data structure fragment in the plurality of data structure fragments; determining host information for storing each data structure fragment according to the first memory partition information; and accessing each data structure fragment according to the host information for storing each data structure fragment.
With reference to the first aspect, in a first implementation manner of the first aspect, the distributed data structure includes the plurality of data structure fragments and copies of the plurality of data structure fragments, and each data structure fragment of the plurality of data structure fragments and the copy of each data structure fragment are located on different hosts, and the method further includes: acquiring second memory partition information for storing a copy of each data structure fragment according to the feature identifier of each data structure fragment; determining host information for storing the copy of each data structure fragment according to the second memory partition information; and accessing each data structure fragment according to the host information for storing the copy of each data structure fragment.
With reference to the first aspect and the foregoing implementation manner of the first aspect, in a second implementation manner of the first aspect, the window indication information is a window name of the distributed sliding window, and the determining, according to the window indication information, a distributed data structure of the window data in the distributed sliding window includes: determining, according to the window name, the distributed data structure of the window data of the distributed sliding window.
With reference to the first aspect and the foregoing implementation manner of the first aspect, in a third implementation manner of the first aspect, the window indication information is a child window name of a child window in the distributed sliding window, and the determining, according to the window indication information, a distributed data structure of the window data in the distributed sliding window includes: determining, according to the child window name, the distributed data structure of the window data of the child window in the distributed sliding window.
With reference to the first aspect and the foregoing implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the first memory partition information is a first memory partition identifier, and the determining, according to the first memory partition information, host information for storing each data structure fragment includes: acquiring the host information for storing each data structure fragment according to the first memory partition identifier and a memory partition table, wherein the memory partition table represents the correspondence among the memory partition identifier, the copy number of each data structure fragment, and the host number of the memory partition corresponding to the memory partition identifier.
With reference to the first aspect and the foregoing implementation manner, in a fifth implementation manner of the first aspect, the second memory partition information is a second memory partition identifier, and the determining, according to the second memory partition information, host information for storing a copy of each data structure fragment includes: acquiring the host information for storing the copy of each data structure fragment according to the second memory partition identifier and a memory partition table, wherein the memory partition table represents the correspondence among the memory partition identifier, the copy number of each data structure fragment, and the host number of the memory partition corresponding to the memory partition identifier.
With reference to the first aspect and the foregoing implementation manner, in a sixth implementation manner of the first aspect, the first memory partition information is a first memory partition identifier, and the acquiring, according to the feature identifier of each data structure fragment in the plurality of data structure fragments, first memory partition information for storing each data structure fragment includes:
converting the feature identifier of each data structure fragment into binary data;
calculating the binary data by using a hash algorithm to obtain a hash result;
and determining the result of taking the hash result modulo a preset value as the first memory partition identifier of each data structure fragment.
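The three steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not name the hash algorithm or the preset value, so SHA-256 and the value 271 are assumptions, and the name `partition_id` is hypothetical.

```python
import hashlib

# Hypothetical preset value (number of memory partitions); not given in the source.
PARTITION_COUNT = 271


def partition_id(feature_id: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a fragment's feature identifier to a memory partition identifier."""
    # Step 1: convert the feature identifier into binary data.
    binary = feature_id.encode("utf-8")
    # Step 2: hash the binary data (SHA-256 is an assumed choice of algorithm).
    digest = hashlib.sha256(binary).digest()
    hash_result = int.from_bytes(digest[:8], "big")
    # Step 3: take the hash result modulo the preset value.
    return hash_result % partition_count
```

Because the mapping depends only on the feature identifier and the preset value, any manager that knows both can compute the same partition identifier without consulting another host.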
With reference to the first aspect and the foregoing implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the data structure of the window data is one of:
a distributed list structure, a distributed dictionary structure, or a distributed set structure.
In a second aspect, there is provided an apparatus for accessing window data in a stream processing system, the apparatus comprising modules for performing the method of the first aspect.
In a third aspect, an apparatus for accessing window data in a stream processing system is provided, the apparatus comprising a memory and a processor connected to the memory, the memory being configured to store instructions, and the processor being configured to execute the instructions stored in the memory, and when the processor executes the instructions stored in the memory, the processor is specifically configured to perform the method in the first aspect.
Based on the technical scheme, the method and the device for accessing the window data in the stream processing system can divide the window data into a plurality of data structure fragments, and store the data structure fragments on a plurality of hosts in a distributed manner, so that the bottleneck problem that the single-machine memory capacity is limited is broken through, and the reliability of the window data is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments of the present invention are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a system framework diagram of a stream processing system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of an execution engine portion of a single host in a stream processing system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a distributed sliding window manager according to an embodiment of the present invention.
FIG. 4 is a schematic flow chart of a method of accessing window data in a stream processing system according to an embodiment of the present invention.
FIG. 5 is a schematic block diagram of an apparatus for accessing window data in a stream processing system according to an embodiment of the present invention.
FIG. 6 is a schematic block diagram of an apparatus for accessing window data in a stream processing system according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solution of the present invention may be applied to a stream processing system, for example, a Storm system, a Stream system of IBM, or an S4 system of Yahoo.
Before the method for accessing window data in a stream processing system according to an embodiment of the present invention is described, the system framework of the stream processing system is described first. FIG. 1 shows a system framework diagram of a stream processing system according to an embodiment of the present invention. The system includes a plurality of hosts. Optionally, a Distributed Sliding Window Manager (DSWM) may be deployed on each host to implement distributed sliding windows, that is, to implement distributed storage of sliding window data on the plurality of hosts, and the DSWMs in the stream processing system may communicate with one another to ensure that data information is shared between different hosts. FIG. 2 shows a schematic diagram of the execution engine part of a single host in the Storm system. A manager (supervisor) listens for the work assigned to its host and can start or stop multiple worker processes as required. As shown in FIG. 2, the manager may manage multiple worker processes, each of which may include multiple tasks that are executed by one or more execution units (executors), and each task independently manages the storage of distributed sliding window data in memory. The distributed sliding window mentioned in the embodiment of the present invention may belong to a task: each task can independently apply for, use, and release a distributed sliding window, and the distributed sliding window can reside in the memory of each host.
Optionally, the system component that implements the distributed sliding window in the embodiment of the present invention may be a DSWM, which is capable of implementing distributed storage of window data. Optionally, one DSWM may be deployed on each host in the stream processing system to implement sharing of window data between the hosts; one DSWM may also be shared by multiple hosts to implement sharing of window data between the hosts; or one overall DSWM may be deployed in the stream processing system to manage all the DSWMs deployed in the stream processing system, so as to implement sharing of window data between the DSWMs on the hosts. The embodiment of the present invention is not limited in this respect.
Optionally, each DSWM stores global information of the stream processing system, such as the memory partition information of each host, Internet Protocol (IP) information, and information about the memory used for storing window data. This makes it possible to cope with the window data loss that would otherwise occur when a single host fails and information cannot be shared between the hosts.
Optionally, a communication link may be maintained between the DSWMs to ensure that information is shared. For example, if data stored on a host managed by a first DSWM changes, the first DSWM may broadcast a notification to the other DSWMs in the stream processing system so that they update the data information.
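The broadcast-based update described above can be illustrated with a small in-process sketch. The class `Dswm`, its methods, and the `global_info` dictionary are hypothetical names chosen for illustration; a real deployment would exchange these notifications over the network rather than through direct method calls.

```python
from typing import Dict, List


class Dswm:
    """Hypothetical stand-in for one distributed sliding window manager."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["Dswm"] = []
        # Global information kept by every manager (for example, partition -> host).
        self.global_info: Dict[str, str] = {}

    def connect(self, other: "Dswm") -> None:
        """Maintain a bidirectional link to another manager."""
        if other is not self and other not in self.peers:
            self.peers.append(other)
            other.peers.append(self)

    def update(self, key: str, value: str) -> None:
        """Apply a local change, then broadcast it to all other managers."""
        self.global_info[key] = value
        for peer in self.peers:
            peer.on_broadcast(key, value)

    def on_broadcast(self, key: str, value: str) -> None:
        """Apply a change announced by another manager."""
        self.global_info[key] = value
```

In this sketch every manager ends up with the same `global_info`, so the loss of one manager does not remove the only record of where window data is stored.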
In this embodiment of the present invention, as shown in fig. 3, the distributed sliding window manager may include a distributed sliding window management module, a distributed data structure implementation module, and a memory partition management module. The distributed sliding window management module can be used for realizing the functions of creating and destroying the distributed sliding window, judging the existence of the window, writing or reading window data and the like; the distributed data structure implementation module can implement a distributed data structure of window data in a distributed sliding window, such as a distributed list structure, a distributed dictionary structure, a distributed set structure and the like, and implement binding of the distributed data structure and the memory partitions, so as to query the data structure fragments of the corresponding distributed data structure according to the memory partition identification, implement balanced storage of massive window data in the stream processing system, and the like; the memory partition management module can be used for realizing the unified management of memory resources in the stream processing system, and can also initialize the memory partitions, including a main partition and a backup partition, numbering the memory partitions, and the like.
Optionally, the DSWM may further include a cluster management module or a network management module. The cluster management module may be configured to manage all hosts in the stream processing system, record host information, and trigger an upper-layer management module to perform the corresponding data update when a host joins or exits the stream processing system. After the network service of the stream processing system is started, the cluster management module may start service management on each host in the stream processing system; for example, it may initialize each DSWM and the data structures in the stream processing system, and update the address information of each host in the stream processing system. The network management module may be used to maintain communication connections between hosts in the stream processing system.
FIG. 4 shows a schematic flow diagram of a method 100 for accessing window data in a stream processing system. The method 100 may be executed by the DSWM and is applicable to the stream processing system of FIG. 1. As shown in FIG. 4, the method 100 comprises:
S110, receiving an access request of window data sent by a client, wherein the access request carries window indication information, and the window indication information indicates a distributed sliding window for storing the window data;
S120, determining a distributed data structure of the window data in the distributed sliding window according to the window indication information, wherein the distributed data structure comprises a plurality of data structure fragments, and the plurality of data structure fragments are located on at least two hosts;
S130, acquiring first memory partition information for storing each data structure fragment according to the feature identifier of each data structure fragment in the plurality of data structure fragments;
S140, determining host information for storing each data structure fragment according to the first memory partition information;
S150, accessing each data structure fragment according to the host information for storing each data structure fragment.
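The flow of steps S110 to S150 can be sketched end to end with in-memory dictionaries standing in for the hosts. Everything named here (`WINDOW_FRAGMENTS`, `PARTITION_HOST`, `HOST_MEMORY`, `locate`, `handle_request`, the partition count of 8, and the use of CRC32 as the hash) is an assumption made for illustration; the source does not prescribe these details.

```python
import zlib
from typing import Dict, List, Optional

PARTITIONS = 8  # hypothetical number of memory partitions

# Hypothetical metadata: window name -> feature identifiers of its fragments.
WINDOW_FRAGMENTS: Dict[str, List[str]] = {
    "windowA": ["windowA/frag-0", "windowA/frag-1", "windowA/frag-2"],
}
# Hypothetical partition table: partition identifier -> host.
PARTITION_HOST: Dict[int, str] = {p: f"host-{p % 3}" for p in range(PARTITIONS)}
# Per-host dictionaries standing in for the memory of each host.
HOST_MEMORY: Dict[str, Dict[str, list]] = {f"host-{i}": {} for i in range(3)}


def locate(fragment_id: str) -> str:
    """Return the host that stores the fragment with this feature identifier."""
    partition = zlib.crc32(fragment_id.encode("utf-8")) % PARTITIONS  # S130
    return PARTITION_HOST[partition]                                  # S140


def handle_request(window_name: str, op: str,
                   records: Optional[list] = None) -> list:
    """Handle a read or write request that names a window (S110)."""
    fragment_ids = WINDOW_FRAGMENTS[window_name]                      # S120
    result: list = []
    for index, fragment_id in enumerate(fragment_ids):
        store = HOST_MEMORY[locate(fragment_id)]                      # S150
        if op == "write":
            # Spread the records round-robin over the fragments.
            share = (records or [])[index::len(fragment_ids)]
            store.setdefault(fragment_id, []).extend(share)
        else:
            result.extend(store.get(fragment_id, []))
    return result
```

A write followed by a read of the same window returns the same records, although not necessarily in the original order, because the records are spread across fragments.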
Specifically, the DSWM receives a window data access request sent by a client. The access request may be sent by a user through the client, and may be a read request or a write request for window data issued to the DSWM by a task. The window data may be data from a network, data stored on a local hard disk, or the like. The access request carries window indication information, which indicates the distributed sliding window in which the window data is stored; that is, the window indication information indicates which distributed sliding window the window data to be read is located in, or which distributed sliding window the window data to be written is to be written to. Optionally, the window indication information may be a window name or a window index of the distributed sliding window, which is not limited in this embodiment of the present invention. When the DSWM supports a single distributed sliding window, the window indication information may indicate that single distributed sliding window. The DSWM may also support sharing of window data among multiple distributed sliding windows, in which case each of the multiple distributed sliding windows may correspond to a window name. Alternatively, the multiple distributed sliding windows may be understood as one large distributed sliding window in which each distributed sliding window is a sub-window, in which case the window indication information may be the sub-window name of the sub-window. After the DSWM determines which distributed sliding window the window data is to be written to or read from, the DSWM may determine the distributed data structure of the window data in the distributed sliding window according to the window indication information.
For example, when the window indication information is the window name of the distributed sliding window, the DSWM may determine the distributed data structure of the window data in the distributed sliding window according to that window name. Optionally, the distributed data structure may include a plurality of data structure fragments, and the plurality of data structure fragments may be located on at least two hosts; that is, the window data may be divided into a plurality of data structure fragments and stored in a distributed manner on a plurality of hosts, so that the memory resources on the plurality of hosts can be used together, thereby breaking through the bottleneck of limited single-machine memory resources. Optionally, the distributed data structure may further include copy information of the window data. For example, the window data may be divided into 3 data structure fragments (data structure fragment A to data structure fragment C), and the distributed data structure corresponding to the window data may include 6 data structure fragments (data structure fragment A to data structure fragment F), where data structure fragments D to F are copies of data structure fragments A to C, respectively. In other words, a backup of the window data exists in memory, and when the data on a certain data structure fragment is lost, the corresponding data can be obtained from the copy of that data structure fragment, which improves the reliability of the window data. After determining the distributed data structure corresponding to the window data, the DSWM may optionally obtain the first memory partition information of each data structure fragment in the distributed data structure according to the feature identifier of each data structure fragment.
For example, the first memory partition information may be a first memory partition identifier. The DSWM may perform a hash calculation on the feature identifier of each data structure fragment to obtain a key value, and obtain the first memory partition identifier of each data structure fragment according to the key value. The DSWM may further determine the host information for storing each data structure fragment by combining the first memory partition identifier of the data structure fragment with a memory partition table, where the memory partition table represents the correspondence among the memory partition identifier, the copy number of each data structure fragment, and the host number of the memory partition corresponding to the memory partition identifier. That is, the DSWM may query the memory partition table according to the first memory partition identifier to determine the copy number allocated to the data structure fragment stored in the memory partition identified by the first memory partition identifier, and the host on which that copy, or the memory partition storing that copy, is located. After determining the host information for storing each data structure fragment, the DSWM may access each data structure fragment according to that host information. Optionally, for a read request for window data, the DSWM may read the corresponding data structure fragment from the host storing each data structure fragment and then return the read data structure fragments to the user; for a write request for window data, the DSWM may write the corresponding data structure fragment to the host storing each data structure fragment.
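The memory partition table described above relates three values: a memory partition identifier, a copy number, and a host number. The following sketch shows one possible in-memory form of such a table and the lookup by partition identifier. The names `PartitionEntry`, `build_partition_table`, and `hosts_for`, and the rule that copy c of partition p goes to host index (p + c) mod H, are assumptions for illustration only; the source does not specify how the table is populated.

```python
from typing import Dict, List, NamedTuple


class PartitionEntry(NamedTuple):
    partition_id: int   # memory partition identifier
    copy_number: int    # which copy of the fragment this row describes
    host_number: int    # host holding that copy's memory partition


def build_partition_table(partition_count: int, hosts: List[int],
                          copies: int) -> List[PartitionEntry]:
    """Build a table in which the copies of one partition sit on different hosts."""
    if copies > len(hosts):
        raise ValueError("need at least as many hosts as copies")
    table = []
    for p in range(partition_count):
        for c in range(copies):
            # Assumed placement rule: shift by the copy number, wrap around.
            table.append(PartitionEntry(p, c, hosts[(p + c) % len(hosts)]))
    return table


def hosts_for(table: List[PartitionEntry], partition_id: int) -> Dict[int, int]:
    """Look up copy number -> host number for one memory partition identifier."""
    return {e.copy_number: e.host_number
            for e in table if e.partition_id == partition_id}
```

With three hosts numbered 10, 11, and 12 and two copies per partition, `hosts_for(table, 0)` yields copy 0 on host 10 and copy 1 on host 11 under the assumed placement rule.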
Therefore, the method for accessing window data in the stream processing system of the embodiment of the invention can divide the window data into a plurality of data structure fragments, and store the data structure fragments on a plurality of hosts in a distributed manner, thereby breaking through the bottleneck problem of limited single-machine memory capacity.
Optionally, in this embodiment of the present invention, the distributed data structure may include a plurality of data structure fragments and copies of the plurality of data structure fragments, where each data structure fragment of the plurality of data structure fragments and a copy of each data structure fragment are located on different hosts, and the method further includes:
acquiring second memory partition information for storing a copy of each data structure fragment according to the feature identifier of each data structure fragment;
determining host information for storing a copy of each data structure fragment according to the second memory partition information;
and accessing each data structure fragment according to the host information for storing the copy of each data structure fragment.
Specifically, the distributed data structure may include the plurality of data structure fragments and copies of the plurality of data structure fragments, and each data structure fragment and its copy may be located on different hosts. When a host fails and the data of a data structure fragment on that host is lost, the data can be obtained from the copy of the data structure fragment stored on another host. Because the copies of the data structure fragments are stored on different hosts in a distributed manner, that is, the window data is stored using a distributed copy mechanism, the reliability of the window data is improved. The DSWM may obtain the second memory partition information for storing the copy of each data structure fragment according to the feature identifier of each data structure fragment. Because the feature identifier of each data structure fragment is the same as the feature identifier of its copy, the DSWM may equally obtain the second memory partition information according to the feature identifier of the copy. It may also be understood that each data structure fragment and its copy form a group of data, that the feature identifier of each data structure fragment is the group identifier of that group, and that the first memory partition information of the data structure fragment and the second memory partition information of its copy can both be determined according to the group identifier. Optionally, there is a certain correspondence between the memory partition storing each data structure fragment and the memory partition storing its copy, so the DSWM may determine the second memory partition information of the copy according to the first memory partition information of the data structure fragment.
After the DSWM determines the second memory partition information of the copy of each data structure fragment, the DSWM may determine the host information for storing the copy according to that second memory partition information, and may then access each data structure fragment according to the host information for storing its copy. That is, the DSWM may access a data structure fragment through any one of its copies.
It should be understood that, for the DSWM, each data structure fragment and its copy are equivalent, and no distinction is made between an original and a copy. For example, if a data structure fragment is stored three times in memory, all three instances can be regarded as copies of the data structure fragment and can be named copy 0 to copy 2, respectively. The embodiment of the present invention describes the distributed data structure as including each data structure fragment and the copy of each data structure fragment only to explain that each data structure fragment can be stored as multiple copies, that is, that each data structure fragment can have a backup in memory; it does not distinguish which data structure fragment is the original and which is the copy.
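Because a data structure fragment can be accessed through any one of its copies, a read can fall back to another host when one host has failed. The sketch below illustrates that idea with dictionaries standing in for host memory. The function name `read_fragment`, its parameters, and the way failed hosts are represented are all hypothetical.

```python
from typing import Dict, List, Set


def read_fragment(fragment_id: str,
                  copy_hosts: List[str],
                  host_memory: Dict[str, Dict[str, list]],
                  failed_hosts: Set[str]) -> list:
    """Return the fragment's data from the first reachable host that holds a copy.

    copy_hosts lists the hosts of copy 0, copy 1, ... in that order; no copy
    is treated as the original.
    """
    for host in copy_hosts:
        if host in failed_hosts:
            continue  # this copy is unreachable, try the next one
        fragments = host_memory.get(host, {})
        if fragment_id in fragments:
            return fragments[fragment_id]
    raise LookupError(f"no reachable copy of {fragment_id!r}")
```

The read only fails when every host holding a copy is unavailable, which is the reliability gain that storing copies on different hosts is meant to provide.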
Optionally, in this embodiment of the present invention, the window indication information may be a window name of the distributed sliding window, and determining, according to the window indication information, a distributed data structure of the window data in the distributed sliding window includes:
and determining a distributed data structure of the window data of the distributed sliding window in the distributed sliding window according to the window name.
Specifically, the distributed sliding window and the distributed data structure may be in a one-to-one correspondence, that is, the distributed data structure corresponding to the window data may be uniquely determined according to the window indication information. Optionally, the window indication information may be the window name of the distributed sliding window, and the DSWM may determine, according to the window name, the distributed data structure corresponding to the window data of the distributed sliding window. For example, the distributed data structure of the window data may be determined to be distributed dictionary structure A according to window name A, distributed set structure B according to window name B, distributed list C according to window name C, and so on.
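The one-to-one correspondence between window names and distributed data structures can be illustrated with a minimal Python sketch. The class name `WindowRegistry` and its methods are hypothetical and do not appear in the embodiment, and ordinary in-process `dict`, `set`, and `list` objects stand in for the distributed dictionary, set, and list structures.

```python
from typing import Any, Dict


class WindowRegistry:
    """Hypothetical lookup table: window name -> distributed data structure."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Any] = {}

    def register(self, window_name: str, structure: Any) -> None:
        # One-to-one correspondence: a window name may be bound only once.
        if window_name in self._by_name:
            raise ValueError(f"window {window_name!r} already has a structure")
        self._by_name[window_name] = structure

    def resolve(self, window_name: str) -> Any:
        # Raises KeyError if no distributed sliding window has this name.
        return self._by_name[window_name]


registry = WindowRegistry()
registry.register("A", {})     # stands in for distributed dictionary structure A
registry.register("B", set())  # stands in for distributed set structure B
registry.register("C", [])     # stands in for distributed list C
```

A real DSWM would hold handles to distributed structures rather than local objects; the sketch only shows that the window name alone is sufficient to select the structure.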
Optionally, in this embodiment of the present invention, the window indication information may also be a sub-window name of a sub-window in the distributed sliding window, and determining, according to the window indication information, a distributed data structure corresponding to the window data in the distributed sliding window includes:
and determining, according to the sub-window name, the distributed data structure of the window data of the sub-window corresponding to the sub-window name in the distributed sliding window.
Specifically, the distributed sliding window may include a plurality of sub-windows, each sub-window corresponding to one distributed data structure, that is, the distributed sliding window may correspond to a plurality of distributed data structures. For example, window A with window length N may be divided into M sub-windows, and the start and stop positions of each sub-window relative to window A are recorded. Optionally, the lengths of the M sub-windows may be the same or different. A sub-window name may be configured for each sub-window, and a corresponding distributed data structure may be created for each sub-window. Optionally, a correspondence may be established between each sub-window name and its distributed data structure, and a correspondence may also be established between each sub-window name and the start and stop positions of the sub-window relative to window A. In this case, the window indication information may be the sub-window name, and the distributed data structure corresponding to the sub-window in the distributed sliding window may be determined according to the sub-window name.
For example, when data is written into window A, optionally, the sub-window name B corresponding to the position to be written may first be determined according to the position at which the window data is to be written. The mapping between sub-window names and distributed data structures is then queried according to the sub-window name B, which determines that sub-window B corresponds to distributed data structure C. According to the offset between the position to be written and the start position of sub-window B, the position at which the window data is to be written in distributed data structure C may be determined. Distributed data structure C may then be called, and the window data written into the corresponding position. In general, reading and writing window data in this way allows the window data to be read and written contiguously in memory, avoiding the memory space wasted by non-contiguous reads and writes.
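The write path just described (position, then sub-window name, then structure, then offset) can be sketched in Python as follows. The names `SubWindow`, `locate`, and `write`, the half-open `[start, stop)` convention, and the use of plain lists in place of distributed data structures are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class SubWindow:
    name: str
    start: int  # inclusive start position relative to the window
    stop: int   # exclusive stop position relative to the window


def locate(sub_windows: List[SubWindow], position: int) -> Tuple[str, int]:
    """Map a window position to (sub-window name, offset inside it).

    `sub_windows` must be sorted by `start` and must not overlap.
    """
    starts = [sw.start for sw in sub_windows]
    idx = bisect_right(starts, position) - 1
    if idx < 0 or position >= sub_windows[idx].stop:
        raise IndexError(f"position {position} is outside the window")
    sw = sub_windows[idx]
    return sw.name, position - sw.start


def write(structures: Dict[str, List[Any]], sub_windows: List[SubWindow],
          position: int, value: Any) -> Tuple[str, int]:
    """Write `value` at `position` of the window via the owning sub-window."""
    name, offset = locate(sub_windows, position)
    structures[name][offset] = value
    return name, offset


# A window of length 10 split into two sub-windows of unequal length.
subs = [SubWindow("B0", 0, 4), SubWindow("B1", 4, 10)]
structures = {"B0": [None] * 4, "B1": [None] * 6}
write(structures, subs, 5, "x")  # lands in sub-window B1 at offset 1
```

Because consecutive window positions map to consecutive offsets inside one sub-window, sequential writes stay contiguous within each structure.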
Alternatively, window A with window length N may be divided into M sub-windows as follows: the i-th item of the window data is stored at position i/M of the (i % M)-th sub-window, where % denotes the remainder operation and / denotes integer division (division with the result rounded down). That is, the items of window data whose indices have the same remainder modulo the total number M of sub-windows form one data structure fragment, and that data structure fragment is stored under one sub-window. Therefore, the method for accessing window data in the stream processing system according to the embodiment of the present invention can divide the window data into a plurality of data structure fragments and store them non-contiguously in the memories corresponding to the plurality of sub-windows, that is, random reading and writing of the window data in memory can be realized, which increases the flexibility of window data read and write operations.
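The remainder-based allocation can be checked with a short Python sketch. Zero-based indexing and the function names `place`, `original_index`, and `split` are assumptions for illustration only.

```python
from typing import Any, List, Tuple


def place(i: int, m: int) -> Tuple[int, int]:
    """Return (sub-window index, position inside it) for the i-th item."""
    return i % m, i // m


def original_index(sub_window: int, position: int, m: int) -> int:
    """Inverse of `place`: recover the item index in the whole window."""
    return position * m + sub_window


def split(window_data: List[Any], m: int) -> List[List[Any]]:
    """Distribute window data over m sub-windows by remainder."""
    sub_windows: List[List[Any]] = [[] for _ in range(m)]
    for i, item in enumerate(window_data):
        sub, _pos = place(i, m)
        # Items arrive in index order, so appending puts the item at i // m.
        sub_windows[sub].append(item)
    return sub_windows


fragments = split(list(range(10)), 3)
```

With N = 10 and M = 3, items 0, 3, 6, 9 share remainder 0 and form one fragment, items 1, 4, 7 form the second, and items 2, 5, 8 form the third.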
As can be seen from the above analysis, the window data may be window data of the entire distributed sliding window, or may also be window data of a sub-window of the distributed sliding window, that is, the method for accessing window data in the embodiment of the present invention may operate not only the window data of the entire distributed sliding window, but also the window data of the sub-window.
Optionally, in this embodiment of the present invention, the determining, according to the first memory partition information, host information for storing each data structure slice may include:
and acquiring host information for storing each data structure fragment according to the first memory partition identifier and a memory partition table, wherein the memory partition table represents the corresponding relation between the memory partition identifier, the copy number of each data structure fragment and the host number of the memory partition corresponding to the memory partition identifier.
Specifically, after determining the distributed data structure of the window data, optionally, the DSWM may determine the first memory partition information of each data structure fragment according to the feature identifier of that data structure fragment, where the first memory partition information may be a first memory partition identifier. For example, the feature identifier of each data structure fragment may be a character string, and the DSWM may perform a hash calculation on the character string to obtain a key value, and may obtain the memory partition identifier of the data structure fragment according to the key value. For example, if the obtained key values range from 1 to 10000, the memory partitions may be divided at intervals of 100: key values 1 to 100 correspond to memory partition identifier 1, key values 101 to 200 correspond to memory partition identifier 2, key values 201 to 300 correspond to memory partition identifier 3, and so on. Thus, when the calculated key value of a data structure fragment is 1, 2, 50, or 99, the first memory partition identifier of that data structure fragment is determined to be 1. Optionally, the DSWM may determine the host information for each data structure fragment by combining the first memory partition identifier of the data structure fragment with the memory partition table. The memory partition table represents the correspondence between a memory partition identifier, the copy number of each data structure fragment, and the host number of the memory partition corresponding to the memory partition identifier. For example, for each data structure fragment, the DSWM may query the memory partition table according to the first memory partition identifier of the data structure fragment to obtain the copy number of the data structure fragment corresponding to that identifier and the host information of the memory partition storing the data structure fragment.
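The key-range example can be sketched in Python as below. The embodiment does not name a hash function, so the use of CRC-32 from the standard library is an assumption, as are the function names; the intervals are taken to be non-overlapping (1 to 100, 101 to 200, and so on).

```python
import zlib

KEY_MIN, KEY_MAX = 1, 10000  # key value range used in the example
INTERVAL = 100               # width of each memory partition's key range


def key_for(feature_id: str) -> int:
    """Hash a feature identifier string to a key value in [1, 10000].

    CRC-32 is only a stand-in; any stable hash would serve.
    """
    return zlib.crc32(feature_id.encode("utf-8")) % KEY_MAX + KEY_MIN


def partition_for_key(key: int) -> int:
    """Map a key value to its memory partition identifier (1-based)."""
    if not KEY_MIN <= key <= KEY_MAX:
        raise ValueError(f"key {key} is outside [{KEY_MIN}, {KEY_MAX}]")
    return (key - KEY_MIN) // INTERVAL + 1


def partition_for(feature_id: str) -> int:
    """Feature identifier -> first memory partition identifier."""
    return partition_for_key(key_for(feature_id))
```

Under these assumptions, key values 1, 2, 50, and 99 all map to memory partition identifier 1, matching the example in the text, and the 10000 possible key values fall into 100 partitions.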
Optionally, in this embodiment of the present invention, the second memory partition information may be a second memory partition identifier, and the determining, according to the second memory partition information, host information for storing a copy of each data structure fragment includes:
and acquiring host information for storing the copy of each data structure fragment according to the second memory partition identifier and a memory partition table, wherein the memory partition table represents the corresponding relation between the memory partition identifier, the copy number of each data structure fragment and the host number of the memory partition corresponding to the memory partition identifier.
Specifically, after determining the distributed data structure of the window data, optionally, the DSWM may determine the second memory partition information of the copy of each data structure fragment according to the feature identifier of that data structure fragment, where the second memory partition information may be a second memory partition identifier. The method for determining the second memory partition identifier is similar to the method for determining the first memory partition identifier and is not described again here. Optionally, after determining the second memory partition identifier, the DSWM may determine the host information for the copy of each data structure fragment by combining the second memory partition identifier of the copy with the memory partition table. For example, for the copy of each data structure fragment, the DSWM may query the memory partition table according to the second memory partition identifier of the copy to obtain the host information of the memory partition storing the copy, where the memory partition table represents the correspondence between the memory partition identifier, the copy number of each data structure fragment, and the host number of the memory partition. That is, the memory partition table may be queried according to the second memory partition identifier of the copy of the data structure fragment to obtain the host on which the memory partition indicated by the second memory partition identifier is located.
It should be understood that, since a data structure fragment and its copy are equivalent, the first memory partition identifier and the second memory partition identifier have equal status. They are distinguished only to indicate that the memory partition identifiers of the multiple copies differ from one another, not to indicate that the first memory partition identifier belongs to an original and the second memory partition identifier belongs to a copy.
For example, as shown in Table 1, if the memory partition identifier of the data structure fragment obtained by the DSWM is 1, querying the memory partition table with that identifier shows that the corresponding copy number of the data structure fragment is 0 and the host number of the memory partition is 1. The memory partition identifiers of the other copies of the data structure fragment may also be determined from this identifier. That is, the memory partition identifiers of each data structure fragment and its copies are bound together, so the memory partition identifier of one copy of a data structure fragment determines the memory partition identifiers of the other copies. Therefore, from memory partition identifier 1 it can be determined that memory partition identifiers 1' and 1'' correspond to the other copies of the data structure fragment, that is, the data structure fragment has three copies (copy 0 to copy 2) in memory, and querying the memory partition table with these identifiers shows that copies 0 to 2 are located on host 1 to host 3, respectively. If the data in copy 0 of the data structure fragment is lost, the DSWM can still obtain the data of copy 1 stored on host 2 or of copy 2 stored on host 3. This solves the problem that window data is lost and cannot be recovered because of a task failure, an executor failure, a worker process failure, or a host failure, and improves the reliability of the window data.
TABLE 1 Memory partition table
Memory partition identifier | Copy number of the data structure fragment | Host number of the memory partition
1 | 0 | 1
1' | 1 | 2
1'' | 2 | 3
2 | 0 | 1
2' | 1 | 2
2'' | 2 | 3
It should be understood that Table 1 is described using an example in which the window data includes two data structure fragments, each data structure fragment is stored as 3 copies, and the number of hosts is 3; this example is not limiting. In the embodiment of the present invention, the window data may instead include 4 or 6 data structure fragments, and so on; the number of copies of each data structure fragment may be 2 or 4, and so on; and the number of hosts may be 5 or 10, and so on.
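The lookup and failover behaviour around Table 1 can be sketched in Python as follows. The table contents are taken from Table 1, while the function names and the rule that sibling identifiers are formed by appending prime marks are assumptions: the embodiment states only that the identifiers of a fragment's copies are bound together, not how.

```python
from typing import Dict, List, Set, Tuple

# Table 1: memory partition identifier -> (copy number, host number).
PARTITION_TABLE: Dict[str, Tuple[int, int]] = {
    "1": (0, 1), "1'": (1, 2), "1''": (2, 3),
    "2": (0, 1), "2'": (1, 2), "2''": (2, 3),
}


def sibling_ids(partition_id: str) -> List[str]:
    """All partition identifiers holding copies of the same fragment.

    Assumed binding rule: strip the prime marks to get the shared base.
    """
    base = partition_id.rstrip("'")
    return [base, base + "'", base + "''"]


def pick_copy(partition_id: str, failed_hosts: Set[int]) -> Tuple[int, int]:
    """Return (copy number, host number) of the first reachable copy."""
    for pid in sibling_ids(partition_id):
        copy_number, host = PARTITION_TABLE[pid]
        if host not in failed_hosts:
            return copy_number, host
    raise RuntimeError("no surviving copy of the data structure fragment")
```

For instance, with host 1 marked as failed, `pick_copy("1", {1})` falls back to copy 1 on host 2, which mirrors the recovery path described for a lost copy 0.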
Optionally, in this embodiment of the present invention, the obtaining the first memory partition information storing each data structure fragment according to the feature identifier of each data structure fragment in the plurality of data structure fragments includes:
converting the characteristic identifier of each data structure fragment into binary data;
calculating the binary data by using a hash algorithm to obtain a hash result;
and determining the result obtained by taking the hash result modulo a preset value as the first memory partition identifier of each data structure fragment.
Specifically, the DSWM may determine, according to the feature identifier of each data structure fragment, the first memory partition information corresponding to that data structure fragment, where the first memory partition information may be a first memory partition identifier. Optionally, the DSWM may convert the feature identifier of each data structure fragment into binary data and then calculate the binary data by using a hash algorithm to obtain a hash result. Optionally, the DSWM may take the hash result modulo a preset value and determine the resulting value as the first memory partition identifier corresponding to the data structure fragment. Optionally, the preset value may be the total number of memory partitions. For example, if the hash result is 50 and the total number of memory partitions is 100, the first memory partition identifier of the data structure fragment is 50.
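The three steps (convert to binary data, hash, take the modulus) can be written as a minimal Python sketch. The embodiment does not specify the encoding or the hash algorithm, so UTF-8 and SHA-256 are assumptions here, as are the function names.

```python
import hashlib


def to_partition(hash_result: int, total_partitions: int) -> int:
    """Take the hash result modulo the preset value (total partitions)."""
    return hash_result % total_partitions


def first_partition_id(feature_id: str, total_partitions: int) -> int:
    """Feature identifier -> first memory partition identifier."""
    # Step 1: convert the feature identifier into binary data (UTF-8 assumed).
    binary = feature_id.encode("utf-8")
    # Step 2: hash the binary data (SHA-256 assumed) and read it as an integer.
    hash_result = int.from_bytes(hashlib.sha256(binary).digest(), "big")
    # Step 3: modulo the preset value.
    return to_partition(hash_result, total_partitions)
```

With a hash result of 50 and 100 memory partitions, `to_partition(50, 100)` yields 50, as in the example above. The same feature identifier always yields the same partition identifier, which is what allows any DSWM to locate a fragment without a central index.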
Therefore, the method for accessing window data in the stream processing system of the embodiment of the invention can divide the window data into a plurality of data structure fragments, and store the data structure fragments on a plurality of hosts in a distributed manner, thereby breaking through the bottleneck problem of limited single-machine memory capacity and simultaneously improving the reliability of the window data.
Fig. 5 is a schematic block diagram illustrating an apparatus for accessing window data in a stream processing system according to an embodiment of the present invention, where the apparatus 500 may be the distributed sliding window manager in fig. 3, and the apparatus 500 may be configured in the stream processing system, as shown in fig. 5, and the apparatus 500 includes:
a receiving module 510, configured to receive an access request for window data sent by a client, where the access request carries window indication information, and the window indication information indicates a distributed sliding window for storing the window data;
a first determining module 520, configured to determine, according to the window indication information, a distributed data structure of the window data in the distributed sliding window, where the distributed data structure includes multiple data structure fragments, and the multiple data structure fragments are located on at least two hosts;
an obtaining module 530, configured to obtain, according to the feature identifier of each data structure fragment of the multiple data structure fragments, first memory partition information for storing each data structure fragment;
a second determining module 540, configured to determine, according to the first memory partition information acquired by the acquiring module 530, host information for storing each data structure slice;
an accessing module 550, configured to access each data structure fragment according to the host information for storing the data structure fragment determined by the second determining module 540.
Specifically, the receiving module 510, the first determining module 520, the second determining module 540, and the accessing module 550 may be equivalent to the distributed sliding window management module in fig. 3. The receiving module 510 receives an access request for window data sent by a client, where the access request carries window indication information. The first determining module 520 may determine, according to the window indication information, whether the distributed sliding window to be accessed exists, and then determine the corresponding distributed data structure of the window data in the distributed sliding window, where the distributed data structure includes a plurality of data structure fragments and may be implemented by the distributed data structure implementation module in fig. 3. Optionally, when the distributed sliding window does not exist, the distributed sliding window management module may create the distributed sliding window, with the window indication information indicating its window name. Optionally, the first determining module 520 may be implemented by a software program, for example, as a process, as a software module on a hardware chip, or as a combination of hardware and software modules. The obtaining module 530 obtains first memory partition information for storing each data structure fragment according to the feature identifier of each data structure fragment in the plurality of data structure fragments, the second determining module 540 determines host information for storing each data structure fragment according to the first memory partition information, and the mapping relationship between the distributed data structure and the memory partitions may also be implemented by the distributed data structure implementation module in fig. 3. Optionally, the obtaining module 530 may be implemented by a software program, for example, as a process, as a software module on a hardware chip, or as a combination of hardware and software modules.
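How the five modules cooperate on one request can be sketched in Python as follows. This is an in-process illustration under stated assumptions, not the apparatus itself: the class `WindowAccessSketch`, the request format, the CRC-32 hash, and the use of dictionaries as stand-ins for host memory are all hypothetical.

```python
import zlib
from typing import Any, Dict, List


class WindowAccessSketch:
    """Toy stand-in for apparatus 500; hosts are plain dictionaries."""

    def __init__(self, partition_hosts: List[int]) -> None:
        # Index = memory partition identifier, value = host number.
        self.partition_hosts = partition_hosts
        # Window name -> feature identifiers of its data structure fragments.
        self.windows: Dict[str, List[str]] = {}
        # Host number -> {feature identifier: fragment data}.
        self.hosts: Dict[int, Dict[str, Any]] = {h: {} for h in set(partition_hosts)}

    def _host_of(self, feature_id: str) -> int:
        # Obtaining module 530: feature identifier -> memory partition identifier.
        partition_id = zlib.crc32(feature_id.encode("utf-8")) % len(self.partition_hosts)
        # Second determining module 540: partition identifier -> host.
        return self.partition_hosts[partition_id]

    def put(self, window_name: str, feature_id: str, data: Any) -> None:
        self.windows.setdefault(window_name, []).append(feature_id)
        self.hosts[self._host_of(feature_id)][feature_id] = data

    def handle(self, request: Dict[str, str]) -> List[Any]:
        window_name = request["window"]          # receiving module 510
        feature_ids = self.windows[window_name]  # first determining module 520
        # Accessing module 550: read every fragment from its host.
        return [self.hosts[self._host_of(f)][f] for f in feature_ids]


dswm = WindowAccessSketch(partition_hosts=[1, 2, 1, 2])
dswm.put("A", "A#0", [1, 2])
dswm.put("A", "A#1", [3])
window_data = dswm.handle({"window": "A"})
```

The client supplies only the window name; every later step is derived from the feature identifiers of the fragments.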
Therefore, the device for accessing the window data in the stream processing system of the embodiment of the invention can divide the window data into a plurality of data structure fragments, and store the data structure fragments on a plurality of hosts in a distributed manner, thereby breaking through the bottleneck problem of limited single-machine memory capacity and simultaneously improving the reliability of the window data.
Optionally, in this embodiment of the present invention, the distributed data structure includes the multiple data structure fragments and copies of the multiple data structure fragments, where each data structure fragment in the multiple data structure fragments and a copy of each data structure fragment are located on different hosts, and the obtaining module 530 is further configured to obtain, according to the feature identifier of each data structure fragment, second memory partition information for storing the copy of each data structure fragment;
the second determining module 540 is further configured to determine, according to the second memory partition information, host information for storing a copy of each data structure fragment;
the accessing module 550 is further configured to access each data structure slice according to host information storing a copy of the each data structure slice.
Optionally, in this embodiment of the present invention, the window indication information is a window name of the distributed sliding window, and the first determining module 520 is specifically configured to:
and determining the distributed data structure of the window data of the distributed sliding window in the distributed sliding window according to the window name.
Optionally, in this embodiment of the present invention, the window indication information is a name of a sub-window in the distributed sliding window, and the first determining module 520 is further configured to:
and determining, according to the sub-window name, the distributed data structure of the window data of the sub-window corresponding to the sub-window name in the distributed sliding window.
Optionally, in this embodiment of the present invention, the first memory partition information is a first memory partition identifier, and the second determining module 540 is further configured to:
and acquiring host information for storing each data structure fragment according to the first memory partition identifier and a memory partition table, wherein the memory partition table represents the corresponding relation between the memory partition identifier, the copy number of each data structure fragment and the host number of the memory partition corresponding to the memory partition identifier.
Optionally, in an embodiment of the present invention, the second memory partition information is a second memory partition identifier, and the second determining module 540 is further configured to:
and acquiring host information for storing the copy of each data structure fragment according to the second memory partition identifier and a memory partition table, wherein the memory partition table represents the corresponding relation between the memory partition identifier, the copy number of each data structure fragment and the host number of the memory partition corresponding to the memory partition identifier.
Optionally, in this embodiment of the present invention, the first memory partition information may be a first memory partition identifier, and the obtaining module 530 includes:
a conversion unit, configured to convert the feature identifier of each data structure fragment into binary data;
the computing unit is used for computing the binary data by using a hash algorithm to obtain a hash result;
and the determining unit is used for determining a result obtained by modulo the hash result on a preset value as the first memory partition identifier of each data structure fragment.
Optionally, in an embodiment of the present invention, the data structure of the window data is one of the following:
distributed list structures, distributed dictionary structures, distributed collection structures.
Optionally, the stream processing system may include a plurality of DSWMs, and in actual use DSWMs may be added or removed as needed. When a new DSWM is added, the new DSWM obtains the information of the master node by broadcast and, through the master node, notifies the other hosts of the new DSWM's information, so that the other hosts can update the topology information of the DSWMs in the stream processing system.
Therefore, the device for accessing the window data in the stream processing system of the embodiment of the invention can divide the window data into a plurality of data structure fragments, and store the data structure fragments on a plurality of hosts in a distributed manner, thereby breaking through the bottleneck problem of limited single-machine memory capacity and simultaneously improving the reliability of the window data.
The apparatus 500 for accessing window data in a stream processing system according to an embodiment of the present invention may correspond to the DSWM in the method 100 for accessing window data in a stream processing system according to an embodiment of the present invention, and the above and other operations and/or functions of each module in the apparatus 500 for accessing window data in a stream processing system are respectively for implementing corresponding flows of each aforementioned method, and are not described herein again for brevity.
As shown in fig. 6, an embodiment of the present invention further provides a schematic block diagram of an apparatus 600 for accessing window data in a stream processing system, where the apparatus 600 may be configured in the stream processing system shown in fig. 1, and the apparatus 600 includes a processor 610, a memory 620, a bus system 630, and a network interface 640. The processor 610, the memory 620, and the network interface 640 are coupled via the bus system 630, the memory 620 is configured to store instructions, and the processor 610 is configured to execute the instructions, such as computer programs, stored in the memory 620. The apparatus 600 communicates with at least one other network element through the network interface 640 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, or the like may be used. The network interface 640 is configured to receive an access request for window data sent by a client, where the access request carries window indication information, and the window indication information indicates a distributed sliding window for storing the window data. The processor 610 is configured to determine, according to the window indication information, a distributed data structure of the window data in the distributed sliding window, where the distributed data structure includes multiple data structure fragments located on at least two hosts; obtain first memory partition information for storing each data structure fragment according to a feature identifier of each data structure fragment in the multiple data structure fragments; determine host information for storing each data structure fragment according to the first memory partition information; and access each data structure fragment according to the host information for storing each data structure fragment.
Therefore, the device for accessing the window data in the stream processing system of the embodiment of the invention can divide the window data into a plurality of data structure fragments, and store the data structure fragments on a plurality of hosts in a distributed manner, thereby breaking through the bottleneck problem of limited single-machine memory capacity and simultaneously improving the reliability of the window data.
It should be understood that, in the embodiment of the present invention, the processor 610 may be a Central Processing Unit (CPU), and the processor 610 may also be other general processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 620 may include both read-only memory and random access memory, and provides instructions and data to the processor 610. A portion of the memory 620 may also include non-volatile random access memory. For example, the memory 620 may also store device type information.
The bus system 630 may include a power bus, a control bus, a status signal bus, and the like, in addition to the data bus. For clarity of illustration, however, the various buses are designated in the figure as the bus system 630.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 610 or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 620, and the processor 610 reads the information in the memory 620 and performs the steps of the above method in combination with its hardware. To avoid repetition, details are not described again here.
Optionally, in an embodiment of the present invention, the distributed data structure includes the plurality of data structure fragments and copies of the plurality of data structure fragments, each data structure fragment of the plurality of data structure fragments and a copy of each data structure fragment are located on different hosts, and the processor 610 is further configured to:
acquiring second memory partition information for storing a copy of each data structure fragment according to the feature identifier of each data structure fragment;
determining host information for storing a copy of each data structure fragment according to the second memory partition information;
and accessing each data structure fragment according to the host information for storing the copy of each data structure fragment.
Optionally, in this embodiment of the present invention, the window indication information is a window name of the distributed sliding window, and the processor 610 is specifically configured to:
and determining the distributed data structure of the window data of the distributed sliding window in the distributed sliding window according to the window name.
Optionally, in this embodiment of the present invention, the window indication information is a name of a sub-window in the distributed sliding window, and the processor 610 is further configured to:
and determining, according to the sub-window name, the distributed data structure of the window data of the sub-window corresponding to the sub-window name in the distributed sliding window.
Optionally, in this embodiment of the present invention, the first memory partition information is a first memory partition identifier, and the processor 610 is further configured to:
and acquiring host information for storing each data structure fragment according to the first memory partition identifier and a memory partition table, wherein the memory partition table represents the corresponding relation between the memory partition identifier, the copy number of each data structure fragment and the host number of the memory partition corresponding to the memory partition identifier.
Optionally, in this embodiment of the present invention, the second memory partition information is a second memory partition identifier, and the processor 610 is further configured to:
and acquiring host information for storing the copy of each data structure fragment according to the second memory partition identifier and a memory partition table, wherein the memory partition table represents the corresponding relation between the memory partition identifier, the copy number of each data structure fragment and the host number of the memory partition corresponding to the memory partition identifier.
Optionally, in this embodiment of the present invention, the first memory partition information is a first memory partition identifier, and the processor 610 is further configured to:
converting the characteristic identifier of each data structure fragment into binary data;
calculating the binary data by using a hash algorithm to obtain a hash result;
and determining the result obtained by taking the hash result modulo a preset value as the first memory partition identifier corresponding to each data structure fragment.
Optionally, in an embodiment of the present invention, the data structure of the window data is one of the following:
a distributed list structure, a distributed dictionary structure, or a distributed set structure.
Therefore, the apparatus for accessing window data in a stream processing system according to this embodiment of the present invention can divide the window data into a plurality of data structure fragments and store the fragments on a plurality of hosts in a distributed manner, which overcomes the bottleneck of limited single-machine memory capacity and also improves the reliability of the window data.
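To make the idea of dividing window data into fragments concrete, the following sketch splits a list of window records into fixed-size fragments and assigns each a feature identifier. The fixed fragment size, the `window#index` identifier scheme, and the function name are assumptions for illustration; the embodiment does not prescribe how fragment boundaries are chosen.

```python
from typing import Dict, List, Sequence


def split_into_fragments(
    window_name: str, items: Sequence, fragment_size: int
) -> Dict[str, List]:
    """Split window data into fragments keyed by a hypothetical feature identifier."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    fragments: Dict[str, List] = {}
    for start in range(0, len(items), fragment_size):
        # The identifier combines the window name with the fragment index.
        feature_id = f"{window_name}#{start // fragment_size}"
        fragments[feature_id] = list(items[start:start + fragment_size])
    return fragments
```

Each resulting fragment is small enough to be placed independently, so the total window size is bounded by the combined memory of the hosts rather than by any single host.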
It should be understood that the term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. The division of the units is only a logical division, and other divisions are possible in an actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (16)

1. A method for accessing window data in a stream processing system, the method comprising:
receiving an access request of window data sent by a client, wherein the access request carries window indication information, and the window indication information indicates a distributed sliding window for storing the window data;
determining a distributed data structure of the window data in the distributed sliding window according to the window indication information, wherein the distributed data structure comprises a plurality of data structure fragments, and the plurality of data structure fragments are located on at least two hosts;
acquiring first memory partition information for storing each data structure fragment according to the feature identifier of each data structure fragment in the plurality of data structure fragments;
determining host information for storing each data structure fragment according to the first memory partition information;
and accessing each data structure fragment according to the host information for storing each data structure fragment.
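The steps of claim 1 can be traced end to end in the following simplified, single-process sketch. Hosts are simulated as in-memory dictionaries; the catalog, the partition-to-host mapping, the use of CRC-32 as the hash, and all names are assumptions for illustration and do not reflect a particular implementation of the claimed method.

```python
import zlib
from typing import Dict, List

PARTITION_COUNT = 4  # hypothetical preset value for the modulo step

# Hypothetical catalog: window name -> feature identifiers of its fragments.
WINDOW_CATALOG: Dict[str, List[str]] = {
    "clicks": ["clicks#0", "clicks#1", "clicks#2"],
}

# Hypothetical mapping from memory partition identifier to host name.
PARTITION_HOSTS: Dict[int, str] = {0: "host-a", 1: "host-b", 2: "host-a", 3: "host-b"}

# Simulated host memory: host name -> {feature identifier: fragment data}.
HOST_MEMORY: Dict[str, Dict[str, list]] = {"host-a": {}, "host-b": {}}


def host_for(feature_id: str) -> str:
    """Derive the partition from the feature identifier, then look up its host."""
    partition_id = zlib.crc32(feature_id.encode("utf-8")) % PARTITION_COUNT
    return PARTITION_HOSTS[partition_id]


def store_fragment(feature_id: str, data: list) -> None:
    """Place a fragment on the host selected by its feature identifier."""
    HOST_MEMORY[host_for(feature_id)][feature_id] = list(data)


def handle_access_request(request: Dict[str, str]) -> list:
    """Resolve the window named in the request and read all of its fragments."""
    fragment_ids = WINDOW_CATALOG[request["window"]]
    window_data: list = []
    for feature_id in fragment_ids:
        host = host_for(feature_id)
        window_data.extend(HOST_MEMORY[host][feature_id])
    return window_data
```

After storing the three fragments with `store_fragment`, calling `handle_access_request({"window": "clicks"})` reassembles the window data in fragment order, regardless of which simulated host each fragment landed on.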
2. The method of claim 1, wherein the distributed data structure comprises the plurality of data structure fragments and copies of the plurality of data structure fragments, wherein each data structure fragment of the plurality of data structure fragments and a copy of each data structure fragment are located on different hosts, the method further comprising:
acquiring second memory partition information for storing a copy of each data structure fragment according to the feature identifier of each data structure fragment;
determining host information for storing the copy of each data structure fragment according to the second memory partition information;
and accessing each data structure fragment according to the host information for storing the copy of each data structure fragment.
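Claim 2 does not state when the copy is read. One plausible use, assumed in the sketch below, is to fall back to the host holding the copy when the host holding the fragment itself is unreachable. The `Host` class, the `alive` flag, and `read_with_copy` are hypothetical names introduced only for this example.

```python
from typing import Dict


class Host:
    """A simulated host that holds fragments in memory and may become unreachable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.memory: Dict[str, list] = {}

    def read(self, feature_id: str) -> list:
        if not self.alive:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.memory[feature_id]


def read_with_copy(feature_id: str, primary: Host, copy: Host) -> list:
    """Read a fragment from its primary host, falling back to the copy's host."""
    if primary is copy:
        # The claim requires a fragment and its copy to be on different hosts.
        raise ValueError("fragment and its copy must be on different hosts")
    try:
        return primary.read(feature_id)
    except ConnectionError:
        return copy.read(feature_id)
```

Keeping the fragment and its copy on different hosts is what makes this fallback meaningful: the failure of one host leaves the other readable.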
3. The method according to claim 2, wherein the window indication information is a window name of the distributed sliding window, and the determining the distributed data structure of the window data in the distributed sliding window according to the window indication information includes:
determining, according to the window name, the distributed data structure of the window data in the distributed sliding window.
4. The method according to claim 2, wherein the window indication information is a sub-window name of a sub-window in the distributed sliding window, and the determining the distributed data structure of the window data in the distributed sliding window according to the window indication information includes:
determining, according to the sub-window name, the distributed data structure of the window data of the sub-window in the distributed sliding window.
5. The method according to any one of claims 2 to 4, wherein the first memory partition information is a first memory partition identifier, and the determining the host information for storing each data structure fragment according to the first memory partition information comprises:
and acquiring host information for storing each data structure fragment according to the first memory partition identifier and a memory partition table, wherein the memory partition table represents the corresponding relation between the memory partition identifier, the copy number of each data structure fragment and the host number of the memory partition corresponding to the memory partition identifier.
6. The method according to any one of claims 2 to 4, wherein the second memory partition information is a second memory partition identifier, and the determining, according to the second memory partition information, host information for storing a copy of each data structure fragment comprises:
and acquiring host information for storing the copy of each data structure fragment according to the second memory partition identifier and a memory partition table, wherein the memory partition table represents the corresponding relation between the memory partition identifier, the copy number of each data structure fragment and the host number of the memory partition corresponding to the memory partition identifier.
7. The method according to any one of claims 1 to 4, wherein the first memory partition information is a first memory partition identifier, and the obtaining the first memory partition information for storing each of the plurality of data structure fragments according to the feature identifier of each of the plurality of data structure fragments comprises:
converting the feature identifier of each data structure fragment into binary data;
calculating the binary data by using a hash algorithm to obtain a hash result;
and determining the result of the hash result modulo a preset value as the first memory partition identifier of each data structure fragment.
8. The method according to any one of claims 1 to 4, wherein the data structure of the window data is one of:
a distributed list structure, a distributed dictionary structure, or a distributed set structure.
9. An apparatus for accessing window data in a stream processing system, the apparatus comprising:
a receiving module, configured to receive an access request for window data sent by a client, where the access request carries window indication information, and the window indication information indicates a distributed sliding window for storing the window data;
a first determining module, configured to determine, according to the window indication information, a distributed data structure of the window data in the distributed sliding window, where the distributed data structure includes multiple data structure fragments, and the multiple data structure fragments are located on at least two hosts;
an obtaining module, configured to obtain, according to a feature identifier of each data structure fragment of the multiple data structure fragments, first memory partition information for storing each data structure fragment;
a second determining module, configured to determine, according to the first memory partition information obtained by the obtaining module, host information for storing each data structure fragment;
and an access module, configured to access each data structure fragment according to the host information, determined by the second determining module, for storing each data structure fragment.
10. The apparatus according to claim 9, wherein the distributed data structure includes the plurality of data structure fragments and copies of the plurality of data structure fragments, each data structure fragment of the plurality of data structure fragments and a copy of each data structure fragment are located on different hosts, and the obtaining module is further configured to obtain second memory partition information that stores the copy of each data structure fragment according to the feature identifier of each data structure fragment;
the second determining module is further configured to determine, according to the second memory partition information, host information for storing a copy of each data structure fragment;
the access module is further configured to access each data structure fragment according to host information that stores a copy of each data structure fragment.
11. The apparatus of claim 10, wherein the window indication information is a window name of the distributed sliding window, and the first determining module is specifically configured to:
determine, according to the window name, the distributed data structure of the window data in the distributed sliding window.
12. The apparatus of claim 10, wherein the window indication information is a sub-window name of a sub-window in the distributed sliding window, and wherein the first determining module is further configured to:
determine, according to the sub-window name, the distributed data structure of the window data of the sub-window in the distributed sliding window.
13. The apparatus according to any one of claims 10 to 12, wherein the first memory partition information is a first memory partition identifier, and the second determining module is further configured to:
acquire host information for storing each data structure fragment according to the first memory partition identifier and a memory partition table, wherein the memory partition table represents the correspondence among the memory partition identifier, the copy number of each data structure fragment, and the host number of the memory partition corresponding to the memory partition identifier.
14. The apparatus according to any one of claims 10 to 12, wherein the second memory partition information is a second memory partition identifier, and the second determining module is further configured to:
acquire host information for storing the copy of each data structure fragment according to the second memory partition identifier and a memory partition table, wherein the memory partition table represents the correspondence among the memory partition identifier, the copy number of each data structure fragment, and the host number of the memory partition corresponding to the memory partition identifier.
15. The apparatus according to any one of claims 9 to 12, wherein the first memory partition information is a first memory partition identifier, and the obtaining module includes:
a conversion unit, configured to convert the feature identifier of each data structure fragment into binary data;
a computing unit, configured to calculate the binary data by using a hash algorithm to obtain a hash result;
and a determining unit, configured to determine the result of the hash result modulo a preset value as the first memory partition identifier of each data structure fragment.
16. The apparatus according to any one of claims 9 to 12, wherein the data structure of the window data is one of:
a distributed list structure, a distributed dictionary structure, or a distributed set structure.
CN201510783099.1A 2015-11-16 2015-11-16 Method and device for accessing window data in stream processing system Active CN106708865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510783099.1A CN106708865B (en) 2015-11-16 2015-11-16 Method and device for accessing window data in stream processing system

Publications (2)

Publication Number Publication Date
CN106708865A (en) 2017-05-24
CN106708865B (en) 2020-04-03

Family

ID=58930501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510783099.1A Active CN106708865B (en) 2015-11-16 2015-11-16 Method and device for accessing window data in stream processing system

Country Status (1)

Country Link
CN (1) CN106708865B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582640B (en) * 2018-11-15 2020-12-01 深圳市酷开网络科技有限公司 Sliding window-based data deduplication storage method and device and storage medium
CN111198659B (en) * 2019-12-26 2023-09-05 天津中科曙光存储科技有限公司 Concurrent I/O stream model identification method and system based on multi-sliding window implementation
CN112685455B (en) * 2021-03-12 2021-11-23 北京每日优鲜电子商务有限公司 Real-time data classification display method and device, electronic equipment and readable medium
CN115623019B (en) * 2022-12-02 2023-03-21 杭州雅拓信息技术有限公司 Distributed operation flow scheduling execution method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101026744A (en) * 2007-03-30 2007-08-29 Ut斯达康通讯有限公司 Distributed flow media distribution system, and flow media memory buffer and scheduling distribution method
CN102521405A (en) * 2011-12-26 2012-06-27 中国科学院计算技术研究所 Massive structured data storage and query methods and systems supporting high-speed loading
CN105243140A (en) * 2015-10-10 2016-01-13 中国科学院软件研究所 High-speed train real-time monitoring oriented mass data management method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7487206B2 (en) * 2005-07-15 2009-02-03 International Business Machines Corporation Method for providing load diffusion in data stream correlations

Also Published As

Publication number Publication date
CN106708865A (en) 2017-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200420

Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee after: HUAWEI TECHNOLOGIES Co.,Ltd.

Address before: 301, A building, room 3, building 301, foreshore Road, No. 310052, Binjiang District, Zhejiang, Hangzhou

Patentee before: Huawei Technologies Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20220209

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Patentee after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.