WO2023280053A1 - Data processing method, system, electronic device and storage medium - Google Patents


Info

Publication number
WO2023280053A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
log
time
target data
permission
Application number
PCT/CN2022/103200
Other languages
English (en)
French (fr)
Inventor
古青松
孟庆义
熊嘉男
沈春辉
杨成虎
Original Assignee
阿里云计算有限公司 (Alibaba Cloud Computing Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 阿里云计算有限公司 (Alibaba Cloud Computing Co., Ltd.)
Publication of WO2023280053A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G06F16/182 Distributed file systems
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Definitions

  • The present application belongs to the technical field of databases and relates in particular to a data processing method and system, an electronic device, and a storage medium.
  • CDC: change data capture.
  • The core idea of CDC is to monitor and capture changes in a database (including insertions, updates, and deletions of data or data tables), record these changes in the order in which they occur, and write them to message middleware, to which other services subscribe and from which they consume.
  • A record of a change to data, to a data table, or to a partition of a data table can be called a log.
  • The data of a data table is distributed across different servers by partition. If a partition is moved, the logs of that partition generated at different times are written to different servers.
  • The logs of the same partition are therefore spread across different servers. If the logs of each server are simply collected concurrently, the downstream subscriber/consumer device receives the logs of that partition out of order rather than in chronological order.
  • The embodiments of the present application provide a data processing method and system, an electronic device, and a storage medium.
  • In one embodiment, a data processing method applicable to a first server is provided, including:
  • A data processing method is also provided, including:
  • determining, according to the first time and the movement track, whether to grant the first server permission to send logs from the first time onward.
  • A data processing system is provided, including a first server, a second server, a reader, and a management party, wherein:
  • the first server is configured to: upon listening to a log reading event for the target data, determine whether it has permission to send logs from the first time onward; if it has the permission, obtain at least one log of the target data whose timestamp is greater than or equal to the first time and send the at least one log to the reader; and if it does not have the permission, apply to the management party for it;
  • the management party is configured to: receive the permission application request sent by the first server for the target data, the request carrying the first time; obtain the movement track of the target data between at least two servers; and determine, according to the first time and the movement track, whether to grant the first server permission to send logs from the first time onward.
  • In yet another embodiment, an electronic device includes a processor and a memory. The memory stores at least one instruction, at least one program segment, a code set, or an instruction set, which is loaded and executed by the processor to implement the steps of the above method embodiments.
  • A computer program product includes computer programs/instructions which, when executed by a processor, cause the processor to implement the steps of the foregoing method embodiments.
  • In the data processing method provided by the embodiments of the present application, the target data is moved between multiple servers and the log file of each server records the corresponding logs of the target data. When any server wants to send logs of the target data to a reader (such as a data log subscriber/consumer), it must first determine whether it has permission to send logs from the first time onward. Only once the permission has been granted can it acquire at least one log of the target data whose timestamp is greater than or equal to the first time. The first time differs from server to server: for each server, it is the time point recorded in that server's log file that meets the reader's requirement on the timestamps of the logs to be read.
  • Because any one of the multiple servers must obtain the corresponding permission before it can send logs of the data to the reader,
  • the logs of the target data are sent to the reader in chronological order, without confusion.
  • FIG. 1 is a schematic diagram of a data table divided by row key into multiple ranges (Regions).
  • FIG. 2 is a schematic diagram showing that multiple partitions of a data table can be distributed across multiple Region servers.
  • FIG. 3 is a schematic diagram showing that the main components of a Region server are log files and Region blocks.
  • FIG. 4 is a schematic structural diagram of a data processing system provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the interaction among multiple servers, the management party, and the reader in the data processing system provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of a data processing method provided in another embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data processing device provided in another embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • HBase: an open-source, non-relational database storage system.
  • Segment: a log sequence, i.e., a contiguous run of data update logs; once closed, it becomes read-only.
  • The log file is divided into N segments, and each segment is called a WAL segment file (log sequence file).
  • LogEntry: a log that records one update operation on a piece of data in a table, including the operation time and the update content.
  • Region: a partition, representing a contiguous data space; each partition has a start key (startkey) and an end key (endkey).
  • SequenceId: the Region-level, auto-incrementing sequence number of a row-level transaction.
  • Auto-incrementing means that the number keeps increasing over time and never decreases.
  • Row-level transaction: simply put, an update to multiple column families and multiple columns within one row. Row-level transactions guarantee atomicity, consistency, isolation, and durability for such an update.
  • HBase assigns each row-level transaction an auto-incrementing sequence number.
  • Each Region (partition data) maintains its own SequenceId, and the SequenceIds of different Regions are independent of one another.
  • Partition movement track: a record of a partition's moves, noting the server on which the data went online and the point in time at which it did so.
  • OpenMark: the data online log, containing the data identifier, timestamp, SequenceId, and other information. Whenever a server opens a piece of data, it records an OpenMark in its WAL.
  • CDC: change data capture.
  • This mainly refers to obtaining data update content by collecting the logs in a database's WAL file.
  • Synchronization point: a point in time. If a server's synchronization point is T, the server's data before time T has been synchronized.
  • Key: the primary key of a table in the database, which uniquely identifies a piece of data.
  • The "data" in each embodiment of the present application may be a piece of data, a partition of a data table, a data set, or the like.
  • The "log" in each embodiment may also be called data change information, or any other similar information used to record changes to data, to a partition of a data table, or to a data set; the present application does not specifically limit this.
  • The term "log" is used in the embodiments because, as of the filing date of this application, those skilled in the art customarily use logs to record changes to data, a partition of a data table, or a data set, and to persist such data and information to disk.
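The per-Region SequenceId behaviour defined above can be illustrated with a minimal sketch. It is a hypothetical model, not HBase's internal code: the class `SequenceIdAllocator` and its `next` method are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: every Region owns an independent, auto-incrementing
 * SequenceId counter (not HBase's actual implementation).
 */
final class SequenceIdAllocator {
    // Last SequenceId handed out, keyed by Region ID.
    private final Map<String, Long> lastIdByRegion = new HashMap<>();

    /** Returns the next SequenceId for the given Region; ids per Region only increase. */
    long next(String regionId) {
        // merge() stores 1 for a Region seen for the first time, otherwise adds 1.
        return lastIdByRegion.merge(regionId, 1L, Long::sum);
    }
}
```

Because each Region has its own counter, SequenceIds of different Regions are not comparable with one another.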
  • HBase is a distributed, column-oriented storage system built on top of HDFS.
  • HBase stores data in the form of tables.
  • A data table (Table) is composed of rows and columns, and its columns are grouped into several column families.
  • A data table is divided by row-key range into multiple partitions (Regions), which are distributed across different servers (such as Region servers). That is, for HBase, the data mentioned in each embodiment of the present application (such as the target data) is a partition of a data table.
  • To output the logs of a piece of data to downstream subscribers/consumers in order of data update time, distributed databases generally adopt one of two schemes.
  • The first scheme is the synchronous write scheme.
  • The synchronous write scheme guarantees ordering by sending data to downstream subscribers/consumers while it is being written to the database. For example, a coprocessor can be added to the HBase server so that data is first sent to the downstream subscriber while the data update request is being processed. This scheme occupies the resources of the data write service and must write to both the database and the downstream subscriber, which hurts write performance and reduces system stability. To keep the database and the downstream subscription data consistent, both the database write and the downstream write must succeed, so if the downstream subscriber is unavailable, the entire write fails.
  • In the second scheme, each Region maintains the sequence number of its last write operation, that is, the sequence number of the last write that the Region successfully pushed, and decides from the barrier list and this sequence number whether a write operation in the write log can be replicated to the standby cluster.
  • This scheme is highly intrusive: it must record barrier information in the Meta table and relies heavily on the semantics of the sequence number (strictly increasing, incremented by 1 when a Region is opened, and so on).
  • In addition, the Meta table must be accessed on every synchronization to update the last pushed sequence number and to query the barriers, which results in poor synchronization performance and puts extra load on the Meta table.
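The barrier check of the second (prior-art) scheme can be sketched as follows. This is a simplified, hypothetical reading of the description above, not the code of any real replication system: it assumes the barrier list holds the ascending sequence numbers at which the Region was opened, and that a write may be replicated only when every write from earlier barrier ranges has already been pushed.

```java
import java.util.List;

/** Hypothetical, simplified sketch of the prior-art barrier check. */
final class BarrierCheck {
    /**
     * @param barriers        ascending sequence numbers recorded each time the Region was opened
     * @param lastPushedSeqId sequence number of the last write successfully pushed for the Region
     * @param seqId           sequence number of the write being considered
     * @return true if every write from earlier barrier ranges has already been pushed
     */
    static boolean canReplicate(List<Long> barriers, long lastPushedSeqId, long seqId) {
        long rangeStart = Long.MIN_VALUE;
        for (long barrier : barriers) {
            if (barrier <= seqId) {
                rangeStart = barrier; // the latest barrier not after this write
            } else {
                break;
            }
        }
        // Writes before the first barrier have no earlier range to wait for.
        if (rangeStart == Long.MIN_VALUE) {
            return true;
        }
        // All writes numbered below the range start must already have been pushed.
        return lastPushedSeqId >= rangeStart - 1;
    }
}
```

Every call needs both the barrier list and the last pushed sequence number, which in the scheme described above are kept in the Meta table; this is where the extra Meta-table load comes from.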
  • Region online: multiple Regions of a data table can be distributed and stored on multiple servers (such as Region servers). As shown in FIG. 2, the Master server assigns different Regions to different Region servers. Data with the same row key is never split across multiple Region servers. Each Region is managed by one Region server, and a Region server usually hosts 10 to 1000 Regions.
  • For example, Region11 of data table Table1 is stored on Region server a, and Region12 of Table1 is stored on Region server c.
  • Region positioning: the process of finding a Region is called Region positioning.
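Region positioning over row-key ranges can be sketched with a sorted map keyed by start key. The class and method names, the use of an empty string for an unbounded key, and the key ranges in the usage example are hypothetical; only the Region/server placement (Region11 on server a, Region12 on server c) follows the text.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of row-key range partitioning: each Region covers
 * [startKey, endKey), and a row key is routed to the Region whose start key
 * is the greatest one not exceeding it.
 */
final class RegionLocator {
    record Region(String regionId, String startKey, String endKey, String serverId) {}

    // Regions indexed by start key; "" is the start key of the first Region.
    private final TreeMap<String, Region> byStartKey = new TreeMap<>();

    void add(Region region) {
        byStartKey.put(region.startKey(), region);
    }

    /** Returns the Region containing rowKey, or null if no Region covers it. */
    Region locate(String rowKey) {
        Map.Entry<String, Region> e = byStartKey.floorEntry(rowKey);
        if (e == null) {
            return null;
        }
        Region r = e.getValue();
        // An empty end key means the Region is unbounded on the right.
        boolean inRange = r.endKey().isEmpty() || rowKey.compareTo(r.endKey()) < 0;
        return inRange ? r : null;
    }
}
```

For example, with Region11 covering ["", "m") on server a and Region12 covering ["m", "") on server c, `locate("apple")` returns Region11 and `locate("zebra")` returns Region12. `TreeMap.floorEntry` returns the greatest key less than or equal to its argument, so each lookup takes O(log n) time.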
  • Each Region can be assigned to only one Region server.
  • The Master records which Region servers are currently available, which Regions are assigned to which Region servers, and which Regions have not yet been assigned.
  • The Master sends a load request to a Region server to assign a Region to it.
  • After receiving the request, the Region server begins to serve the Region.
  • "The Region server starts to serve the Region" can be understood as: the Region goes online, or the Region goes online on that Region server.
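The Master's bookkeeping (available servers, assigned Regions, unassigned Regions) and the one-server-per-Region rule can be sketched as follows. This is a hypothetical model: names such as `MasterAssignments` and `assign` are assumptions, and the load request is simulated by a method call.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the Master's bookkeeping for Region assignment. */
final class MasterAssignments {
    private final Set<String> liveServers = new LinkedHashSet<>();   // available Region servers
    private final Set<String> unassigned = new LinkedHashSet<>();    // Regions not yet assigned
    private final Map<String, String> serverByRegion = new HashMap<>();

    void serverUp(String serverId) {
        liveServers.add(serverId);
    }

    void regionCreated(String regionId) {
        unassigned.add(regionId);
    }

    /** Simulated load request: on success, the Region goes online on serverId. */
    boolean assign(String regionId, String serverId) {
        // Reject unknown servers and Regions that are unknown or already assigned,
        // so a Region is never served by two Region servers.
        if (!liveServers.contains(serverId) || !unassigned.remove(regionId)) {
            return false;
        }
        serverByRegion.put(regionId, serverId);
        return true;
    }

    String serverOf(String regionId) {
        return serverByRegion.get(regionId);
    }
}
```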
  • The core module of HBase is the Region server.
  • A Region server hosts multiple Region blocks, each of which stores a series of contiguous data (that is, a partition of a data table).
  • The main components of a Region server are log files and Region blocks.
  • The log file records the operation logs of all Regions served by the Region server, as shown in FIG. 3.
  • A Region block contains multiple Stores; each Store corresponds to one column family of the partition and manages a block of memory, the MemStore.
  • MemStore: a block of memory.
  • Each Store contains several StoreFile files.
  • StoreFile files correspond to HFile files in HDFS.
  • FIG. 4 is a schematic structural diagram of a data processing system provided by an exemplary embodiment of the present application.
  • The system includes at least a first server 11, a second server 12, a reader 13, and a management party 14.
  • The first server 11 and the second server 12 can be any two Region servers in FIG. 2 and FIG. 3; for example, the first server can be Region server a and the second server Region server b.
  • Alternatively, the first server may be Region server c and the second server Region server d.
  • Multiple Region servers may be called a Region server cluster.
  • The first server and the second server may be any two servers in the Region server cluster, which is not limited in this embodiment.
  • The management party 14 may be the Master server described above, or a separate management device other than the Master server, which is not limited in this embodiment.
  • The reader 13 may be a subscriber device, a consumer device, or the like, which is not limited in this embodiment.
  • The first server 11 is configured to: upon listening to a log reading event for the target data, determine whether it has permission to send logs from the first time onward; if it has the permission, obtain at least one log of the target data whose timestamp is greater than or equal to the first time and send the at least one log to the reader 13; and if it does not have the permission, apply to the management party 14 for it;
  • The management party 14 is configured to: receive the permission application request sent by the first server 11 for the target data, the request carrying the first time; determine the second server 12 based on the movement track of the target data between at least two servers, the target data having been moved from the second server 12 to the first server 11; obtain a second time, which is the synchronization point of the second server 12 for the target data and reflects that the logs of the target data before the second time have been synchronized; and determine, by comparing the first time with the second time, whether to grant the permission to the first server 11.
  • Each server (such as a Region server) stores a log file (as shown in FIG. 3) that records data changes (such as insertions, deletions, and updates); new logs are appended to the end of the log file in the order in which the data is updated.
  • The logs in the log file can be divided into multiple time-ordered segments. The logs in each segment record all data updates on the server during a period of time, where "all data updates" means the logs of all Regions served on that server.
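A server's log file as described above, a time-ordered sequence of segments that interleaves the logs of every Region the server hosts, can be modelled as follows. The segment size limit, the record fields, and all names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a server's log file as time-ordered, append-only segments. */
final class ServerLogFile {
    record LogEntry(String regionId, long timestamp, String content) {}

    private final List<List<LogEntry>> segments = new ArrayList<>();
    private final int maxEntriesPerSegment;

    ServerLogFile(int maxEntriesPerSegment) {
        if (maxEntriesPerSegment <= 0) {
            throw new IllegalArgumentException("segment size must be positive");
        }
        this.maxEntriesPerSegment = maxEntriesPerSegment;
        segments.add(new ArrayList<>());
    }

    /** Appends a log of any hosted Region; once a segment is full, later logs go to a new one. */
    void append(LogEntry entry) {
        List<LogEntry> current = segments.get(segments.size() - 1);
        if (current.size() == maxEntriesPerSegment) {
            current = new ArrayList<>(); // earlier segments are never appended to again
            segments.add(current);
        }
        current.add(entry);
    }

    /** Logs of one Region with timestamp >= fromTime, in append (time) order. */
    List<LogEntry> logsOf(String regionId, long fromTime) {
        List<LogEntry> result = new ArrayList<>();
        for (List<LogEntry> segment : segments) {
            for (LogEntry e : segment) {
                if (e.regionId().equals(regionId) && e.timestamp() >= fromTime) {
                    result.add(e);
                }
            }
        }
        return result;
    }

    int segmentCount() {
        return segments.size();
    }
}
```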
  • The system provided in this embodiment may include not only two servers but also three, four, or more servers.
  • The management party may be the master node (such as the Master server mentioned above) of the database cluster (such as an HBase cluster).
  • One or more master nodes can be configured to achieve HA.
  • HA: Highly Available, i.e., a high-availability (e.g., dual-machine) cluster system, which is an effective way to ensure service continuity.
  • The management party allocates Regions to each server (such as a Region server), balances load across servers, and detects failed servers and redistributes their Regions.
  • The servers (such as the first server and the second server above) maintain multiple Regions, process read and write I/O requests for these Regions, and split Regions that grow too large during operation.
  • Each server in the distributed system may correspond to a reading unit.
  • For example, reading unit 1' corresponds to server 1, reading unit 2' to server 2, and reading unit 3' to server 3.
  • The reading unit can collect the logs of its corresponding server, apply to the management party for sending permission based on those logs, and report the server's synchronization point.
  • The log reading event may be triggered by a log acquisition request received from the reader, or by an instruction issued by a management device upstream of the first server.
  • The management party can include a coordinator (Coordinator), which can obtain the movement track of the target data (such as the movement track of partition 1), receive the synchronization points reported by the servers' reading units, receive the sending-permission requests from the servers' reading units, and grant permissions to the reading units of the corresponding servers according to the movement track of the target data.
  • FIG. 5 is a schematic flowchart of a data processing method provided by an exemplary embodiment of the present application.
  • The method may be executed by the first server in the distributed system.
  • The method includes at least the following steps:
  • The target data may be a Region (partition) of a data table, or a data cluster, which is not limited in this embodiment.
  • The log reading event for the target data may be initiated by the reader, for example, as a log reading request for the target data sent by the reader.
  • Alternatively, the log reading event for the target data is initiated by the management party.
  • For example, when the target data is to be sent to the downstream reader (such as a subscriber/consumer) periodically or aperiodically, the management party sends a delivery instruction for the target data to each server.
  • After receiving the delivery instruction, each server (such as the first server in this method embodiment) triggers a log reading event for the target data.
  • The permission to send logs from the first time onward can be granted by the management party in the system above.
  • The executing entity of the method of this embodiment (such as the first server) may actively apply for the permission, and the management party coordinates the sending order of the servers to decide when each server may send the logs of the target data.
  • Alternatively, the management party can proactively grant permissions to the servers one after another, and each server executes step 202 after obtaining its permission.
  • The "first time" in step 201 is explained here.
  • The first time is the timestamp of the first log of the target data contained in the server's log file.
  • In this example, the target data is partition 1.
  • In the figure, logs of different partitions are distinguished by fill pattern: one pattern represents logs of partition 1, another represents logs of partition 2, another represents logs of partition 3, and so on.
  • Server 1 starts to serve partition 1 at time t2, server 2 at time t1, and server 3 at time t3.
  • t1 is earlier than t2, and t2 is earlier than t3.
  • From the moment a server starts to serve partition 1 until partition 1 is moved to another server, the server records the logs of partition 1 in its log file. That is, the server's log file includes the online log (OpenMark) of partition 1 and at least one log whose timestamp is later than that of the online log.
  • The timestamp of the online log is the first time. If the server's log file does not include the online log, the timestamp of the first log of the target data in the log file is the first time in this embodiment.
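The rule for determining a server's first time can be sketched as follows: use the timestamp of the partition's online log (OpenMark) if the log file contains one, and otherwise the timestamp of the partition's first log. The record layout and all names are hypothetical.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch of how a server derives its "first time" for one partition. */
final class FirstTimeResolver {
    record LogEntry(String regionId, long timestamp, boolean openMark) {}

    static OptionalLong firstTime(List<LogEntry> logFile, String regionId) {
        // Prefer the partition's online log (OpenMark), if the log file contains one.
        for (LogEntry e : logFile) {
            if (e.openMark() && e.regionId().equals(regionId)) {
                return OptionalLong.of(e.timestamp());
            }
        }
        // Otherwise fall back to the timestamp of the partition's first log in the file.
        for (LogEntry e : logFile) {
            if (e.regionId().equals(regionId)) {
                return OptionalLong.of(e.timestamp());
            }
        }
        return OptionalLong.empty(); // the log file holds no log of this partition
    }
}
```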
  • Suppose the reader wants to read the logs of partition 1 after time t1.
  • After server 1 listens to the log reading event, it must determine whether it has permission to send logs from time t4 onward. Once it has the permission to send logs from t4 (the first time of server 1), server 1 can obtain the two logs of partition 1 whose timestamps are greater than or equal to t4, namely the logs with timestamps t4 and t5.
  • Likewise, after server 2 listens to the log reading event, it must determine whether it has permission to send logs from time t6 (the first time of server 2) onward.
  • Once it has that permission, server 2 can obtain the two logs of partition 1 whose timestamps are greater than or equal to t6, namely the logs with timestamps t6 and t7. Similarly, once server 3 has permission to send logs from t8 onward, it can obtain the log of partition 1 whose timestamp equals t8.
  • For each server, such as server 1, server 2, and server 3, the permission to send the logs of a partition (such as partition 1) for the corresponding time period saved in its own log file must be granted in order according to the partition's movement track, so that the downstream reader receives the partition's logs in chronological order rather than out of order.
  • The method provided in this embodiment may further include the following steps:
  • the sending permission is determined based on the relationship between the second time and the first time;
  • the second time is the synchronization point of the target data on the second server;
  • the synchronization point reflects that the logs of the target data on the second server before the second time have been synchronized;
  • the second server is determined from the movement track of the target data between at least two servers.
  • For the second time, i.e., the synchronization point, see the explanation of terms at the beginning of this detailed description.
  • A server can report its synchronization point to the management party after sending the corresponding logs, so that the management party can determine when to grant permission to each server based on the servers' synchronization points and the movement track of the target data among the servers. That is, the method provided in this embodiment further includes the following steps:
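Reporting synchronization points to the management party can be sketched as a small registry keyed by server and partition. Keeping the maximum reported value, so that a late or duplicated report cannot move a sync point backwards, is an assumption of this sketch; the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: the management party's record of reported synchronization points. */
final class SyncPointRegistry {
    private record Key(String serverId, String regionId) {}

    private final Map<Key, Long> syncPoints = new HashMap<>();

    /** Called after a server has sent the logs before syncPoint; a sync point never moves backwards. */
    void report(String serverId, String regionId, long syncPoint) {
        syncPoints.merge(new Key(serverId, regionId), syncPoint, Long::max);
    }

    /** Returns the latest sync point, or Long.MIN_VALUE if the server has not reported one yet. */
    long syncPoint(String serverId, String regionId) {
        return syncPoints.getOrDefault(new Key(serverId, regionId), Long.MIN_VALUE);
    }
}
```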
  • If the storage space holds permission information for the target data, the server has the permission.
  • The method provided in this embodiment may further include the following steps:
  • FIG. 7 is a schematic flowchart of a data processing method provided by another embodiment of the present application. As shown in FIG. 7, the method includes:
  • The permission application request may also include a target data identifier (for a partition, this may be a RegionID), a first server identifier, and the like.
  • The movement track of the target data between at least two servers may include track items.
  • A track item (RegionTraceInfo) of a Region records that the Region went online on a server; that is, the track item includes the Region ID (RegionID), the server ID, the online timestamp, and so on.
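The track items and the lookup of the server a Region moved from can be sketched as follows. The `RegionTraceInfo` fields follow the description above; the `MovementTrack` class, the `previousHost` method, and the assumption that items arrive in online-time order are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of one Region's movement track made of RegionTraceInfo items. */
final class MovementTrack {
    record RegionTraceInfo(String regionId, String serverId, long onlineTimestamp) {}

    private final List<RegionTraceInfo> items = new ArrayList<>();

    /** Assumes items are added in the order in which the Region went online on each server. */
    void add(RegionTraceInfo item) {
        items.add(item);
    }

    /**
     * Returns the trace item of the server the Region moved from before its latest
     * stint on serverId (the "second server"), or empty if serverId was the first
     * host or never hosted the Region.
     */
    Optional<RegionTraceInfo> previousHost(String serverId) {
        for (int i = items.size() - 1; i >= 0; i--) {
            if (items.get(i).serverId().equals(serverId)) {
                return i == 0 ? Optional.empty() : Optional.of(items.get(i - 1));
            }
        }
        return Optional.empty();
    }
}
```

With the three track items of partition 1 from this example (server 2 at t1, server 1 at t2, server 3 at t3), `previousHost("server1")` yields server 2's item, so server 2 is the second server when server 1 applies for permission.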
  • Step 303 above, "determining, based on the first time and the movement track, whether to grant the first server permission to send logs from the first time onward", may specifically include the following steps:
  • For example, the movement track of partition 1 includes track item 1, track item 2, and track item 3.
  • Track item 1 is reported to the management party by server 2 when partition 1 goes online on server 2.
  • When partition 1 goes online on server 2, server 2 reports the online information of partition 1 to the management party through its corresponding reading unit 2, so that the management party can generate track item 1 of partition 1.
  • Likewise, server 1 reports when partition 1 goes online on it, and the management party generates track item 2 of partition 1;
  • server 3 reports when partition 1 goes online on it, and the management party generates track item 3 of partition 1.
  • Track item 1 includes at least: the server 2 identifier, the partition 1 identifier, and online timestamp t1.
  • Track item 2 includes at least: the server 1 identifier, the partition 1 identifier, and online timestamp t2.
  • Track item 3 includes at least: the server 3 identifier, the partition 1 identifier, and online timestamp t3.
  • The second time is the synchronization point of the target data on the second server.
  • In this embodiment, partition 1 is the target data,
  • server 2 is the second server,
  • and server 1 is the first server.
  • The synchronization point of partition 1 on server 2 is t2, that is, the logs before time t2 have been synchronized (sent).
  • Step 3033 above may specifically be implemented as follows:
  • the first server's application for permission fails.
  • In this way, a subsequent server obtains permission to synchronize the logs of the target data in its local log file to the reader only after the preceding server has finished synchronizing its logs of the target data; until then, the subsequent server is not granted permission.
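The comparison of the first time with the second time can be sketched as follows. The text above does not spell out the exact comparison used in step 3033, so this sketch assumes one plausible rule: the first server is granted permission when the previous host's synchronization point (the second time) is at least the first server's first time, meaning the previous host has already sent everything before that point. Class and method names are hypothetical, and in the test the numbers 1, 2, 3 stand in for t1, t2, t3.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the coordinator's permission decision for a single Region. */
final class PermissionCoordinator {
    private final List<String> hostsInOrder = new ArrayList<>();   // servers from the movement track
    private final Map<String, Long> syncPoints = new HashMap<>();  // reported sync point per server

    void regionOnline(String serverId) {
        hostsInOrder.add(serverId);
    }

    void reportSyncPoint(String serverId, long syncPoint) {
        syncPoints.merge(serverId, syncPoint, Long::max);
    }

    /**
     * Decides whether serverId may send the Region's logs from firstTime onward.
     * Assumed rule: grant if serverId was the Region's first host, or if the sync point
     * of the previous host (the "second time") has reached firstTime.
     */
    boolean requestPermission(String serverId, long firstTime) {
        int i = hostsInOrder.lastIndexOf(serverId);
        if (i < 0) {
            return false; // this server never hosted the Region
        }
        if (i == 0) {
            return true; // there is no earlier host whose logs must be sent first
        }
        String secondServer = hostsInOrder.get(i - 1);
        long secondTime = syncPoints.getOrDefault(secondServer, Long.MIN_VALUE);
        return secondTime >= firstTime;
    }
}
```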
  • For example, the log files of server 1, server 2, and server 3 all contain logs of partition 1.
  • If the logs of partition 1 on server 1, server 2, and server 3 are collected concurrently and sent to the reader,
  • the logs of partition 1 received by the reader are out of order.
  • In this embodiment, the management party records the movement track of partition 1 between different servers and, according to that track and the timestamp or time period of the logs each server requests to send, grants permissions to the servers one after another,
  • so that each server sends the logs of partition 1 in its log file in turn, and the reader receives the logs of partition 1 in chronological order.
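The difference between uncoordinated concurrent collection and track-ordered permission granting can be illustrated with a small deterministic simulation. Round-robin interleaving stands in for concurrent collection here (real concurrency is nondeterministic), and the server names, timestamps, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of what the reader receives under the two delivery strategies. */
final class OrderedDelivery {
    /** Each server sends its logs only after the previous host in the track has finished. */
    static List<Long> sequential(List<String> trackOrder, Map<String, List<Long>> logsByServer) {
        List<Long> received = new ArrayList<>();
        for (String server : trackOrder) {
            // Permission passes to the next host only once this server's logs are all sent.
            received.addAll(logsByServer.getOrDefault(server, List.of()));
        }
        return received;
    }

    /** Round-robin interleaving: a deterministic stand-in for uncoordinated concurrent collection. */
    static List<Long> interleaved(Map<String, List<Long>> logsByServer) {
        List<Long> received = new ArrayList<>();
        int round = 0;
        boolean added = true;
        while (added) {
            added = false;
            for (List<Long> logs : logsByServer.values()) {
                if (round < logs.size()) {
                    received.add(logs.get(round));
                    added = true;
                }
            }
            round++;
        }
        return received;
    }
}
```

With server 2 holding timestamps 10 and 11, server 1 holding 20 and 21, and server 3 holding 30, sending in track order (server 2, server 1, server 3) yields 10, 11, 20, 21, 30, whereas interleaving the servers in the order server 1, server 2, server 3 yields 20, 10, 30, 21, 11.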
  • the method provided in this embodiment may also include the following steps:
  • the management party may be a master server (Master) in the distributed database system, or a master server communicated with the management party or the like.
  • Master master server
  • the main server is mainly responsible for the management of data tables and partitions in terms of functions, including:
  • the server (such as the Region server) is down, it is responsible for the partition migration on the failed server.
  • the execution subject of step 304 in this embodiment can know the distribution of partitions on each server, which server the partition is migrated to, and so on. Therefore, when a partition moves, the main server can generate the track item corresponding to the partition according to the movement information (including but not limited to: partition ID, moving target server ID, moving timestamp (or online timestamp), etc.), and The track item is added to the mobile track corresponding to the partition.
  • the master server Master
  • the execution subject of step 304 in this embodiment can know the distribution of partitions on each server, which server the partition is migrated to, and so on. Therefore, when a partition moves, the main server can generate the track item corresponding to the partition according to the movement information (including but not limited to: partition ID, moving target server ID, moving timestamp (or online timestamp), etc.), and The track item is added to the mobile track corresponding to the partition.
  • the movement information including but not limited to: partition ID, moving target server ID, moving timestamp (or online timestamp), etc.
  • the execution subject of step 304 in this embodiment is another management party (such as a management device, also called a management server) that communicates with the main server and is responsible for maintaining the movement track and assigning permissions.
  • when the master server detects a partition move event, it sends the partition move event information to the management side.
  • the partition move event includes, but is not limited to: partition ID, pre-move server ID, post-move server ID, and move timestamp (or online timestamp).
  • the pre-move server ID is optional: because the management side maintains the partition's movement track, it can determine the server that hosted the partition before this move (i.e., the pre-move server ID) by traversing the track items.
  • the example of FIG. 6 is again used for illustration.
  • the data processing system includes servers (such as server 1, server 2, and server 3), a management side, and a reader.
  • the management side may include a coordinating unit (Coordinator), and each server has a corresponding reading unit (Reader).
  • the method of this embodiment includes the following steps:
  • when a server detects a log read event for partition 1, its reading unit reads the logs of partition 1 in the server's log file.
  • wal.hasNext() may be used to check whether each log sequence of the log file contains the log of partition 1.
  • wal.next() may be used to obtain the first log of the partition 1 in the log sequence of the log file.
  • the "first time" in the server's permission to send first-time logs is the timestamp of the first log obtained by wal.next().
  • step S12: determine whether the first log of partition 1 is an online log. If it is, mark partition 1 as a newly onlined partition, and if the server stores permission information for partition 1, also clear the permission information for partition 1 that the server was granted before the online log. If the first log of partition 1 is not an online log, execute step S13.
  • S13: determine whether the server stores permission information for partition 1. If so, obtain at least one log of partition 1 whose timestamp is greater than or equal to the timestamp of the first log (i.e., the first time mentioned above), and send the at least one log to the reader (such as a subscriber/consumer). Otherwise, apply to the management side for sending permission.
  • after sending the at least one log of partition 1 to the reader, the server's reading unit determines the synchronization point of partition 1 from the timestamp(s) of the at least one log.
  • the latest timestamp among the at least one log may be used as the synchronization point of partition 1.
  • the reading unit of the server reports the synchronization point corresponding to the partition 1 to the management side.
  • the servers in the above steps may be server 1, server 2, and server 3 in FIG. 6.
  • the three servers detect the log read event for partition 1 simultaneously or one after another. For example, when the reader sends the management side a request to read the logs of partition 1, the management side sends corresponding instructions to all servers that have served partition 1.
  • the Coordinator (coordinating unit) of the management side maintains the movement track of partition 1 and the synchronization points of the partition reported by each server. Suppose the reading unit of a server (called the first server below for convenience) applies for permission to send the logs of partition 1 with first time T1. The Coordinator processes the first server's application as follows:
  • S6: determine whether T1 is greater than or equal to sever_synctime. If so, grant the first server permission and issue it to the first server's reading unit, so that after obtaining it the reading unit sends at least one log of partition 1 on the first server to the reader; otherwise, the application fails and the reading unit waits to apply again.
  • for server 1, the timestamp of the first log obtained by reading unit 1' through wal.next() is t2; the first log of partition 1 on server 1 is an online log, and server 1 does not store permission information for partition 1.
  • reading unit 1' of server 1 must therefore apply to the Coordinator of the management side for sending permission. If server 2 has not yet uploaded its synchronization point for partition 1 when server 1 applies, server 2 has not finished synchronizing, and server 1 does not yet have sending permission for the logs of partition 1.
  • server 1 obtains sending permission only after server 2 has uploaded its synchronization point for partition 1 and the timestamp of the partition 1 logs that server 1 is to send is greater than or equal to that synchronization point. That is, server 1 must wait until server 2 has sent its two partition 1 logs with timestamps t6 and t7 before obtaining permission, and then sends its two partition 1 logs with timestamps t4 and t5 to the reader.
  • likewise, server 3 must wait until server 1 has sent its two partition 1 logs with timestamps t4 and t5 before obtaining permission, and then sends its partition 1 log with timestamp t8 to the reader.
  • each embodiment of the present application provides a mechanism that determines sending permission based on the movement track, and this mechanism ensures that data is output in chronological order.
  • each log in the log sequences of each server's log file takes the form of a key-value pair, such as key 1-value 1, key 2-value 2, key 3-value 3, and so on, where each key-value pair may include a SequenceId, a data identifier, and a write time (i.e., a timestamp).
  • Fig. 8 is a schematic structural diagram of a data processing device provided by an exemplary embodiment of the present application.
  • the data processing device is suitable for the first server in the above data processing system.
  • the data processing device includes a determination module 21, an acquisition module 22, and a sending module 23.
  • the determination module 21 is configured to determine, upon detecting a log read event for the target data, whether it has permission to send first-time logs, where the permission is granted sequentially according to the movement track of the target data among at least two servers.
  • the acquisition module 22 is configured to obtain, when the permission is held, at least one log of the target data whose timestamp is greater than or equal to the first time.
  • the sending module 23 is used to send the at least one log to the reader.
  • the device provided in this embodiment may further include an application module, which is used to apply for the sending permission when there is no sending permission.
  • the sending permission is determined based on the relationship between the second time and the first time. The second time is the synchronization point of the target data on the second server, which indicates that the logs of the target data on the second server before the second time have been synchronized. The second server is determined from the movement track of the target data among at least two servers.
  • when determining whether it has permission to send first-time logs, the determination module 21 is specifically configured to:
  • determine that it does not have the permission if the first-time log of the target data is an online log; otherwise, determine that it has the permission when permission information for the target data is stored in the storage space.
  • the above device may further include a query module and a delete module.
  • the query module is configured to query, when the first-time log of the target data is an online log, whether permission information for the target data is stored in the storage space; if it is stored, the permission information is deleted.
  • the device provided in this embodiment may further include a storage module.
  • the storage module is configured to store the obtained permission information for the target data in the storage space after the application for sending permission succeeds.
  • the determination module 21 in this embodiment is further configured to determine the synchronization point of the target data according to the timestamp of the at least one log after sending the at least one log to the reader.
  • the sending module 23 is also used to send the synchronization point of the target data to the management side.
  • Fig. 9 is a schematic structural diagram of another data processing device provided by an exemplary embodiment of the present application.
  • the device is suitable for the manager in the above data processing system.
  • the device includes a receiving module 31, an acquisition module 32, and a determination module 33.
  • the receiving module 31 is configured to receive the permission application request sent by the first server for the target data, and the permission application request carries the first time.
  • the acquiring module 32 is configured to acquire a movement track of the target data moving between at least two servers.
  • the determination module 33 is configured to determine whether to give the first server the right to send the first time log according to the first time and the movement track.
  • when determining, according to the first time and the movement track, whether to grant the first server permission to send first-time logs, the determination module 33 is specifically configured to:
  • obtain the second time, which is the synchronization point of the target data on the second server; the synchronization point indicates that all logs of the target data on the second server before the second time have been synchronized.
  • when determining whether to grant the first server sending permission by comparing the first time with the second time, the determination module 33 is specifically configured to:
  • the device provided in this embodiment may further include a generating-and-adding module.
  • the generating and adding module is used to generate a corresponding track item after listening to the event that the target data is moved from the second server to the first server; and add the track item to the moving track.
  • the present application also provides an electronic device.
  • the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to:
  • the aforementioned memory 41 may be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method to operate on the electronic device.
  • memory 41 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
  • when executing the program in memory 41, processor 42 can also implement other functions besides those above; for details, see the descriptions of the preceding embodiments.
  • the electronic device further includes a communication component 43, a display 44, a power supply component 45, an audio component 46, and other components.
  • FIG. 10 only schematically shows some components, which does not mean that the electronic device includes only the components shown in FIG. 10.
  • the electronic device provided in this embodiment may be a server in a distributed database system, more specifically, it may be a partition server in a partition server cluster, and the server may be a physical server or a virtual server. This embodiment does not specifically limit it.
  • the electronic device includes a processor and a memory; the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to:
  • determine, according to the first time and the movement track, whether to grant the first server permission to send first-time logs.
  • the electronic device provided in this embodiment may be the management side in the data processing system, more specifically the master server in the distributed database system, in which the coordinating unit is deployed to implement the above function of granting sending permission to the corresponding server according to the movement track of the target data.
  • the embodiments of the present application also provide a computer-readable storage medium storing a computer program, and when the computer program is executed by a computer, the steps or functions of the data processing methods provided in the foregoing embodiments can be realized.
  • the embodiment of the present application also provides a computer program product.
  • the computer program product comprises computer programs or instructions.
  • when the computer programs or instructions are executed by a processor, the processor is enabled to implement the steps or functions of the data processing methods provided in the foregoing embodiments.
  • the device embodiments described above are only illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those skilled in the art can understand and implement them without creative effort.
  • each implementation can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware.
  • the essence of the above technical solution or the part that contributes to the prior art can be embodied in the form of software products, and the computer software products can be stored in computer-readable storage media, such as ROM/RAM, magnetic discs, optical discs, etc., including several instructions to make a computer device (which may be a personal computer, server, or network device, etc.) execute the methods described in various embodiments or some parts of the embodiments.

Abstract

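A data processing method, system, electronic device, and storage medium. The method includes: upon detecting a log read event for target data, determining whether there is permission to send first-time logs, where the permission is granted sequentially according to the movement track of the target data among at least two servers; when the permission is held, obtaining at least one log of the target data whose timestamp is greater than or equal to the first time; and sending the at least one log to a reader. The data processing method addresses the case where target data has been transferred among multiple servers and the log file of each server records logs of that target data: any server that intends to send logs of the target data to the reader (such as a data log subscriber/consumer) must hold permission, so that the logs of the target data reach the reader in chronological order without disorder.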

Description

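Data processing method, system, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202110766115.1, entitled "Data processing method, system, electronic device and storage medium" and filed with the China Patent Office on July 7, 2021, the entire contents of which are incorporated herein by reference.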
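Technical Field
This application belongs to the field of database technology, and in particular relates to a data processing method, system, electronic device, and storage medium.
Background
After data in a database changes, many scenarios require subscribing to a table's data changes in real time, such as synchronizing to a message queue, inter-application messaging, and real-time computing. CDC (change data capture) is a commonly used database capability. The core idea of CDC is to monitor and capture database changes (including inserts, updates, and deletes of data or data tables), record these changes completely in the order in which they occurred, and write them to message middleware for other services to subscribe to and consume. A file that records the content of one change to data, a data table, or a partition of a data table may be called a log.
In a distributed database similar to HBase, the data of a table is distributed across different servers by partition. If a partition moves, the logs generated for that partition's data at different times are written to different servers. When the logs of that partition are collected from the different servers, simply collecting each server's logs concurrently means that downstream subscriber/consumer devices do not receive the logs of the same partition in chronological order, but out of order.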
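Summary
To address the problems in the prior art, embodiments of this application provide a data processing method, system, electronic device, and storage medium.
Specifically, one embodiment of this application provides a data processing method, applicable to a first server, including:
upon detecting a log read event for target data, determining whether there is permission to send first-time logs, where the permission is granted sequentially according to the movement track of the target data among at least two servers;
when the permission is held, obtaining at least one log of the target data whose timestamp is greater than or equal to the first time;
sending the at least one log to a reader.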
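Another embodiment of this application provides a data processing method, including:
receiving a permission application request sent by a first server for target data, where the permission application request carries a first time;
obtaining the movement track of the target data among at least two servers;
determining, according to the first time and the movement track, whether to grant the first server permission to send first-time logs.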
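Yet another embodiment of this application provides a data processing system, including a first server, a second server, a reader, and a management side, where:
the first server is configured to: upon detecting a log read event for target data, determine whether it has permission to send first-time logs; when it has the permission, obtain at least one log of the target data whose timestamp is greater than or equal to the first time and send the at least one log to the reader; and when it does not have the sending permission, apply to the management side for the sending permission;
the management side is configured to: receive the permission application request sent by the first server for the target data, where the permission application request carries the first time; obtain the movement track of the target data among at least two servers; and determine, according to the first time and the movement track, whether to grant the first server permission to send first-time logs.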
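Yet another embodiment of this application provides an electronic device. The electronic device includes a processor and a memory; the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the steps of the above method embodiments.
Yet another embodiment of this application provides a computer program product. The computer program product includes a computer program/instructions which, when executed by a processor, cause the processor to implement the steps of the above method embodiments.
Embodiments of this application provide a data processing method for the case where target data has been transferred among multiple servers and the log file of each server records logs of that target data. Before any server sends logs of the target data to the reader (such as a data log subscriber/consumer), it must first determine whether it has permission to send first-time logs. Only with that permission can it obtain at least one log of the target data whose timestamp is greater than or equal to the first time. The first time differs from server to server: it is the point in time, recorded in the server's log file, that meets the reader's timestamp requirement for reading logs. Thus, in the solution provided by the embodiments, every one of the multiple servers must obtain the corresponding permission before sending data logs to the reader. In a concrete implementation, the order in which the servers obtain permission can be controlled based on the movement track of the target data among the servers, so that the logs of the target data are sent to the reader in chronological order without disorder.
It should be added that the beneficial effects of the solution provided by this application are further illustrated by examples in the Detailed Description below.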
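Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic diagram of a data table divided into multiple partitions (Regions) by row-key range;
FIG. 2 is a schematic diagram showing that multiple partitions of a data table can be distributed across multiple Region servers;
FIG. 3 is a schematic diagram showing that the main components of a Region server are a log file and Region blocks;
FIG. 4 is a schematic structural diagram of a data processing system according to an embodiment of this application;
FIG. 5 is a schematic flowchart of a data processing method according to an embodiment of this application;
FIG. 6 is a schematic diagram of the interaction among multiple servers, the management side, and the reader in the data processing system according to an embodiment of this application;
FIG. 7 is a schematic flowchart of a data processing method according to another embodiment of this application;
FIG. 8 is a schematic structural diagram of a data processing device according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a data processing device according to another embodiment of this application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of this application.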
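Detailed Description
First, some nouns and terms that appear in the description of the embodiments of this application are explained as follows:
HBase: HBase is an open-source non-relational database storage system.
Wal: in computing, write-ahead logging (WAL) is a family of techniques used in relational database systems to provide atomicity and durability. In a system that uses WAL, all modifications are written to a log file before they are committed.
Segment: a log sequence, i.e., a run of consecutive data-update logs that becomes read-only once closed. For ease of management, a log file is divided into N segments, and each segment is called a Wal segment file.
LogEntry: a single log that records one update operation on one piece of data in a table, including the operation time and the update content.
Region: a partition, representing a contiguous data range; a partition has a start key (startkey) and an end key (endkey).
SequenceId: the SequenceId is a Region-level auto-increment sequence number for a row-level transaction. An auto-increment sequence number keeps increasing over time and never decreases. A row-level transaction, simply put, updates multiple column families and multiple columns in one row; it guarantees the atomicity, consistency, and durability of the update, as well as the configured isolation. HBase assigns an auto-increment sequence number to each row-level transaction. Each Region (partition data) maintains its own SequenceId, and the SequenceIds of different Regions are independent of one another.
Partition movement track: a record of partition movement, in which each entry captures one instance of the data coming online on a particular server at a particular point in time.
OpenMark: a data online log containing the data identifier, timestamp, SequenceId, and other information; whenever a server opens a piece of data, it records an OpenMark in the Wal.
CDC: change data capture; in this document it mainly refers to obtaining data updates by collecting the logs in the database's Wal log files.
Synchronization point: a point in time; if a server's synchronization point is T, all of that server's data before T has been synchronized.
Key: primary key, the primary key of a table in the database, which uniquely identifies a record.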
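To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The terms used in the embodiments of this application are only for describing particular embodiments and are not intended to limit this application. The singular forms "a", "said", and "the" used in this application are also intended to include plural forms unless the context clearly indicates otherwise; "multiple" generally means at least two, but does not exclude at least one. It should be understood that descriptions such as "first" and "second" herein are used to distinguish different elements, devices, and so on; they do not indicate an order, nor do they require "first" and "second" to be of different types. Depending on the context, the words "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a product or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a product or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the product or system that includes the element.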
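Before the embodiments are described in detail, the data mentioned in them is explained. The data in the embodiments of this application may be a piece of data, a partition of a data table, a data set, and so on. The logs in the embodiments may also be called data change information, or any similar information that records changes to data, a partition of a data table, or a data set; this application does not specifically limit this. The term log is used because, as of the filing date of this application, those skilled in the art customarily call a log any information that records changes to data, a partition of a data table, or a data set, and that can be used to persist data to disk or to repair data. Different types of databases store data differently, so the data mentioned in the embodiments may differ accordingly. For example, HBase is a distributed, column-oriented storage system built on top of HDFS that stores data in the form of tables. As shown in FIG. 1, a data table (Table) consists of rows and columns, and the columns are grouped into several column families. The data table is split by row-key range into multiple partitions (Regions), which are spread across different servers (such as Region servers). In other words, for HBase, the data mentioned in the embodiments (such as the target data) is a partition of a data table.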
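In the prior art, a distributed database generally uses one of two schemes to output the logs of a given piece of data to downstream subscribers/consumers in order of data update time.
The first scheme: synchronous write
The synchronous write scheme guarantees ordering by sending data to downstream subscribers/consumers at the same time as the data is written to the database. For example, a coprocessor is added on the HBase server side that first sends the data to the downstream subscriber when processing a data update request. This scheme consumes write-path resources because data must be written to both the database and the downstream subscriber, which hurts write performance and reduces system stability. Keeping the database and the downstream subscription consistent requires both the database write and the downstream write to succeed, so when the downstream subscriber is unavailable, every write fails.
The second scheme:
This scheme introduces the concept of a Barrier (boundary line). Whenever a Region comes online on a server, a new Barrier is written to the Meta table, with a value equal to the largest sequence number read when the Region came online plus 1. Every Region in HBase has a strictly increasing sequence number, which is written to the log together with each write operation. So when a Region moves, it comes online again on the new server and a new Barrier is written; after a Region has moved several times, multiple Barriers have been written, which divide the Region's write operations into multiple intervals. Each Region also maintains the sequence number of its last write operation, i.e., the sequence number of the last write operation successfully pushed for that Region. Whether a write operation in the write-ahead log can be replicated to the standby cluster is determined from the Barrier list and that last-write sequence number. This scheme is highly intrusive to the system: it requires recording Barrier information in the Meta table and relies heavily on the semantics of the sequence number (strictly increasing, incremented by 1 when a Region is opened, and so on). In addition, every data synchronization must access the Meta table to update the last-write sequence number and query Barriers, so synchronization performance is poor and extra load is placed on the Meta table.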
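The concept of a Region coming online is briefly explained here. Multiple Regions of a data table can be distributed across multiple servers (such as Region servers). As shown in FIG. 2, the Master server assigns different Regions to different Region servers. A single Region is never split across multiple Region servers: each Region is managed by exactly one Region server, and a Region server typically hosts 10 to 1000 Regions.
For example, as shown in FIG. 2, Region11 of data table Table1 is stored on Region server a, and Region12 of Table1 is stored on Region server c. When inserting, deleting, or querying data, a client needs to know which Region server stores the required Region; this lookup is called Region location. At any moment, a Region can be assigned to only one Region server. The Master keeps track of which Region servers are currently available, which Regions are currently assigned to which Region servers, and which Regions have not yet been assigned. When a new Region needs to be assigned and a Region server has available space, the Master sends a load request to that Region server to assign the Region to it; after receiving the request, the Region server starts serving the Region. A Region server starting to serve a Region can be understood as the Region coming online, or the Region coming online on that Region server.
Moving a Region from one Region server to another is a partition transfer. The Region server from which the Region is removed takes the Region offline (unloads it); the Region server to which the Region is added brings the Region online (loads it) and also generates an OpenMark for the Region.
As shown in FIG. 3, the core module of HBase is the Region server. A Region server consists of multiple Region blocks, each of which stores a contiguous data set (i.e., one partition of a data table). The main components of a Region server are a log file and Region blocks. The log file records the operation logs of all Regions served by the Region server, as shown in FIG. 3.
A Region block contains multiple stores, each corresponding to one column family of the partition, and each store manages a block of memory called the MemStore. When the data in a MemStore meets certain conditions, it is written to a StoreFile, so each store contains several StoreFiles. A StoreFile corresponds to an HFile in HDFS.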
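The technical solutions provided by the embodiments of this application are described in detail below with reference to the drawings.
FIG. 4 is a schematic structural diagram of a data processing system according to an exemplary embodiment of this application. The system includes at least a first server 11, a second server 12, a reader 13, and a management side 14. The first server 11 and the second server 12 may be any two Region servers in FIG. 2 and FIG. 3. For example, the first server may be Region server a and the second server Region server b, or the first server may be Region server c and the second server Region server d. As shown in FIG. 3, multiple Region servers may be called a Region server cluster. The first server and the second server may be any two servers in the Region server cluster, which this embodiment does not limit.
The management side 14 may be the Master server mentioned above, or a management device added in addition to the Master server, which this embodiment does not limit. The reader 13 may be a subscriber device, a consumer device, and so on, which this embodiment does not limit.
The first server 11 is configured to: upon detecting a log read event for target data, determine whether it has permission to send first-time logs; when it has the permission, obtain at least one log of the target data whose timestamp is greater than or equal to the first time and send the at least one log to the reader 13; and when it does not have the sending permission, apply to the management side 14 for the sending permission.
The management side 14 is configured to: receive the permission application request sent by the first server 11 for the target data, where the request carries the first time; determine the second server 12 based on the movement track of the target data among at least two servers, where the target data moved from the second server 12 to the first server 11; obtain a second time, which is the synchronization point of the target data on the second server 12 and indicates that the logs of the target data before the second time have been synchronized; and determine whether to grant the first server 11 the permission by comparing the first time with the second time.
The system provided by the embodiments of this application is applicable to distributed database systems such as HBase and Lindorm. In a distributed system, each server (such as a Region server) stores a log file (as shown in FIG. 3) that records data changes (such as inserts, deletes, and updates) for the Regions the server is responsible for, and new data updates are appended sequentially to the end of the log file. The logs in the log file can be divided into multiple time-ordered segments, and the logs in each segment record all data updates on the server during a period of time. Here, all data updates means the logs of all Regions served by the server.
It should be noted that the system provided by this embodiment is not limited to two servers; it may include three, four, or more. The management side may be the master node of a database cluster (such as an HBase cluster), e.g., the master server mentioned above. One or more master nodes may be configured to achieve HA (high availability: a high-availability cluster is an effective way to ensure continuity of service and generally has two or more nodes, divided into active and standby nodes). The management side assigns Regions to the servers (such as Region servers), is responsible for load balancing across servers, and detects failed servers and reassigns their Regions.
Each server (such as the first server and the second server above) maintains multiple Regions, handles read/write IO requests for these Regions, and is also responsible for splitting Regions that grow too large during operation.
Specifically, each server in the distributed system may have a corresponding reading unit. In FIG. 6, server 1 corresponds to reading unit 1', server 2 to reading unit 2', and server 3 to reading unit 3'. A reading unit can collect the logs of its corresponding server, send permission applications to the management side based on those logs, and report the server's synchronization point. The aforementioned log read event may be triggered by a log acquisition request received from the reader, or by an instruction issued by an upstream management device of the first server. The management side may have a corresponding coordinating unit (Coordinator). The coordinating unit can obtain the movement track of the target data (such as the movement track of partition 1), receive the synchronization points reported by the servers' reading units, receive permission applications from the servers' reading units, and issue permissions to the reading units of the corresponding servers according to the target data's movement track.
For the execution principles and interaction of the components of this system embodiment, such as the first server 11, the second server 12, the reader 13, and the management side 14, refer to the descriptions of the method embodiments below.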
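FIG. 5 is a schematic flowchart of a data processing method according to an exemplary embodiment of this application. The method may be executed by the first server in a distributed system and includes at least the following steps:
201. Upon detecting a log read event for target data, determine whether there is permission to send first-time logs, where the permission is granted sequentially according to the movement track of the target data among at least two servers.
202. When the permission is held, obtain at least one log of the target data whose timestamp is greater than or equal to the first time.
203. Send the at least one log to the reader.
In 201 above, the target data may be a Region (partition) of a data table, or a data cluster, which this embodiment does not limit. The log read event for the target data may be initiated by the reader, e.g., as a log read request for the target data sent by the reader. Alternatively, it may be initiated by the management side. For example, when the management side periodically or aperiodically delivers the target data's logs to the downstream reader (such as a subscriber/consumer), it sends a delivery instruction for the target data to each server, and each server (such as the first server in this method embodiment) triggers a log read event for the target data upon receiving the instruction.
The permission to send first-time logs may be granted by the management side in the above system. In a concrete implementation, the executing entity of this method (such as the first server) may actively apply for it, and the management side coordinates the sending order of the servers to decide which server is granted permission to send the target data's logs. Alternatively, the management side may proactively issue permissions to the servers in order, and each server performs step 202 after obtaining the permission.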
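The "first time" in step 201 is explained here. The first time is the timestamp of the first log of the target data contained in a server's log file. For ease of understanding, a specific example is used. In the example of FIG. 6, suppose the target data in this embodiment is partition 1. In each block of FIG. 6, different partitions are distinguished by fill pattern: for example, blocks with the fill pattern shown in the original inline figure represent logs of partition 1, "□" represents logs of partition 2, "■" represents logs of partition 3, and so on. According to the movement track of partition 1, server 2 began serving partition 1 at time t1, server 1 at time t2, and server 3 at time t3, where t1 is earlier than t2 and t2 is earlier than t3. From the moment a server begins serving partition 1, it records partition 1's logs in its log file until the partition is transferred to another server. In other words, the log file on a server contains the online log (OpenMark) of partition 1 and at least one log after the timestamp of that online log. The timestamp of the online log is the first time. If a server's log file does not contain an online log, the timestamp of the first log of the target data in the log file is the first time in this embodiment.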
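For example, suppose the reader wants to read the logs of partition 1 after time t1. Upon detecting this log read event, server 1 needs to determine whether it has permission to send logs from time t4. Once it has permission to send logs from t4 (the first time for server 1), server 1 can obtain the two logs of partition 1 whose timestamps are greater than or equal to t4, namely the log with timestamp t4 and the log with timestamp t5. Similarly, upon detecting the log read event, server 2 needs to determine whether it has permission to send logs from time t6 (the first time for server 2). Once it has that permission, server 2 can obtain the two logs of partition 1 whose timestamps are greater than or equal to t6, namely the logs with timestamps t6 and t7. Likewise, once server 3 has permission to send logs from t8, it can obtain the one log of partition 1 whose timestamp equals t8.
The permissions for these servers (server 1, server 2, and server 3) to send the logs of a partition (such as partition 1) stored in their own log files for the corresponding time periods must be granted in order according to the partition's transfer track. This ensures that the downstream reader receives the partition's logs in time order rather than out of order.
This embodiment provides a data processing method for the case where target data has been transferred among multiple servers and the log file of each server records logs of that target data. Before any server sends logs of the target data to the reader (such as a data log subscriber/consumer), it must first determine whether it has permission to send first-time logs. Only with that permission can it obtain at least one log of the target data whose timestamp is greater than or equal to the first time. The first time differs from server to server: it is the point in time, recorded in the server's log file, that meets the reader's timestamp requirement for reading logs. Thus, in the solution provided by the embodiments, every one of the multiple servers must obtain the corresponding permission before sending data logs to the reader. In a concrete implementation, the order in which the servers obtain permission can be controlled based on the movement track of the target data among the servers, so that the logs of the target data are sent to the reader in chronological order without disorder.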
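Further, the method provided by this embodiment may also include the following step:
204. When the sending permission is not held, apply for the sending permission.
The sending permission is determined based on the relationship between a second time and the first time. The second time is the synchronization point of the target data on a second server, which indicates that the logs of the target data on the second server before the second time have been synchronized. The second server is determined from the movement track of the target data among at least two servers.
The details of applying for the sending permission are explained in the corresponding part below.
It should also be noted that the second time above is a synchronization point; see the explanation of nouns and terms at the beginning of this Detailed Description. A server can report its synchronization point to the management side after sending the corresponding logs. The management side can then determine when to grant each server the corresponding permission based on the servers' synchronization points combined with the movement track of the target data among the servers. That is, the method provided by this embodiment further includes the following steps:
205. After sending the at least one log to the reader, determine the synchronization point of the target data according to the timestamp(s) of the at least one log;
206. Send the synchronization point of the target data to the management side.
Further, "determining whether there is permission to send first-time logs" in 201 above may include:
2011. If the first-time log of the target data is an online log, the permission is not held;
2012. If the first-time log of the target data is not an online log, the permission is held when permission information for the target data is stored in the storage space.
Still further, the method provided by this embodiment may also include the following steps:
2013. If the first-time log of the target data is an online log, query whether permission information for the target data is stored in the storage space;
2014. If the permission information is stored, delete it.
Further, the method provided by this embodiment may also include the following step:
207. After successfully applying for the sending permission, store the obtained permission information for the target data in the storage space.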
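Steps 2011 to 2014 and 207 amount to a small piece of per-server bookkeeping. The following is a minimal Python sketch under stated assumptions. The "storage space" is modelled as an in-memory dict keyed by a target-data identifier, such as a RegionID. The class and method names (`PermissionStore`, `has_permission`, `store_grant`) are hypothetical and not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PermissionStore:
    """Hypothetical local 'storage space' for granted sending permissions.

    Maps a target-data identifier (e.g. a RegionID) to the first time for
    which the management side granted permission.
    """
    granted: Dict[str, int] = field(default_factory=dict)

    def has_permission(self, target: str, first_log_is_online: bool) -> bool:
        if first_log_is_online:
            # 2011: an online log (OpenMark) means no permission is held.
            # 2013/2014: delete any stale permission left from an earlier stint.
            self.granted.pop(target, None)
            return False
        # 2012: otherwise the permission is held iff it is stored locally.
        return target in self.granted

    def store_grant(self, target: str, first_time: int) -> None:
        # 207: after a successful application, keep the grant locally.
        self.granted[target] = first_time
```

A real reading unit would presumably persist grants durably. The dict here only illustrates the order in which the checks are made.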
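FIG. 7 is a schematic flowchart of a data processing method according to another embodiment of this application. As shown in FIG. 7, the method includes:
301. Receive a permission application request sent by a first server for target data, where the request carries a first time;
302. Obtain the movement track of the target data among at least two servers;
303. Determine, according to the first time and the movement track, whether to grant the first server permission to send first-time logs.
In 301 above, besides the first time, the permission application request may also contain the target data identifier (for a partition, this may be the RegionID), the first server's identifier, and so on.
In 302 above, the movement track of the target data among at least two servers may contain track items. A track item (RegionTraceInfo) of a partition records the partition coming online on one server; that is, the track item contains the partition identifier (RegionID), the server identifier, the online timestamp, and so on. Sorting multiple track items by online timestamp yields the partition's movement among servers over time.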
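The movement track described in 302 can be modelled as a list of track items sorted by online timestamp. The sketch below assumes integer timestamps and uses the hypothetical names `TrackItem`, `build_track`, and `server_path`. Only the three fields (RegionID, server identifier, online timestamp) come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrackItem:
    """One track item (RegionTraceInfo): a partition came online on a server."""
    region_id: str
    server_id: str
    online_ts: int  # online timestamp of the partition on this server


def build_track(items: List[TrackItem], region_id: str) -> List[TrackItem]:
    """Return the movement track of one partition, ordered by online timestamp."""
    return sorted(
        (item for item in items if item.region_id == region_id),
        key=lambda item: item.online_ts,
    )


def server_path(track: List[TrackItem]) -> List[str]:
    """The servers the partition visited, in time order."""
    return [item.server_id for item in track]
```

Suppose the hypothetical timestamps 1, 2, and 3 stand in for t1, t2, and t3 of FIG. 6. Feeding in the three track items of partition 1 in any order then yields the path server-2, server-1, server-3.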
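In a specific implementation, step 303, "determining, according to the first time and the movement track, whether to grant the first server permission to send first-time logs", may include the following steps:
3031. Determine the second server according to the movement track, where the target data moved from the second server to the first server;
3032. Obtain the second time, where the second time is the synchronization point of the target data on the second server, indicating that the logs of the target data on the second server before the second time have been synchronized;
3033. Determine whether to grant the first server the permission by comparing the first time with the second time.
Referring to the example in FIG. 6, the movement track of partition 1 includes track item 1, track item 2, and track item 3. Track item 1 was reported to the management side by server 2 when partition 1 came online on it. For example, when partition 1 comes online on server 2, server 2 reports the online information of partition 1 to the management side through its reading unit 2', so that the management side generates track item 1 for partition 1. Similarly, server 1 reports when partition 1 comes online on it, and the management side generates track item 2; server 3 reports when partition 1 comes online on it, and the management side generates track item 3.
Track item 1 contains at least the identifier of server 2, the identifier of partition 1, and online timestamp t1. Track item 2 contains at least the identifier of server 1, the identifier of partition 1, and online timestamp t2. Track item 3 contains at least the identifier of server 3, the identifier of partition 1, and online timestamp t3.
Arranged in time order, the movement track of partition 1 is: server 2 -> server 1 -> server 3.
In 3032 above, the second time is the synchronization point of the target data on the second server. Referring to the example in FIG. 6, suppose partition 1 is the target data, server 2 is the second server, and server 1 is the first server. As FIG. 6 shows, the synchronization point of partition 1 on server 2 is t2. This means that all logs before time t2 have been synchronized, i.e., all logs before t2 have been sent.
In a specific implementation, 3033 above may be:
when the first time is greater than or equal to the second time, grant the first server the permission.
Conversely, when the first time is less than the second time, the first server's permission application fails.
Put simply, the next server in the movement track can obtain the permission only after the preceding server has finished synchronizing the target data's logs. With that permission, it synchronizes the target data's logs stored in its local log file to the reader. Until the preceding server finishes, the next server cannot obtain the permission.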
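Steps 3031 to 3033 can be sketched as a single decision function. This is a minimal illustration, not the application's implementation. It rests on four assumptions:

- The track is a time-sorted list of `(server_id, online_ts)` pairs.
- The second server is the track item just before the requesting server's own stint, which is located via the first time. This also handles a partition that returns to an earlier server.
- A missing synchronization point counts as a failed application.
- Permission is granted when no earlier server exists. The description does not address this case.

```python
from typing import Dict, List, Optional, Tuple

# Assumption for this sketch: a track item is a (server_id, online_ts) pair,
# and the track is sorted by online_ts.
Track = List[Tuple[str, int]]


def previous_server(track: Track, first_time: int) -> Optional[str]:
    """3031: find the server the partition moved from.

    The requesting server's own stint is taken to be the last item whose
    online timestamp is <= first_time; the item before it is the second server.
    """
    idx = None
    for i, (_, online_ts) in enumerate(track):
        if online_ts <= first_time:
            idx = i
    if idx is None or idx == 0:
        return None  # no earlier server in the track
    return track[idx - 1][0]


def decide_permission(first_time: int, track: Track,
                      sync_points: Dict[str, int]) -> bool:
    """3031-3033: grant iff first_time >= the previous server's sync point."""
    second = previous_server(track, first_time)
    if second is None:
        return True  # assumption: nothing precedes this server
    second_time = sync_points.get(second)  # 3032
    if second_time is None:
        return False  # previous server has not reported yet; retry later
    return first_time >= second_time  # 3033
```

With the FIG. 6 track modelled as `[("server-2", 1), ("server-1", 2), ("server-3", 3)]`, server 1's application with first time 2 fails until server 2 reports a synchronization point no later than 2.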
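As a counterexample, suppose the data processing method of this embodiment is not used in the example of FIG. 6. The log files of server 1, server 2, and server 3 all contain logs of partition 1. When reading units 1', 2', and 3' of these servers collect in parallel, the partition 1 logs on the three servers are sent to the reader concurrently, and the reader receives the logs of partition 1 out of order. With the solution of this embodiment, the management side records the movement track of partition 1 among the servers. Based on that track and the timestamps or time periods of the logs each server applies to send, it issues permissions to the servers in turn. Each server thus sends the partition 1 logs in its log file in order, and the reader receives the logs of partition 1 in chronological order.
Further, the method provided by this embodiment may also include the following steps:
304. After detecting an event in which the target data moves from the second server to the first server, generate a corresponding track item;
305. Add the track item to the movement track.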
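The management side that executes the method of this embodiment may be the master server (Master) in a distributed database system, or a device communicatively connected to the master server, and so on.
Taking a distributed database system such as HBase as an example, the master server is functionally responsible mainly for managing data tables and partitions, specifically including:
managing users' create, delete, modify, and query operations on data tables;
managing load balancing of the servers (such as Region servers) and adjusting the distribution of partitions;
after a partition splits, assigning the new partitions produced by the split;
after a server (such as a Region server) goes down, migrating the partitions on the failed server.
Accordingly, the entity that executes step 304 in this embodiment, such as the master server (Master), knows how partitions are distributed across the servers and from which server to which server each partition migrates. Therefore, when a partition moves, the master server can generate a track item for the partition from the move information (including but not limited to: partition identifier, target server identifier, move timestamp (or online timestamp), etc.) and add the track item to the partition's movement track.
Alternatively, step 304 of this embodiment is executed by another management side (such as a management device, also called a management server) that communicates with the master server and is dedicated to maintaining movement tracks and granting permissions. When the master server detects a partition move event, it sends the partition move event information to this management side. For example, the partition move event includes but is not limited to: partition identifier, pre-move server identifier, post-move server identifier, and move timestamp (or online timestamp). The pre-move server identifier is optional. Because the management side maintains the partition's movement track, it can learn which server hosted the partition before this move (i.e., the pre-move server identifier) by traversing the track items.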
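Steps 304 and 305, together with the optional pre-move server identifier, can be sketched as an event handler on the management side. The names `MoveEvent` and `on_partition_moved` are hypothetical. The sketch assumes that move events arrive in online-time order, so the last track item is the pre-move server. The description's "traverse the track items" is thus reduced here to reading the latest item.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TrackItem:
    server_id: str
    online_ts: int


@dataclass(frozen=True)
class MoveEvent:
    """Partition move event; field names are illustrative assumptions."""
    region_id: str
    to_server: str
    online_ts: int
    from_server: Optional[str] = None  # the pre-move server ID is optional


def on_partition_moved(tracks: Dict[str, List[TrackItem]],
                       event: MoveEvent) -> Optional[str]:
    """304/305: append a track item and return the pre-move server ID.

    If the event omits from_server, it is recovered from the partition's
    existing track (assumed to be appended in online-time order).
    """
    track = tracks.setdefault(event.region_id, [])
    from_server = event.from_server
    if from_server is None and track:
        from_server = track[-1].server_id  # latest stint before this move
    track.append(TrackItem(event.to_server, event.online_ts))  # 305
    return from_server
```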
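The above method embodiments describe the technical solution of this application from the perspective of the server (the server as executing entity) and of the management side (the management side as executing entity). Below, the technical solution is described with another specific embodiment, without distinguishing perspectives.
Again using the example of FIG. 6, the data processing system includes servers (such as server 1, server 2, and server 3), a management side, and a reader. The management side may contain a coordinating unit (Coordinator), and each server has a corresponding reading unit (Reader). Specifically, the method of this embodiment includes the following steps:
S1. When a server detects a log read event for partition 1, the server's reading unit reads the logs of partition 1 in the server's log file.
The reading unit reads the logs of partition 1 as follows:
S11. Check whether the log file contains logs of partition 1; if it does, obtain the logs of partition 1 in the log file.
For example, in a concrete implementation, wal.hasNext() may be used to check whether the log sequences of the log file contain logs of partition 1. If they do, wal.next() is used to obtain the first log of partition 1 in the log sequence of the log file. The first time in the server's permission to send first-time logs is the timestamp of this first log obtained by wal.next().
S12. Determine whether the first log of partition 1 is an online log. If it is, mark partition 1 as a newly onlined partition; if the server stores permission information for partition 1, also clear the permission information for partition 1 that the server was granted before the online log. If the first log of partition 1 is not an online log, go to step S13.
S13. Determine whether the server stores permission information for partition 1. If it does, obtain at least one log of partition 1 whose timestamp is greater than or equal to the timestamp of the first log (i.e., the first time mentioned above), and send the at least one log to the reader (such as a subscriber/consumer). Otherwise, apply to the management side for the sending permission.
S2. After the server's reading unit sends the at least one log of partition 1 to the reader, it determines the synchronization point of partition 1 according to the timestamp(s) of the at least one log.
In a concrete implementation, the latest timestamp among the at least one log may be used as the synchronization point of partition 1.
S3. The server's reading unit reports the synchronization point of partition 1 to the management side.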
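The reading-unit flow S11 to S13, followed by S2 and S3, can be sketched as one pass over a server's WAL. Several simplifications apply in this sketch:

- A plain Python iterable stands in for the `wal.hasNext()`/`wal.next()` iteration mentioned in S11.
- The permission store is a dict.
- `apply_for_permission` and `send` are hypothetical callbacks standing in for the Coordinator and the reader.
- The OpenMark itself is forwarded along with the data logs, which a real implementation might filter out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class LogEntry:
    region_id: str
    ts: int                      # write timestamp
    is_open_mark: bool = False   # True for an online log (OpenMark)


def read_partition(
    wal: Iterable[LogEntry],
    region_id: str,
    permissions: Dict[str, int],
    apply_for_permission: Callable[[str, int], bool],
    send: Callable[[LogEntry], None],
) -> Optional[int]:
    """One pass of a reading unit over a server's WAL for one partition.

    Returns the synchronization point to report to the management side (S3),
    or None if nothing was sent.
    """
    # S11: keep only this partition's logs; the first one fixes the first time.
    logs = [e for e in wal if e.region_id == region_id]
    if not logs:
        return None
    first_time = logs[0].ts
    # S12: an online log starts a fresh stint here; drop any stale permission.
    if logs[0].is_open_mark:
        permissions.pop(region_id, None)
    # S13: without a stored permission, apply to the management side.
    if region_id not in permissions:
        if not apply_for_permission(region_id, first_time):
            return None  # application failed; retry on a later pass
        permissions[region_id] = first_time
    sent = [e for e in logs if e.ts >= first_time]
    for entry in sent:
        send(entry)
    # S2: the latest timestamp among the sent logs is the synchronization point.
    return max(e.ts for e in sent)
```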
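The servers in the above steps may be server 1, server 2, and server 3 in FIG. 6. The three servers detect the log read event for partition 1 simultaneously or one after another. For example, when the reader sends the management side a read request for the logs of partition 1, the management side sends corresponding instructions simultaneously to all servers that have served partition 1.
The Coordinator (coordinating unit) of the management side maintains the movement track of partition 1 and the synchronization points of the partition reported by the servers. Suppose the reading unit of a server (called the first server in the following steps for convenience) applies for permission to send the logs of partition 1 with first time T1. The Coordinator processes the first server's reading unit's application as follows:
S4. From the movement track of partition 1, find the previous server on which partition 1 resided before T1 (called the second server in the following steps for convenience).
S5. Obtain the synchronization point sever_synctime of partition 1 on the second server.
S6. Determine whether T1 is greater than or equal to sever_synctime. If so, grant the first server the permission and issue it to the first server's reading unit, so that after receiving the permission, the reading unit sends at least one log of partition 1 on the first server to the reader. Otherwise, the application fails and the reading unit waits to apply again.
For server 1, the timestamp of the first log obtained by reading unit 1' through wal.next() is t2. The first log of partition 1 on server 1 is an online log, and server 1 does not store permission information for partition 1, so reading unit 1' must apply to the Coordinator of the management side for the sending permission. If server 2 has not yet uploaded its synchronization point for partition 1 when server 1 applies, server 2 has not finished synchronizing, and server 1 cannot yet have the sending permission for the logs of partition 1. Server 1 obtains the sending permission only after server 2 has uploaded its synchronization point for partition 1 and the timestamp of the partition 1 logs that server 1 is to send is greater than or equal to that synchronization point. That is, server 1 must wait until server 2 has sent its two partition 1 logs with timestamps t6 and t7 before it obtains permission, and after obtaining it, sends its two partition 1 logs with timestamps t4 and t5 to the reader. Similarly, server 3 must wait until server 1 has sent its two partition 1 logs with timestamps t4 and t5 before it obtains permission, and after obtaining it, sends the partition 1 log with timestamp t8 to the reader.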
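To see why this yields chronological output end to end, the following toy simulation replays the FIG. 6 scenario. It uses hypothetical integer timestamps: 10, 20, and 30 are the online times of partition 1 on server 2, server 1, and server 3. A minimal `Coordinator` applies S4 to S6. Each server sends all its partition 1 logs in one batch and then reports the latest timestamp as its synchronization point, and servers retry until permission is granted. The sketch simplifies the track to one stint per server, so it illustrates the mechanism rather than implementing it faithfully.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for FIG. 6 (online times and WAL contents).
TRACK: List[Tuple[str, int]] = [("server-2", 10), ("server-1", 20), ("server-3", 30)]
WAL: Dict[str, List[int]] = {          # partition 1 log timestamps per server
    "server-2": [10, 12, 15],
    "server-1": [20, 24, 27],
    "server-3": [30, 31],
}


class Coordinator:
    """S4-S6 on the management side, plus sync-point bookkeeping (S3)."""

    def __init__(self, track: List[Tuple[str, int]]):
        self.servers = [server for server, _ in track]
        self.sync_points: Dict[str, int] = {}

    def report(self, server: str, sync_point: int) -> None:
        self.sync_points[server] = sync_point

    def apply(self, server: str, first_time: int) -> bool:
        pos = self.servers.index(server)
        if pos == 0:
            return True  # assumption: the first server waits for nobody
        prev = self.servers[pos - 1]             # S4
        prev_sync = self.sync_points.get(prev)   # S5
        return prev_sync is not None and first_time >= prev_sync  # S6


def run(apply_order: List[str]) -> List[int]:
    """Let servers apply in the given order, retrying until all have sent."""
    coordinator = Coordinator(TRACK)
    reader: List[int] = []
    pending = list(apply_order)
    while pending:
        progressed = False
        for server in list(pending):
            logs = WAL[server]
            if coordinator.apply(server, logs[0]):
                reader.extend(logs)                    # send to the reader
                coordinator.report(server, max(logs))  # S2/S3
                pending.remove(server)
                progressed = True
        if not progressed:
            raise RuntimeError("no server could obtain permission")
    return reader
```

Even when the servers apply in reverse order, `run(["server-3", "server-1", "server-2"])` delivers `[10, 12, 15, 20, 24, 27, 30, 31]`. Draining the three WALs concurrently without permissions could instead let server 3's or server 1's logs reach the reader first.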
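As the above embodiments show, the embodiments of this application provide a mechanism that determines sending permission based on the movement track, and this mechanism guarantees that data is output in time order.
Further, in a distributed database system, each log in the log sequences of each server's log file takes the form of a key-value pair, such as key 1-value 1, key 2-value 2, key 3-value 3, and so on. Each key-value pair may include a SequenceId, a data identifier, and a write time (i.e., a timestamp). Hence, the solution provided by the embodiments of this application can guarantee that data is output at the key level in order of data update time (i.e., log timestamp).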
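The key-level ordering claim can be checked mechanically on what the reader receives. The sketch below models a log as a key-value pair carrying a SequenceId, a data identifier, and a write time, as described above. The field names and the `key_level_ordered` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class KeyValueLog:
    """A WAL entry as a key-value pair; field names are assumptions."""
    key: str
    value: str
    sequence_id: int
    region_id: str   # data identifier
    write_ts: int    # write time (timestamp)


def key_level_ordered(delivered: Iterable[KeyValueLog]) -> bool:
    """True if, for every key, the reader saw non-decreasing write times."""
    last_seen: Dict[str, int] = {}
    for log in delivered:
        prev = last_seen.get(log.key)
        if prev is not None and log.write_ts < prev:
            return False
        last_seen[log.key] = log.write_ts
    return True
```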
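FIG. 8 is a schematic structural diagram of a data processing device according to an exemplary embodiment of this application. The data processing device is applicable to the first server in the above data processing system. Specifically, it includes a determination module 21, an acquisition module 22, and a sending module 23. The determination module 21 is configured to determine, upon detecting a log read event for the target data, whether there is permission to send first-time logs, where the permission is granted sequentially according to the movement track of the target data among at least two servers. The acquisition module 22 is configured to obtain, when the permission is held, at least one log of the target data whose timestamp is greater than or equal to the first time. The sending module 23 is configured to send the at least one log to the reader.
Further, the device provided by this embodiment may also include an application module configured to apply for the sending permission when the sending permission is not held. The sending permission is determined based on the relationship between a second time and the first time. The second time is the synchronization point of the target data on a second server, which indicates that the logs of the target data on the second server before the second time have been synchronized. The second server is determined from the movement track of the target data among at least two servers.
Further, when determining whether there is permission to send first-time logs, the determination module 21 is specifically configured to:
determine that the permission is not held if the first-time log of the target data is an online log;
determine that the permission is held when permission information for the target data is stored in the storage space, if the first-time log of the target data is not an online log.
Still further, the device may also include a query module and a deletion module. The query module is configured to query, when the first-time log of the target data is an online log, whether permission information for the target data is stored in the storage space; if the permission information is stored, it is deleted.
Further, the device provided by this embodiment may also include a storage module configured to store the obtained permission information for the target data in the storage space after the sending permission is successfully obtained.
Further, the determination module 21 in this embodiment is also configured to determine the synchronization point of the target data according to the timestamp(s) of the at least one log, after the at least one log is sent to the reader. The sending module 23 is also configured to send the synchronization point of the target data to the management side.
It should be noted that the data processing device provided by the above embodiment can implement the technical solutions described in the above method embodiments. For the specific implementation principles of the above modules or units, refer to the corresponding content in the method embodiments; it is not repeated here.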
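FIG. 9 is a schematic structural diagram of another data processing apparatus according to an exemplary embodiment of this application. The apparatus is applicable to the manager in the data processing system described above. The apparatus includes a receiving module 31, an acquisition module 32, and a determination module 33. The receiving module 31 is configured to receive a permission application request sent by the first server for the target data, the request carrying the first time. The acquisition module 32 is configured to acquire the movement trajectory of the target data among at least two servers. The determination module 33 is configured to determine, according to the first time and the movement trajectory, whether to grant the first server permission to send the log at the first time.
Further, when determining, according to the first time and the movement trajectory, whether to grant the first server permission to send the log at the first time, the determination module 33 is specifically configured to:
determine a second server according to the movement trajectory, where the target data moved from the second server to the first server;
acquire a second time, where the second time is the synchronization point of the target data on the second server, indicating that the logs of the target data on the second server before the second time have finished synchronizing;
determine whether to grant the first server the permission by comparing the first time with the second time.
Furthermore, when determining whether to grant the first server send permission by comparing the first time with the second time, the determination module 33 is specifically configured to:
grant the first server the permission when the first time is greater than or equal to the second time.
Further, the apparatus of this embodiment may also include a generation-and-adding module, configured to generate a corresponding trajectory item upon detecting an event in which the target data moves from the second server to the first server, and to add the trajectory item to the movement trajectory.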
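A minimal sketch of the manager-side logic follows. It records trajectory items as the target data moves, stores reported synchronization points, finds the second server as the predecessor of the requesting server on the trajectory, and grants permission when the first time is greater than or equal to the second time. The `Coordinator` class, its method names, the in-memory storage, and the choice to grant permission when the requester has no predecessor are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class Coordinator:
    """Hypothetical manager-side coordinator (names and storage are illustrative)."""

    def __init__(self) -> None:
        # target data id -> servers in the order the data moved onto them
        self.trajectory: Dict[str, List[str]] = {}
        # (target data id, server) -> synchronization point reported by that server
        self.sync_points: Dict[Tuple[str, str], int] = {}

    def on_move(self, target: str, to_server: str) -> None:
        """Generate a trajectory item for a move and append it to the trajectory."""
        self.trajectory.setdefault(target, []).append(to_server)
        # Drop any sync point left over from an earlier stay on to_server, so that
        # only a sync point reported after this stay is used.
        self.sync_points.pop((target, to_server), None)

    def report_sync_point(self, target: str, server: str, sync_point: int) -> None:
        self.sync_points[(target, server)] = sync_point

    def previous_server(self, target: str, server: str) -> Optional[str]:
        """Return the server the data moved from before its latest move onto `server`."""
        path = self.trajectory.get(target, [])
        for i in range(len(path) - 1, 0, -1):
            if path[i] == server:
                return path[i - 1]
        return None

    def request_permission(self, target: str, server: str, first_time: int) -> bool:
        second_server = self.previous_server(target, server)
        if second_server is None:
            # No predecessor on the trajectory; granting here is an assumption.
            return True
        second_time = self.sync_points.get((target, second_server))
        if second_time is None:
            # The second server has not finished synchronizing yet.
            return False
        return first_time >= second_time
```

A refused request is simply retried later by the first server; in the worked example earlier in this section, that is what makes server 1 wait for server 2 and server 3 wait for server 1.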
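It should be noted that the data processing apparatus provided by the above embodiment can implement the technical solutions described in the method embodiments above; for the specific implementation principles of the modules or units, refer to the corresponding content of the method embodiments, which is not repeated here.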
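This application further provides an electronic device. As shown in FIG. 10, the electronic device includes a processor 42 and a memory 41. The memory stores at least one instruction, at least one program segment, a code set, or an instruction set, which is loaded and executed by the processor to:
upon detecting a log read event for target data, determine whether it has permission to send the log at a first time, where the permission is granted sequentially according to the movement trajectory of the target data among at least two servers;
when the permission is held, acquire at least one log of the target data whose timestamp is greater than or equal to the first time;
send the at least one log to the reader.
The memory 41 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device. The memory 41 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.
When executing the program in the memory 41, the processor 42 may implement other functions in addition to those above; see the descriptions of the preceding embodiments.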
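Further, as shown in FIG. 10, the electronic device also includes other components such as a communication component 43, a display 44, a power supply component 45, and an audio component 46. FIG. 10 shows only some of the components schematically, which does not mean that the electronic device includes only the components shown in FIG. 10. In a specific implementation, the electronic device of this embodiment may be a server in a distributed database system, more specifically a partition server in a partition server cluster; the server may be a physical server or a virtual server, which this embodiment does not limit.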
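Another embodiment of this application provides an electronic device with the same structure as in FIG. 10. Specifically, the electronic device includes a processor and a memory. The memory stores at least one instruction, at least one program segment, a code set, or an instruction set, which is loaded and executed by the processor to:
receive a permission application request sent by the first server for the target data, the request carrying a first time;
acquire the movement trajectory of the target data among at least two servers;
determine, according to the first time and the movement trajectory, whether to grant the first server permission to send the log at the first time.
When executing the program in the memory, the processor may implement other functions in addition to those above; see the descriptions of the preceding embodiments.
The electronic device of this embodiment may be the manager in the data processing system, more specifically the master server in a distributed database system, on which a coordination unit is deployed to grant send permission to the corresponding servers according to the movement trajectory of the target data, as described above.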
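Correspondingly, an embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a computer, implements the steps or functions of the data processing methods provided by the embodiments above.
An embodiment of this application further provides a computer program product, which includes a computer program or instructions. When executed by a processor, the computer program or instructions cause the processor to implement the steps or functions of the data processing methods provided by the embodiments above.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above implementations, those skilled in the art can clearly understand that each implementation may be realized by software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.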

Claims (14)

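  1. A data processing method, applicable to a first server, comprising:
    upon detecting a log read event for target data, determining whether the first server has permission to send the log at a first time, wherein the permission is granted sequentially according to a movement trajectory of the target data among at least two servers;
    when the permission is held, acquiring at least one log of the target data whose timestamp is greater than or equal to the first time;
    sending the at least one log to a reader.
  2. The method according to claim 1, further comprising:
    applying for send permission when the send permission is not held;
    wherein the send permission is determined based on a relationship between a second time and the first time; the second time is a synchronization point of the target data on a second server, the synchronization point indicating that the logs of the target data on the second server before the second time have finished synchronizing; and the second server is obtained from the movement trajectory of the target data among the at least two servers.
  3. The method according to claim 1 or 2, wherein determining whether the permission to send the log at the first time is held comprises:
    if the first-time log of the target data is an online log, the permission is not held;
    if the first-time log of the target data is not an online log, the permission is held when permission information for the target data is stored in a storage space.
  4. The method according to claim 3, further comprising:
    if the first-time log of the target data is an online log, querying whether permission information for the target data is stored in the storage space;
    deleting the permission information when it is stored.
  5. The method according to any one of claims 2 to 4, further comprising:
    after the application for send permission succeeds, storing the granted permission information for the target data in the storage space.
  6. The method according to any one of claims 1 to 5, further comprising:
    after sending the at least one log to the reader, determining a synchronization point of the target data according to the timestamp of the at least one log;
    sending the synchronization point of the target data to a manager.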
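  7. A data processing method, comprising:
    receiving a permission application request sent by a first server for target data, the permission application request carrying a first time;
    acquiring a movement trajectory of the target data among at least two servers;
    determining, according to the first time and the movement trajectory, whether to grant the first server permission to send the log at the first time.
  8. The method according to claim 7, wherein determining, according to the first time and the movement trajectory, whether to grant the first server permission to send the log at the first time comprises:
    determining a second server according to the movement trajectory, wherein the target data moved from the second server to the first server;
    acquiring a second time, wherein the second time is a synchronization point of the target data on the second server, the synchronization point indicating that the logs of the target data on the second server before the second time have finished synchronizing;
    determining whether to grant the first server the permission by comparing the first time with the second time.
  9. The method according to claim 8, wherein determining whether to grant the first server send permission by comparing the first time with the second time comprises:
    granting the first server the permission when the first time is greater than or equal to the second time.
  10. The method according to any one of claims 8 to 9, further comprising:
    upon detecting an event in which the target data moves from the second server to the first server, generating a corresponding trajectory item;
    adding the trajectory item to the movement trajectory.
  11. A data processing system, comprising a first server, a second server, a reader, and a manager, wherein:
    the first server is configured to: upon detecting a log read event for target data, determine whether it has permission to send the log at a first time; when the permission is held, acquire at least one log of the target data whose timestamp is greater than or equal to the first time and send the at least one log to the reader; and when the send permission is not held, apply to the manager for send permission;
    the manager is configured to: receive the permission application request sent by the first server for the target data, the permission application request carrying the first time; acquire the movement trajectory of the target data among at least two servers; and determine, according to the first time and the movement trajectory, whether to grant the first server permission to send the log at the first time.
  12. An electronic device, comprising a processor and a memory, the memory storing at least one instruction, at least one program segment, a code set, or an instruction set, which is loaded and executed by the processor to implement the data processing method according to any one of claims 1 to 6 or any one of claims 7 to 10.
  13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the data processing method according to any one of claims 1 to 6 or any one of claims 7 to 10.
  14. A computer program product, comprising a computer program or instructions which, when executed by a processor, cause the processor to implement the steps of the method according to any one of claims 1 to 6, or the steps of the method according to any one of claims 7 to 10.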
PCT/CN2022/103200 2021-07-07 2022-06-30 Data processing method, system, electronic device and storage medium WO2023280053A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110766115.1 2021-07-07
CN202110766115.1A CN113254460B (zh) 2021-07-07 2021-07-07 Data processing method, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2023280053A1 true WO2023280053A1 (zh) 2023-01-12

Family

ID=77190884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/103200 WO2023280053A1 (zh) 2021-07-07 2022-06-30 Data processing method, system, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN113254460B (zh)
WO (1) WO2023280053A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093406A (zh) * 2023-10-18 2023-11-21 浙江印象软件有限公司 Log center maintenance method and system

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN113254460B (zh) * 2021-07-07 2022-01-11 阿里云计算有限公司 Data processing method, system, electronic device and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
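WO2017041638A1 (zh) * 2015-09-08 2017-03-16 阿里巴巴集团控股有限公司 Log data processing method and apparatus
CN106663103A (zh) * 2014-06-18 2017-05-10 微软技术许可有限责任公司 Scalable eventual consistency system using logical document journaling
CN108304704A (zh) * 2018-02-07 2018-07-20 平安普惠企业管理有限公司 Permission control method and apparatus, computer device, and storage medium
CN108365971A (zh) * 2018-01-10 2018-08-03 深圳市金立通信设备有限公司 Log parsing method, device, and computer-readable medium
CN111597270A (zh) * 2020-05-22 2020-08-28 深圳前海微众银行股份有限公司 Data synchronization method, apparatus, device, and computer storage medium
CN111782416A (zh) * 2020-06-08 2020-10-16 Oppo广东移动通信有限公司 Data reporting method, apparatus, system, terminal, and computer-readable storage medium
CN113254460A (zh) * 2021-07-07 2021-08-13 阿里云计算有限公司 Data processing method, system, electronic device, and computer program product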

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
CN105184152B (zh) * 2015-10-13 2018-03-30 四川中科腾信科技有限公司 Mobile terminal data processing method
CN107665219B (zh) * 2016-07-28 2021-01-29 华为技术有限公司 Log management method and apparatus
CN107103249A (zh) * 2017-02-21 2017-08-29 上海青橙实业有限公司 Method for setting log file read/write permissions, and method for reading log files
CN108089971B (zh) * 2017-11-27 2021-03-16 上海华元创信软件有限公司 Log service method and system based on an embedded real-time system
CN109039782A (zh) * 2018-09-25 2018-12-18 郑州云海信息技术有限公司 Cluster logging method and related apparatus
US10664848B2 (en) * 2018-10-10 2020-05-26 Capital One Services, Llc Methods, mediums, and systems for document authorization
CN111258964A (zh) * 2018-12-03 2020-06-09 北京京东尚科信息技术有限公司 Log processing method and apparatus, storage medium, and electronic device
CN110502507B (zh) * 2019-08-29 2022-02-08 上海达梦数据库有限公司 Management system, method, device, and storage medium for a distributed database
CN112000971B (zh) * 2020-08-21 2022-07-15 浪潮电子信息产业股份有限公司 File permission recording method, system, and related apparatus

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117093406A (zh) * 2023-10-18 2023-11-21 浙江印象软件有限公司 Log center maintenance method and system
CN117093406B (zh) * 2023-10-18 2024-02-09 浙江印象软件有限公司 Log center maintenance method and system

Also Published As

Publication number Publication date
CN113254460B (zh) 2022-01-11
CN113254460A (zh) 2021-08-13

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22836804

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE