CN116841759A - Data processing method, device, computer equipment and computer readable storage medium


Info

Publication number: CN116841759A
Application number: CN202210290199.0A
Authority: CN
Prior art keywords: message, time, memory, queue, database
Other languages: Chinese (zh)
Inventors: 朱锋, 秦江
Current assignee: Tencent Technology Chengdu Co Ltd
Original assignee: Tencent Technology Chengdu Co Ltd
Application filed by: Tencent Technology Chengdu Co Ltd
Priority to: CN202210290199.0A
Legal status: Pending


Classifications

    • G06F 9/546: Message passing systems or structures, e.g. queues (under G06F 9/54 Interprogram communication, G06F 9/46 Multiprogramming arrangements)
    • G06F 9/544: Buffers; shared memory; pipes (under G06F 9/54 Interprogram communication)
    • G06F 16/2322: Optimistic concurrency control using timestamps (under G06F 16/2315 Optimistic concurrency control, G06F 16/23 Updating, G06F 16/20 Information retrieval of structured data)
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    • G06F 2209/547: Indexing scheme relating to G06F 9/54; messaging middleware
    • G06F 2209/548: Indexing scheme relating to G06F 9/54; queue
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, a data processing apparatus, a computer device and a computer readable storage medium, which can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent traffic and assisted driving. The method comprises the following steps: acquiring a plurality of messages from a message queue; synchronously storing the plurality of messages and the update time of each message into a first database, and adding the plurality of messages into a plurality of memory queues; and, in the process that a target data source consumes the messages from the memory queues, storing a first head time and a first tail time corresponding to each of the memory queues into a second database, wherein the first head time comprises the update time of the head-of-queue message, the first tail time comprises the update time of the tail-of-queue message, and the first database and the second database are used for restoring the messages in the memory queues to the target data source when a restart event occurs. According to the application, data synchronization efficiency can be improved and data loss can be prevented, thereby improving data security.

Description

Data processing method, device, computer equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to a data processing method, a data processing apparatus, a computer device, and a computer readable storage medium.
Background
A Message Queue (MQ) is distributed-system middleware designed around a first-in first-out data structure; it mainly solves problems such as application decoupling, asynchronous messaging and traffic peak shaving. In current data synchronization schemes, data is usually put into a message queue, the message queue is persisted to a local disk in the form of files, and the message queue is used to synchronize the data to different data sources. However, on one hand, the read-write speed of a disk is relatively slow, so the data synchronization efficiency is low; on the other hand, when virtualized container technology is adopted for data synchronization, more and more services no longer depend on specific machines, and if a machine is rebuilt the host machine may be replaced, so the backed-up disk files are lost.
Therefore, how to improve the data synchronization efficiency and the data security is a problem to be solved at present.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, computer equipment and a computer readable storage medium, which can improve the data synchronization efficiency and prevent data loss so as to improve the data security.
In one aspect, an embodiment of the present application provides a data processing method, where the method includes:
obtaining a plurality of messages from a message queue, the message queue comprising messages read from a database log file;
synchronously storing a plurality of messages and the update time of each message into a first database, and adding the plurality of messages into a plurality of memory queues;
storing a first head time and a first tail time corresponding to each memory queue in the plurality of memory queues into a second database in the process that the target data source consumes the messages from the plurality of memory queues, wherein the first head time comprises the update time of the head message, the first tail time comprises the update time of the tail message, and the first database and the second database are used for recovering the messages in the plurality of memory queues to the target data source when a restart event occurs.
In one aspect, an embodiment of the present application provides a data processing apparatus, including:
an acquisition unit configured to acquire a plurality of messages from a message queue, the message queue including messages read from a database log file;
the processing unit is used for synchronously storing a plurality of messages and the update time of each message into the first database and adding the plurality of messages into a plurality of memory queues;
The processing unit is further configured to store, in a second database, a first header time and a first tail time corresponding to each of the plurality of memory queues during consumption of the messages by the target data source from the plurality of memory queues, where the first header time includes an update time of the head-of-queue message, and the first tail time includes an update time of the tail-of-queue message, and the first database and the second database are configured to restore the messages in the plurality of memory queues to the target data source when a restart event occurs.
In one aspect, an embodiment of the present application provides a computer device, where the computer device includes a memory and a processor, and the memory stores a computer program which, when executed by the processor, causes the processor to execute the data processing method described above.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that, when read and executed by a processor of a computer device, causes the computer device to perform the above-described data processing method.
In one aspect, embodiments of the present application provide a computer program product, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the data processing method described above.
The embodiment of the application first obtains a plurality of messages from a message queue; synchronously stores the plurality of messages and the update time of each message to a first database, and adds the plurality of messages to a plurality of memory queues; and stores a first head time and a first tail time corresponding to each of the plurality of memory queues in a second database during consumption of the messages by the target data source from the plurality of memory queues, wherein the first head time comprises the update time of the head-of-queue message, and the first tail time comprises the update time of the tail-of-queue message. In addition, the first database and the second database are used for restoring the messages in the memory queues to the target data source when a restart event occurs: the first database backs up the messages and the update time of each message, and the second database stores the first head time and the first tail time corresponding to each memory queue. When a restart event occurs, the messages in the memory queues can be recovered using the data stored in the first database and the second database, so that data synchronization efficiency can be improved and data loss can be prevented, thereby improving data security.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 3 is a flowchart of another data processing method according to an embodiment of the present application;
FIG. 4 is a flowchart of another data processing method according to an embodiment of the present application;
FIG. 5 is a flowchart of another data processing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the application without making any inventive effort are intended to fall within the scope of the application.
It should be noted that the descriptions of "first," "second," and the like in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a technical feature defining "first", "second" may include at least one such feature, either explicitly or implicitly.
The embodiment of the application relates to Cloud technology, which is a hosting technology that unifies a series of resources such as hardware, software and network in a wide area network or a local area network to realize calculation, storage, processing and sharing of data. Cloud technology is a general term for network technology, information technology, integration technology, management platform technology, application technology and the like based on the cloud computing business model; it can form a resource pool and be used flexibly and conveniently on demand. Cloud computing technology will become an important support. Background services of technical network systems require a large amount of computing and storage resources, for example video websites, picture websites and other portal websites. With the rapid development and application of the internet industry, each item may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong system backing, which can only be realized through cloud computing.
Cloud storage is a new concept extended and developed from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system for short) refers to a storage system that, through functions such as cluster application, grid technology and a distributed storage file system, integrates a large number of storage devices of various types in a network (storage devices are also referred to as storage nodes) to work cooperatively through application software or application interfaces, so as to provide data storage and service access functions to the outside. At present, the storage method of the storage system is as follows: when logical volumes are created, each logical volume is allocated a physical storage space, which may be composed of the disks of one or several storage devices. A client stores data on a certain logical volume, that is, the data is stored on a file system; the file system divides the data into a plurality of parts, each part is an object, and an object contains not only the data but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can access the data according to the storage location information of each object. The process by which the storage system allocates physical storage space for a logical volume is specifically as follows: physical storage space is divided into stripes in advance according to the capacity estimates of the objects to be stored on the logical volume (these estimates often leave a large margin with respect to the capacity of the objects actually to be stored) and the redundant array of independent disks (Redundant Array of Independent Disks, RAID) grouping, and one logical volume can be understood as one stripe, whereby physical storage space is allocated to the logical volume.
A Database can be regarded as an electronic filing cabinet, a place for storing electronic files; users can perform operations such as adding, querying, updating and deleting on the data in the files. A "database" is a collection of data that is stored together in a way that can be shared with multiple users, has as little redundancy as possible, and is independent of the application. A Database Management System (DBMS) is a computer software system designed for managing databases, and generally has basic functions such as storage, retrieval, security and backup. Database management systems may be classified according to the database model they support, e.g., relational or extensible markup language (Extensible Markup Language, XML); by the type of computer supported, e.g., server cluster or mobile phone; by the query language used, such as structured query language (Structured Query Language, SQL); by performance emphasis, such as maximum scale or maximum operating speed; or by other classification schemes. Regardless of the classification used, some DBMSs are able to span categories, for example by supporting multiple query languages simultaneously. A key-value (kv) database is a database that stores data as key-value pairs, where each key corresponds to a unique value. For example, the Remote Dictionary Server (Redis) database is an open-source, memory-based, high-performance key-value storage system that supports data persistence and is commonly used as a database, cache and message middleware.
Big data refers to a data set that cannot be captured, managed and processed with conventional software tools within a certain time range; it is a massive, fast-growing and diversified information asset that requires new processing modes to provide stronger decision-making ability, insight discovery ability and process optimization ability. With the advent of the cloud era, big data has attracted more and more attention; big data requires special techniques to effectively process a large amount of data within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the internet, and scalable storage systems.
Based on the cloud technology mentioned above, the embodiment of the application provides a data processing method to ensure the security of data. Specifically, the general principle of the data processing method is as follows: firstly, acquiring a plurality of messages from a message queue; synchronously storing the plurality of messages and the update time of each message to a first database, and adding the plurality of messages to a plurality of memory queues; and storing a first head time and a first tail time corresponding to each of the plurality of memory queues in a second database during consumption of the messages by the target data source from the plurality of memory queues, wherein the first head time comprises the update time of the head-of-queue message, and the first tail time comprises the update time of the tail-of-queue message. In addition, the first database and the second database are used for restoring the messages in the plurality of memory queues to the target data source when a restart event occurs. The term "consuming" means that a target data source fetches a message from a memory queue and performs the corresponding data logic processing.
In a specific implementation, the above-mentioned data processing method may be performed by a computer device, which may be a terminal device or a server. The terminal device may be, for example, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, an aircraft, etc., but is not limited thereto; the server may be, for example, a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (Content Delivery Network, CDN) services, big data and artificial intelligence platforms. The embodiment of the application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving and the like.
In particular, the above mentioned data processing method may be performed by a server. See, for example, fig. 1: the server 101 acquires a plurality of messages from the message queues, then synchronously stores the plurality of messages and the update time of each message into the first database, and adds the plurality of messages into the plurality of memory queues; and storing the first head time and the first tail time corresponding to each of the plurality of memory queues in the second database during the process that the synchronous data terminal 102 consumes the messages from the plurality of memory queues. The synchronous data terminal 102 may be considered as a target data source, where the target data source may refer to the above-mentioned server, or may be a terminal device, and may specifically correspond to each service platform or different functional departments of an enterprise, and may consume a message from the server 101, so as to implement data synchronization.
According to the embodiment of the application, the first database is utilized to backup a plurality of messages and the update time of each message, the second database is utilized to store the first head time and the first tail time corresponding to each memory queue, and when a restarting event occurs, the messages in the plurality of memory queues can be recovered by utilizing the data stored in the first database and the second database, so that the data synchronization efficiency can be improved, the data loss can be prevented, and the data security is improved.
It may be understood that the schematic diagram of the system architecture described in the embodiment of the present application is for more clearly describing the technical solution of the embodiment of the present application, and does not constitute a limitation on the technical solution provided by the embodiment of the present application, and those skilled in the art can know that, with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided by the embodiment of the present application is equally applicable to similar technical problems.
Based on the foregoing, a data processing method according to an embodiment of the present application is further described below with reference to the flowchart shown in fig. 2. In the embodiments of the present application, the above-mentioned computer device is mainly used as an example to execute the data processing method. Referring to fig. 2, the data processing method specifically includes steps S201 to S203:
S201, acquiring a plurality of messages from a message queue.
In an embodiment of the present application, the message queue includes messages that are read from a database log file. A Message Queue (MQ) is distributed-system middleware designed around a first-in first-out data structure; it mainly solves problems such as application decoupling, asynchronous messaging and traffic peak shaving. The computer device reads the plurality of messages from the database log file and then adds the plurality of messages to the message queue, from which the plurality of messages can subsequently be retrieved directly. The database log file comprises a plurality of messages and the update time of each message, and is used for triggering changes of other data sources; to guarantee ordering, the message queue adopts a single-partition mode, and it can be applied to various services, such as a message distribution service, a collaborative data recording service and the like.
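For illustration only, and not as part of the claimed method, the content carried by one such message can be sketched as a small record; the field names below are assumptions, the substance being that every message carries a primary key identifier and the update time read from the database log file:

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class Message:
    """One binlog change event as it travels through the single-partition message queue.

    Field names are illustrative assumptions; the application only requires that each
    message carries a primary key identifier and an update time.
    """
    primary_key_id: str      # primary key identifier, used later for routing to a memory queue
    payload: Dict[str, Any]  # changed row data read from the database log file
    update_time: float       # update time (epoch seconds) attached when the message is written to the MQ
```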
S202, synchronously storing a plurality of messages and the update time of each message into a first database, and adding the plurality of messages into a plurality of memory queues.
In the embodiment of the application, a memory queue can be regarded as a container for storing messages; it adopts a first-in first-out data structure, and messages are fetched from the container when they need to be used. Each message has an update time. The computer device can perform a collaborative data recording service through the message queue, where the collaborative data recording service refers to synchronously storing the plurality of messages and the update time of each message into the first database, thereby realizing the backup of the plurality of messages and the update time of each message; and it can perform a message distribution service through the message queue, where the message distribution service refers to adding the plurality of messages to the plurality of memory queues so that the target data source consumes messages from the plurality of memory queues. Specifically, the computer device may perform a modulo or hash computation on the primary key identifier of each message, and add messages with the same primary key identifier to the same memory queue. It should be appreciated that the computer device performs the collaborative data recording service and the message distribution service in a synchronized manner, i.e., while the message distribution service is performed, each distributed message and its update time are stored in the first database. In this way, data synchronization efficiency can be improved and data loss can be prevented, thereby improving data security.
It should be noted that the first database is a database for temporarily storing messages, and that each message can be guaranteed at least once (at-least-once) by using the transaction characteristics of the first database, i.e. each message can be synchronized at least once. When the amount of data stored in the first database is large, expired messages may be deleted at regular intervals. For example, at 12 pm, the computer device may delete messages from the first database that have been confirmed to be synchronized to the target data source; or the computer device may delete the earliest 100 messages stored from the first database.
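A minimal sketch of the periodic clean-up described above, assuming the first database is reachable through a hypothetical `first_db` handle with an SQL-style `execute` method; the table and column names are assumptions:

```python
import time

def purge_expired_messages(first_db, retention_seconds: int = 24 * 3600) -> None:
    """Delete messages from the first database that have already been confirmed as
    synchronized to the target data source and whose update time is older than the
    retention window (assumed schema: message_backup(payload, update_time, synchronized))."""
    cutoff = time.time() - retention_seconds
    first_db.execute(
        "DELETE FROM message_backup WHERE synchronized = 1 AND update_time < ?",
        (cutoff,),
    )
```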
In one possible implementation, the computer device adds a plurality of messages to a plurality of memory queues, specifically implemented as: determining a target memory queue from the plurality of memory queues based on the primary key identification of any message for any message in the plurality of messages; any message is added to the target memory queue. After the computer device obtains a plurality of messages from the message queue, the computer device performs modulo or hash calculation on the primary key identifier, and adds each message to the corresponding memory queue.
Illustratively, the messages in the message queue are message A, message B, message C and message D in that order from first to last. The primary key identifier of message A is 1, the primary key identifier of message B is 2, the primary key identifier of message C is 1, and the primary key identifier of message D is 2. Assume that a message with primary key identifier 1 is added to memory queue a and a message with primary key identifier 2 is added to memory queue b. According to the primary key identifier of each message, message A is added to memory queue a, message B is added to memory queue b, message C is added to memory queue a, and message D is added to memory queue b. Therefore, the messages in memory queue a are message A and message C in order from first to last, and the messages in memory queue b are message B and message D in order from first to last.
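A minimal sketch of this distribution step, reusing the Message record sketched earlier and assuming the memory queues are plain in-process FIFO queues; taking a stable hash of the primary key identifier modulo the number of queues keeps messages with the same primary key identifier in the same memory queue, which preserves their relative order:

```python
import queue
import zlib
from typing import List

def build_memory_queues(count: int) -> List[queue.Queue]:
    """Create the in-process FIFO memory queues."""
    return [queue.Queue() for _ in range(count)]

def dispatch(msg: "Message", memory_queues: List[queue.Queue]) -> None:
    """Route a message to its target memory queue by its primary key identifier."""
    # Modulo/hash calculation: same primary key identifier -> same memory queue.
    index = zlib.crc32(str(msg.primary_key_id).encode("utf-8")) % len(memory_queues)
    memory_queues[index].put(msg)
```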
S203, storing the first head time and the first tail time corresponding to each memory queue in the memory queues into a second database in the process that the target data source consumes the messages from the memory queues.
In an embodiment of the present application, the first header time includes an update time of the head-of-line message, and the first tail time includes an update time of the tail-of-line message. The first database and the second database are used for restoring the messages in the memory queues to the target data source when a restart event occurs. The first database is used for storing a plurality of messages and the update time of each message, and the second database is used for storing the first head time and the first tail time corresponding to each memory queue in the plurality of memory queues. The first database can be a temporary database, and a plurality of messages and the updating time of each message are backed up, so that the data security is ensured; the second database only needs to store the first head time and the first tail time corresponding to each memory queue, so that the storage pressure of the second database is reduced. The target data source may consume messages from multiple memory queues, each with a separate thread synchronized to the target data source. For example, if a restart event (such as a power-off restart or machine rebuilding) occurs, the first header time and the first tail time corresponding to each memory queue may be obtained from the second database, and by using the first header time and the first tail time corresponding to each memory queue and combining the plurality of messages stored in the first database and the update time of each message, the computer device may obtain the message corresponding to each memory queue from the first database, so as to implement recovery of the messages in the plurality of memory queues, thereby ensuring security of data.
In one possible implementation manner, the computer device stores the first head time and the first tail time corresponding to each memory queue in the plurality of memory queues into the second database, where the specific implementation manner is as follows: acquiring a head message and a tail message of each memory queue in a plurality of memory queues; determining a first head time corresponding to each memory queue according to the update time of the queue head message, and determining a first tail time corresponding to each memory queue according to the update time of the queue tail message; and storing the first head time and the first tail time corresponding to each memory queue into a second database based on the queue identifier, the head identifier and the tail identifier of each memory queue.
It should be noted that the second database may be a redis database, which is a database storing data in key value pairs. The computer device may use the queue identifier and the head identifier of each memory queue as a key field (key) corresponding to the first head time, and use the update time of the queue head message as a value field (value) corresponding to the first head time; the queue identifier and the tail identifier of each memory queue may be used as key fields corresponding to the first tail time, and the update time of the tail message may be used as a value field corresponding to the first tail time.
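A sketch of this key layout, assuming a Redis-style client exposing `set`; the exact key format (queue identifier plus a `head`/`tail` suffix) is an assumption, the substance being that the queue identifier together with the head identifier keys the first head time, and the queue identifier together with the tail identifier keys the first tail time:

```python
def store_queue_times(second_db, queue_id: str, head_time: float, tail_time: float) -> None:
    """Persist the first head time and first tail time of one memory queue as two
    key-value pairs in the second database (e.g. a redis database)."""
    # Key field: queue identifier + head identifier; value field: update time of the head-of-queue message.
    second_db.set(f"{queue_id}:head", head_time)
    # Key field: queue identifier + tail identifier; value field: update time of the tail-of-queue message.
    second_db.set(f"{queue_id}:tail", tail_time)
```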
In one possible implementation manner, the computer device stores the first head time and the first tail time corresponding to each memory queue into the second database based on the queue identifier, the head identifier and the tail identifier of each memory queue, and the specific implementation manner is as follows: acquiring a second head time corresponding to each memory queue stored in a second database based on the queue identifier and the head identifier of each memory queue; and if the first head time corresponding to each memory queue is different from the second head time, storing the first head time corresponding to each memory queue into a second database.
It should be noted that, before the computer device stores the first head time corresponding to each memory queue in the second database, it needs to acquire the second head time corresponding to each memory queue that is already stored in the second database. If the first head time is different from the second head time, the computer device stores the first head time corresponding to each memory queue in the second database; if the first head time is the same as the second head time, the first head time corresponding to each memory queue is not stored in the second database. That is, when the first head time of a memory queue is the same, only one record is needed, and when messages are recovered, all messages within the update time interval stored in the second database are recovered, thereby guaranteeing the at-least-once property of the messages. When comparing the first head time and the second head time, the second, the minute or the millisecond may be used as the minimum time unit, which is not limited herein. In this way, the number of read-write operations of the computer device can be reduced, and data processing efficiency can be improved.
For example, with minutes as the minimum time unit, suppose the first head time is 9:10:20 and the second head time is 9:10:45. After truncation, the first head time and the second head time both correspond to 9:10, so they are the same, and the first head time does not need to be stored in the second database.
For another example, with minutes as the minimum time unit, suppose the first head time is 9:10:20 and the second head time is 9:15:45. The first head time corresponds to 9:10 and the second head time corresponds to 9:15, so they are different, and the first head time needs to be stored in the second database.
In one possible implementation, the method further includes: acquiring a second tail time corresponding to each memory queue stored in the second database based on the queue identifier and the tail identifier of each memory queue; and if the first tail time and the second tail time corresponding to each memory queue are different, storing the first tail time corresponding to each memory queue into the second database. Similarly, before the computer device stores the first tail time corresponding to each memory queue in the second database, it needs to acquire the second tail time corresponding to each memory queue that is already stored in the second database. If the first tail time is different from the second tail time, the computer device stores the first tail time corresponding to each memory queue in the second database; if the first tail time is the same as the second tail time, the first tail time corresponding to each memory queue is not stored in the second database. That is, when the first tail time of a memory queue is the same, only one record is needed, and when messages are recovered, all messages within the update time interval stored in the second database are recovered, thereby guaranteeing the at-least-once property of the messages. When comparing the first tail time and the second tail time, the second, the minute or the millisecond may be used as the minimum time unit, which is not limited herein. In this way, the number of read-write operations of the computer device can be reduced, and data processing efficiency can be improved.
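A sketch of the write-suppression check described in the two paragraphs above, assuming minutes as the minimum time unit and a Redis-style client returning string values; the first head (or tail) time is written only when it differs from the stored second head (or tail) time after truncation to that unit:

```python
def maybe_update_time(second_db, key: str, new_time: float, unit_seconds: int = 60) -> bool:
    """Write new_time under key only if it falls in a different minimum time unit
    than the value already stored; returns True when a write actually happened."""
    stored = second_db.get(key)
    if stored is not None and int(float(stored)) // unit_seconds == int(new_time) // unit_seconds:
        return False  # same minute: keep the existing record and skip the write
    second_db.set(key, new_time)
    return True
```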
In summary, in the embodiment of the present application, a plurality of messages are first acquired from a message queue; synchronously storing the plurality of messages and the update time of each message to a first database, and adding the plurality of messages to a plurality of memory queues; and storing a first head time and a first tail time corresponding to each of the plurality of memory queues in a second database during consumption of the messages by the target data source from the plurality of memory queues, wherein the first head time comprises an update time of the head-of-line messages, and the first tail time comprises an update time of the tail-of-line messages. In addition, the first database and the second database are used for recovering the messages in the memory queues to the target data source when a restart event occurs, the first database is used for backing up the messages and the update time of each message, the second database is used for storing the first head time and the first tail time corresponding to each memory queue, and when the restart event occurs, the messages in the memory queues can be recovered by using the data stored in the first database and the second database, so that the data synchronization efficiency can be improved, the data loss can be prevented, and the data security is improved.
The data processing method according to the embodiment of the present application is further described below with reference to the flowchart shown in fig. 3. In the embodiments of the present application, the above-mentioned computer device is mainly used as an example to execute the data processing method. Referring to fig. 3, the data processing method specifically may include steps S301 to S306.
Wherein:
s301, acquiring a plurality of messages from a message queue.
S302, synchronously storing a plurality of messages and the update time of each message into a first database, and adding the plurality of messages into a plurality of memory queues.
S303, storing the first head time and the first tail time corresponding to each memory queue in the memory queues into a second database in the process that the target data source consumes the messages from the memory queues.
The specific implementation manner of steps S301 to S303 may refer to the specific implementation manner of steps S201 to S203, and will not be described herein.
S304, if a restarting event occurs, acquiring a first head time and a first tail time corresponding to each memory queue from a second database.
In the embodiment of the application, when a restart event (such as a power-off restart or machine rebuilding) occurs in the computer device, the messages in the memory queues are lost, which affects data synchronization; it is therefore necessary to restore to the target data source the messages that each memory queue held before the restart. First, the computer device obtains the first head time and the first tail time corresponding to each memory queue from the second database, that is, obtains the update time interval corresponding to each memory queue, and then all messages within that update time interval need to be recovered.
S305, acquiring the information corresponding to each memory queue from a first database according to the first head time and the first tail time corresponding to each memory queue.
In the embodiment of the application, after the computer equipment acquires the first head time and the first tail time corresponding to each memory queue from the second database, the computer equipment can acquire the message corresponding to each memory queue from the first database by combining a plurality of messages stored in the first database and the update time of each message by using the first head time and the first tail time corresponding to each memory queue.
In one possible implementation manner, the computer device obtains, from the first database, a message corresponding to each memory queue according to the first head time and the first tail time corresponding to each memory queue, where the specific implementation manner is that: determining a target time interval corresponding to each memory queue based on the first head time and the first tail time corresponding to each memory queue; determining a message with corresponding update time within the target time interval from the messages stored in the first database; and taking the message with the corresponding update time in the target time interval as the message corresponding to each memory queue. It should be noted that, the computer device determines, by using the second database, a target time interval corresponding to each memory queue, and searches, in the first database, a message whose corresponding update time is located in the target time interval, where the message whose corresponding update time is located in the target time interval is a message corresponding to each memory queue, so as to obtain a message corresponding to each memory queue.
For example, assume that the first head time corresponding to memory queue a stored in the second database is 1:00 and the first tail time corresponding to memory queue a is 2:00, so the target time interval corresponding to memory queue a is from 1:00 to 2:00. The update time corresponding to message A stored in the first database is 1:20, the update time corresponding to message B is 1:40, the update time corresponding to message C is 2:00, and the update time corresponding to message D is 2:40. Therefore, the messages whose corresponding update time is within the target time interval include message A and message B, that is, message A and message B are the messages corresponding to memory queue a.
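A sketch of this recovery lookup under the same assumed schema and key layout as the earlier sketches: the target time interval of a queue is read back from the second database, and the first database is queried for the messages whose update time lies within that interval:

```python
from typing import List, Tuple

def recover_queue_messages(first_db, second_db, queue_id: str) -> List[Tuple[str, float]]:
    """Rebuild the messages of one memory queue after a restart event."""
    head_time = float(second_db.get(f"{queue_id}:head"))  # first head time of the queue
    tail_time = float(second_db.get(f"{queue_id}:tail"))  # first tail time of the queue
    # Every message whose update time lies in the target time interval is recovered,
    # which preserves the at-least-once property (duplicates are possible, losses are not).
    rows = first_db.execute(
        "SELECT payload, update_time FROM message_backup"
        " WHERE update_time >= ? AND update_time <= ? ORDER BY update_time",
        (head_time, tail_time),
    )
    return list(rows)
```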
S306, restoring the message corresponding to each memory queue to the target data source.
In the embodiment of the application, after the computer device determines the messages corresponding to each memory queue, the messages corresponding to each memory queue are restored to the target data source, so that the entire in-memory data is restored, and the computer device can subsequently continue to consume messages from the memory queues normally, thereby avoiding the impact caused by the restart event and ensuring the security of the data.
In one possible implementation manner, the computer device restores the messages corresponding to each memory queue to the target data source, and the specific implementation manner is: sending the messages corresponding to each memory queue to the target data source, so that the target data source consumes the messages corresponding to each memory queue; or adding the messages corresponding to each memory queue back to that memory queue, so that the target data source consumes the messages from each memory queue. It should be noted that the computer device may directly send the messages corresponding to each memory queue determined from the first database to the target data source for consumption, or may add the messages corresponding to each memory queue determined from the first database back to each memory queue so that the target data source consumes the messages from each memory queue, which is not limited herein.
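A short sketch of the two recovery paths named above; the `consume` and `put` interfaces are assumptions about how the target data source and the memory queue are driven:

```python
def restore(messages, target_data_source=None, memory_queue=None) -> None:
    """Either hand the recovered messages straight to the target data source, or put
    them back onto their memory queue so they are consumed through the normal path."""
    if target_data_source is not None:
        for msg in messages:
            target_data_source.consume(msg)  # direct delivery to the target data source
    elif memory_queue is not None:
        for msg in messages:
            memory_queue.put(msg)            # re-enqueue; the target data source consumes from the queue
```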
In general, the data processing method described above can be generalized into the following five parts. As shown in fig. 4, the first part is a database log file (DB Binlog) 401; the database log file 401 includes a plurality of messages and the update time of each message and is used for triggering changes of other data sources. The second part is a message queue 402: the messages included in the database log file 401 are added to the message queue 402, which serves as middleware of the distributed system, adopts a single-partition mode to guarantee ordering, and can provide a message distribution service, a collaborative data recording service and the like. The third part is the message distribution service (reading message queue data 403): a plurality of messages are added to a plurality of memory queues, the downstream synchronized target data source 406 consumes the messages from the plurality of memory queues, and using memory queues can improve consumption speed. The fourth part is the collaborative data recording service, which uses the first database 404 as a consumer of the message queue to synchronously store the plurality of messages and the update time of each message, thereby realizing the backup of the plurality of messages. The fifth part is storage in the second database 405, where the second database 405 may be a redis database used for storing the first head time and the first tail time corresponding to each queue. The first head time here includes the update time of the head-of-queue message, and the first tail time here includes the update time of the tail-of-queue message. When a restart event occurs, the second database 405 may be used to obtain the first head time and the first tail time corresponding to each queue, and, in combination with the plurality of messages stored in the first database 404 and the update time of each message, the messages in the plurality of memory queues may be restored to the downstream synchronized target data source 406.
In one embodiment, the head time may be denoted as Headtime and the tail time may be denoted as Backtime.
Specifically, the data processing method can be summarized as the following steps. As shown in fig. 5, step 1 is to write a plurality of messages in a database log file (DB Binlog) into a Message Queue (MQ); each piece of written data must carry an update time, which is eventually converted into a first head time and a first tail time used to identify the time range of a memory queue.
Step 2 is a consuming process (the message distribution service) that obtains the single-partition messages (Message) from the message queue, that is, adds the plurality of messages to the plurality of memory queues. Each message corresponds to an update time.
Step 3: the message distribution service reads the messages in the message queue and adds each message to a different memory queue according to its primary key identifier; in this way, messages with the same primary key identifier fall into the same memory queue, which guarantees the order of the messages. At the same time, the queue identifier and the tail identifier of each queue are used as the key field corresponding to the first tail time of that queue, the update time of the tail-of-queue message of each queue is used as the value field corresponding to the first tail time, and the data are stored using the second database. The stored second tail time can be compared with the first tail time, and when they are the same the write to the second database can be skipped, which reduces the number of read-write operations of the computer device and improves data processing efficiency.
Step 4: the queue identifier and the head identifier of each memory queue are used as the key field corresponding to the first head time of that queue, the update time of the head-of-queue message of each queue is used as the value field corresponding to the first head time of that queue, and the data are stored using the second database. The stored second head time can be compared with the first head time, and when they are the same the write to the second database can be skipped, which reduces the number of read-write operations of the computer device and improves data processing efficiency.
Step 5 is the collaborative data recording service, namely, synchronously storing the plurality of messages and the update time of each message into the first database, and guaranteeing that each message is delivered at least once by utilizing the transaction characteristics of the database; the first database is a database for temporarily storing the messages, and expired data is deleted periodically when the data volume is large.
Step 6: when a restart event occurs because the message distribution service becomes abnormal, first read from the second database the first head time and the first tail time corresponding to all memory queues, determine a target time interval from the first head time and the first tail time, and acquire from the first database the messages whose corresponding update time lies within the target time interval, namely the messages corresponding to each memory queue.
Step 7: restore the messages corresponding to each memory queue to the target data source, so that the entire in-memory data can be restored; the computer device can then continue to consume messages from the memory queues normally, thereby avoiding the impact caused by the restart event and ensuring the security of the data.
In summary, in the embodiment of the present application, a plurality of messages are first acquired from a message queue; synchronously storing the plurality of messages and the update time of each message to a first database, and adding the plurality of messages to a plurality of memory queues; and storing a first head time and a first tail time corresponding to each of the plurality of memory queues in a second database during consumption of the messages by the target data source from the plurality of memory queues, wherein the first head time comprises an update time of the head-of-line messages, and the first tail time comprises an update time of the tail-of-line messages. If a restarting event occurs, acquiring a first head time and a first tail time corresponding to each memory queue from a second database; and then according to the first head time and the first tail time corresponding to each memory queue, acquiring the information corresponding to each memory queue from a first database, and recovering the information corresponding to each memory queue to a target data source. The first database is used for backing up a plurality of messages and the update time of each message, the second database is used for storing the first head time and the first tail time corresponding to each memory queue, and when a restarting event occurs, the messages in the plurality of memory queues can be recovered by using the data stored in the first database and the second database, so that the data synchronization efficiency can be improved, the data loss can be prevented, and the data safety is improved.
Based on the data processing method, the embodiment of the application provides a data processing device. Referring to fig. 6, a schematic structural diagram of a data processing apparatus according to an embodiment of the present application is shown, and the data processing apparatus 600 may operate as follows:
an obtaining unit 601, configured to obtain a plurality of messages from a message queue, where the message queue includes messages read from a database log file;
a processing unit 602, configured to store a plurality of messages and update time of each message in synchronization to a first database, and add the plurality of messages to a plurality of memory queues;
the processing unit 602 is further configured to store, in a process that the target data source consumes the messages from the plurality of memory queues, a first header time and a first tail time corresponding to each of the plurality of memory queues, where the first header time includes an update time of the head-of-queue message, and the first tail time includes an update time of the tail-of-queue message, where the first database and the second database are configured to restore the messages in the plurality of memory queues to the target data source when a restart event occurs.
In one embodiment, the processing unit 602 is further configured to: if a restarting event occurs, acquiring a first head time and a first tail time corresponding to each memory queue from a second database; acquiring a message corresponding to each memory queue from a first database according to the first head time and the first tail time corresponding to each memory queue; and restoring the message corresponding to each memory queue to the target data source.
In another embodiment, the processing unit 602, when obtaining, from the first database, the message corresponding to each memory queue according to the first head time and the first tail time corresponding to each memory queue, may be specifically configured to: determining a target time interval corresponding to each memory queue based on the first head time and the first tail time corresponding to each memory queue; determining a message with corresponding update time within the target time interval from the messages stored in the first database; and taking the message with the corresponding update time in the target time interval as the message corresponding to each memory queue.
In another embodiment, the processing unit 602, when adding a plurality of messages to a plurality of memory queues, may be specifically configured to: determining a target memory queue from the plurality of memory queues based on the primary key identification of any message for any message in the plurality of messages; any message is added to the target memory queue.
In another embodiment, the processing unit 602, when storing the first head time and the first tail time corresponding to each of the plurality of memory queues in the second database, may be specifically configured to: acquiring a head message and a tail message of each memory queue in a plurality of memory queues; determining a first head time corresponding to each memory queue according to the update time of the queue head message, and determining a first tail time corresponding to each memory queue according to the update time of the queue tail message; and storing the first head time and the first tail time corresponding to each memory queue into a second database based on the queue identifier, the head identifier and the tail identifier of each memory queue.
In another embodiment, when storing the first head time and the first tail time corresponding to each memory queue in the second database based on the queue identifier, the head identifier, and the tail identifier of each memory queue, the processing unit 602 may be specifically configured to: acquiring a second head time corresponding to each memory queue stored in a second database based on the queue identifier and the head identifier of each memory queue; and if the first head time corresponding to each memory queue is different from the second head time, storing the first head time corresponding to each memory queue into a second database.
In another embodiment, when restoring the message corresponding to each memory queue to the target data source, the processing unit 602 may be specifically configured to: sending the message corresponding to each memory queue to the target data source, so that the target data source consumes the message corresponding to each memory queue; or re-adding the message corresponding to each memory queue to each memory queue, so that the target data source consumes the message from each memory queue.
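The two recovery options of this embodiment, continuing the sketch; send_to_target stands for whatever call delivers a message to the target data source and is purely hypothetical.

```python
def restore_by_sending(queue_id, pending, send_to_target):
    """Option 1: hand the recovered messages to the target data source directly."""
    for msg in pending:
        send_to_target(msg)                        # the target data source consumes them immediately

def restore_by_requeueing(queue_id, pending):
    """Option 2: put the recovered messages back into their memory queue."""
    for msg in pending:
        memory_queues[queue_id].append(msg)        # consumed from the queue as in normal operation
```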
According to another embodiment of the present application, each unit in the data processing apparatus shown in fig. 6 may be separately or wholly combined into one or several other units, or one or more of the units may be further split into a plurality of functionally smaller units, which can achieve the same operation without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the application, the data processing apparatus may also include other units, and in practical applications these functions may be realized with the assistance of other units or through the cooperation of a plurality of units.
According to another embodiment of the present application, the data processing apparatus shown in fig. 6 may be constructed, and the data processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 2 or 3 on a general-purpose computing device, such as a computer, that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above-described computing device via the computer-readable recording medium, and executed therein.
In the embodiment of the application, a plurality of messages are first acquired from a message queue; the plurality of messages and the update time of each message are synchronously stored to a first database, and the plurality of messages are added to a plurality of memory queues; and while the target data source consumes messages from the plurality of memory queues, a first head time and a first tail time corresponding to each of the plurality of memory queues are stored in a second database, where the first head time includes the update time of the queue head message and the first tail time includes the update time of the queue tail message. The first database and the second database are used to restore the messages in the memory queues to the target data source when a restart event occurs: the first database backs up the messages and the update time of each message, and the second database stores the first head time and the first tail time corresponding to each memory queue, so when a restart event occurs the messages in the memory queues can be recovered from the data stored in the two databases. This improves data synchronization efficiency, prevents data loss, and improves data security.
Based on the description of the method embodiments and the device embodiments, the embodiment of the application also provides a computer device. Referring to fig. 7, the computer device 700 includes at least a processor 701, a communication interface 702, and a computer storage medium 703, which may be connected by a bus or in other ways. The computer storage medium 703 may reside in the memory 704 of the computer device 700; the computer storage medium 703 is adapted to store a computer program comprising program instructions, and the processor 701 is adapted to execute the program instructions stored in the computer storage medium 703. The processor 701, or CPU (Central Processing Unit), is the computing core and the control core of the computer device; it is adapted to implement one or more instructions, in particular to load and execute one or more instructions so as to implement a corresponding method flow or corresponding function.
In one embodiment, the processor 701 of the embodiments of the present application may be configured to perform a series of data processing operations, specifically including: obtaining a plurality of messages from a message queue, the message queue comprising messages read from a database log file; synchronously storing the plurality of messages and the update time of each message into a first database, and adding the plurality of messages into a plurality of memory queues; and storing a first head time and a first tail time corresponding to each memory queue in the plurality of memory queues into a second database while the target data source consumes messages from the plurality of memory queues, wherein the first head time comprises the update time of the queue head message, the first tail time comprises the update time of the queue tail message, and the first database and the second database are used for recovering the messages in the plurality of memory queues to the target data source when a restart event occurs; and so on.
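Putting the earlier sketches together, a hypothetical end-to-end run of the flow described above; all values are invented for illustration.

```python
# Normal operation: back up, enqueue and checkpoint a batch of log messages.
batch = [
    Message("row-1", {"name": "a"}, update_time=100),
    Message("row-2", {"name": "b"}, update_time=101),
]
store_and_enqueue(batch)       # first database + memory queues
checkpoint_queue_times()       # head/tail times into the second database

# Simulate a restart before the queues are drained: the queues are rebuilt
# from the two databases and the target data source can resume consumption.
for queue in memory_queues:
    queue.clear()
recover_after_restart(restore_by_requeueing)
```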
The embodiment of the application also provides a computer storage medium (memory), which is a memory device in the computer device and is used for storing programs and data. It is understood that the computer storage medium here may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer storage medium provides storage space that stores the operating system of the computer device. One or more instructions, which may be one or more computer programs (including program code), are also stored in this storage space and are adapted to be loaded and executed by the processor 701. The computer storage medium here may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions stored in the computer storage medium may be loaded and executed by the processor to implement the corresponding steps of the data processing method embodiments shown in fig. 2 or 3 above; in a particular implementation, one or more instructions in the computer storage medium are loaded by the processor 701 to perform the following steps:
Obtaining a plurality of messages from a message queue, the message queue comprising messages read from a database log file;
synchronously storing a plurality of messages and the update time of each message into a first database, and adding the plurality of messages into a plurality of memory queues;
storing a first head time and a first tail time corresponding to each memory queue in the plurality of memory queues into a second database in the process that the target data source consumes the messages from the plurality of memory queues, wherein the first head time comprises the update time of the head message, the first tail time comprises the update time of the tail message, and the first database and the second database are used for recovering the messages in the plurality of memory queues to the target data source when a restart event occurs.
In one embodiment, the one or more instructions may be loaded by the processor and further executed to: if a restart event occurs, acquiring the first head time and the first tail time corresponding to each memory queue from the second database; acquiring the message corresponding to each memory queue from the first database according to the first head time and the first tail time corresponding to each memory queue; and restoring the message corresponding to each memory queue to the target data source.
In another embodiment, when the message corresponding to each memory queue is obtained from the first database according to the first head time and the first tail time corresponding to each memory queue, the one or more instructions may be specifically loaded and executed by the processor to: determining a target time interval corresponding to each memory queue based on the first head time and the first tail time corresponding to each memory queue; determining, from the messages stored in the first database, a message whose update time is within the target time interval; and taking the message whose update time is within the target time interval as the message corresponding to each memory queue.
In another embodiment, when adding the plurality of messages to the plurality of memory queues, the one or more instructions may be specifically loaded and executed by the processor to: for any message in the plurality of messages, determining a target memory queue from the plurality of memory queues based on the primary key identification of that message; and adding that message to the target memory queue.
In another embodiment, when storing the first head time and the first tail time corresponding to each of the plurality of memory queues in the second database, the one or more instructions may be specifically loaded and executed by the processor to: acquiring the queue head message and the queue tail message of each memory queue in the plurality of memory queues; determining the first head time corresponding to each memory queue according to the update time of the queue head message, and determining the first tail time corresponding to each memory queue according to the update time of the queue tail message; and storing the first head time and the first tail time corresponding to each memory queue into the second database based on the queue identifier, the head identifier and the tail identifier of each memory queue.
In another embodiment, when the first head time and the first tail time corresponding to each memory queue are stored in the second database based on the queue identifier, the head identifier and the tail identifier of each memory queue, the one or more instructions may be specifically loaded and executed by the processor to: acquiring a second head time corresponding to each memory queue stored in the second database based on the queue identifier and the head identifier of each memory queue; and if the first head time corresponding to each memory queue is different from the second head time, storing the first head time corresponding to each memory queue into the second database.
In another embodiment, when restoring the message corresponding to each memory queue to the target data source, the one or more instructions may be specifically loaded and executed by the processor to: sending the message corresponding to each memory queue to the target data source, so that the target data source consumes the message corresponding to each memory queue; or re-adding the message corresponding to each memory queue to each memory queue, so that the target data source consumes the message from each memory queue.
In the embodiment of the application, a plurality of messages are first acquired from a message queue; the plurality of messages and the update time of each message are synchronously stored to a first database, and the plurality of messages are added to a plurality of memory queues; and while the target data source consumes messages from the plurality of memory queues, a first head time and a first tail time corresponding to each of the plurality of memory queues are stored in a second database, where the first head time includes the update time of the queue head message and the first tail time includes the update time of the queue tail message. The first database and the second database are used to restore the messages in the memory queues to the target data source when a restart event occurs: the first database backs up the messages and the update time of each message, and the second database stores the first head time and the first tail time corresponding to each memory queue, so when a restart event occurs the messages in the memory queues can be recovered from the data stored in the two databases. This improves data synchronization efficiency, prevents data loss, and improves data security.
It should be noted that, according to an aspect of the present application, there is also provided a computer program product or a computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the data processing method embodiments shown in fig. 2 or 3 above. It is also to be understood that the foregoing is merely illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (10)

1. A method of data processing, comprising:
obtaining a plurality of messages from a message queue, wherein the message queue comprises messages read from a database log file;
synchronously storing the plurality of messages and the update time of each message to a first database, and adding the plurality of messages to a plurality of memory queues;
storing a first head time and a first tail time corresponding to each memory queue in the plurality of memory queues into a second database in the process that a target data source consumes messages from the plurality of memory queues, wherein the first head time comprises the update time of the head messages, the first tail time comprises the update time of the tail messages, and the first database and the second database are used for recovering the messages in the plurality of memory queues to the target data source when a restart event occurs.
2. The method according to claim 1, wherein the method further comprises:
if a restarting event occurs, acquiring a first head time and a first tail time corresponding to each memory queue from the second database;
acquiring a message corresponding to each memory queue from the first database according to the first head time and the first tail time corresponding to each memory queue;
and restoring the message corresponding to each memory queue to the target data source.
3. The method of claim 2, wherein the obtaining the message corresponding to each memory queue from the first database according to the first head time and the first tail time corresponding to each memory queue comprises:
determining a target time interval corresponding to each memory queue based on the first head time and the first tail time corresponding to each memory queue;
determining a message with corresponding update time within the target time interval from the messages stored in the first database;
and taking the message with the corresponding update time within the target time interval as the message corresponding to each memory queue.
4. A method according to any one of claims 1 to 3, wherein said adding said plurality of messages to a plurality of memory queues comprises:
for any one of the plurality of messages, determining a target memory queue from the plurality of memory queues based on a primary key identification of the message;
and adding the message to the target memory queue.
5. A method according to any one of claims 1 to 3, wherein storing the first head time and the first tail time corresponding to each of the plurality of memory queues in the second database comprises:
acquiring a head message and a tail message of each memory queue in the plurality of memory queues;
determining a first head time corresponding to each memory queue according to the update time of the queue head message, and determining a first tail time corresponding to each memory queue according to the update time of the queue tail message;
and storing the first head time and the first tail time corresponding to each memory queue into the second database based on the queue identifier, the head identifier and the tail identifier of each memory queue.
6. The method of claim 5, wherein storing the first head time and the first tail time corresponding to each memory queue in the second database based on the queue identifier, the head identifier, and the tail identifier of each memory queue comprises:
acquiring a second head time corresponding to each memory queue stored in the second database based on the queue identifier and the head identifier of each memory queue;
and if the first head time corresponding to each memory queue is different from the second head time, storing the first head time corresponding to each memory queue into the second database.
7. The method of claim 2, wherein the restoring the message corresponding to each memory queue to the target data source comprises:
sending the message corresponding to each memory queue to the target data source, so that the target data source consumes the message corresponding to each memory queue;
or re-adding the message corresponding to each memory queue to each memory queue, so that the target data source consumes the message from each memory queue.
8. A data processing apparatus, the apparatus comprising:
an obtaining unit, configured to obtain a plurality of messages from a message queue, where the message queue includes a message read from a database log file;
the processing unit is used for synchronously storing the plurality of messages and the update time of each message into the first database and adding the plurality of messages into a plurality of memory queues;
the processing unit is further configured to store, in a process that the target data source consumes the messages from the plurality of memory queues, a first header time and a first tail time corresponding to each memory queue in the plurality of memory queues to a second database, where the first header time includes an update time of a queue head message, the first tail time includes an update time of a queue tail message, and the first database and the second database are configured to restore the messages in the plurality of memory queues to the target data source when a restart event occurs.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the data processing method according to any of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more computer programs adapted to be loaded by a processor and to perform a data processing method according to any of claims 1-7.
CN202210290199.0A 2022-03-23 2022-03-23 Data processing method, device, computer equipment and computer readable storage medium Pending CN116841759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210290199.0A CN116841759A (en) 2022-03-23 2022-03-23 Data processing method, device, computer equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210290199.0A CN116841759A (en) 2022-03-23 2022-03-23 Data processing method, device, computer equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116841759A true CN116841759A (en) 2023-10-03

Family

ID=88160331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210290199.0A Pending CN116841759A (en) 2022-03-23 2022-03-23 Data processing method, device, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116841759A (en)

Similar Documents

Publication Publication Date Title
US11153380B2 (en) Continuous backup of data in a distributed data store
JP6522812B2 (en) Fast Crash Recovery for Distributed Database Systems
US11755415B2 (en) Variable data replication for storage implementing data backup
US10831614B2 (en) Visualizing restoration operation granularity for a database
US11068501B2 (en) Single phase transaction commits for distributed database transactions
EP2590087B1 (en) Database log parallelization
KR101771246B1 (en) System-wide checkpoint avoidance for distributed database systems
US10216949B1 (en) Dynamic quorum membership changes
US8065273B2 (en) Automated priority restores
CN112084258A (en) Data synchronization method and device
US11934306B2 (en) Object storage change-events
US9037905B2 (en) Data processing failure recovery method, system and program
US9753792B2 (en) Method and system for byzantine fault tolerant data replication
US10409804B2 (en) Reducing I/O operations for on-demand demand data page generation
US20110055151A1 (en) Processing Database Operation Requests
US20120278429A1 (en) Cluster system, synchronization controlling method, server, and synchronization controlling program
CN107992354B (en) Method and device for reducing memory load
WO2019109256A1 (en) Log management method, server and database system
CN115114370B (en) Master-slave database synchronization method and device, electronic equipment and storage medium
CN116483284B (en) Method, device, medium and electronic equipment for reading and writing virtual hard disk
CN110121712B (en) Log management method, server and database system
CN116049306A (en) Data synchronization method, device, electronic equipment and readable storage medium
CN116841759A (en) Data processing method, device, computer equipment and computer readable storage medium
CN114205368B (en) Data storage system, control method, control device, electronic equipment and storage medium
WO2023009361A1 (en) Utilizing progress identifiers to rewrite an event query

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination