
Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN110704000A
Authority
CN
China
Prior art keywords
data
memory
node
processing method
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910960517.8A
Other languages
Chinese (zh)
Other versions
CN110704000B (en)
Inventor
李博洋
杨波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201910960517.8A priority Critical patent/CN110704000B/en
Publication of CN110704000A publication Critical patent/CN110704000A/en
Application granted granted Critical
Publication of CN110704000B publication Critical patent/CN110704000B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629: Configuration or reconfiguration of storage systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601: Interfaces specially adapted for storage systems
    • G06F 3/0668: Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/0671: In-line storage system
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present disclosure provide a data processing method and apparatus, an electronic device, and a storage medium. The method includes: incrementally loading data in a database through a first data node, wherein the first data node is configured to write in-memory data into an external storage, the memory of the first data node maintains only data ids and index data, and the external storage is configured to maintain the full data; and receiving, by a second data node, the index data from the first data node and reading data from the external storage. The data processing method of the present disclosure eliminates the machine memory bottleneck, so that the data service can be expanded without limit as service nodes are added.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, and more particularly, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
Existing data processing architectures and methods are limited by the memory of the service machine, so a memory bottleneck exists. There is therefore an urgent need to remove the memory bottleneck and give the service the capability to scale out horizontally.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
To solve the above problems, the present disclosure provides a data processing method, an apparatus, an electronic device, and a storage medium. The present disclosure removes the memory bottleneck by moving in-memory data into external storage: the memory maintains only data ids and index data, while the external storage can be scaled out horizontally without limit, so the data service capability is greatly enhanced.
According to an embodiment of the present disclosure, there is provided a data processing method including: incrementally loading data in a database through a first data node, wherein the first data node is configured to write in-memory data into an external storage, the memory of the first data node is used to maintain data ids and index data, and the external storage is configured to maintain the full data; and receiving, by a second data node, the index data from the first data node and reading data from the external storage.
According to another embodiment of the present disclosure, there is provided a data processing apparatus including: a first data node configured to incrementally load data in a database, wherein the first data node writes in-memory data into an external storage, the memory of the first data node is used to maintain data ids and index data, and the external storage is configured to maintain the full data; and a second data node that receives the index data from the first data node and reads data from the external storage.
According to another embodiment of the present disclosure, there is provided an electronic device including: at least one memory and at least one processor, wherein the memory is used to store program code and the processor is used to call the program code stored in the memory to execute the above data processing method.
According to another embodiment of the present disclosure, there is provided a computer storage medium storing program code for executing the above-described data processing method.
By adopting the data processing method of the present disclosure, the machine memory bottleneck is eliminated, so that the data service can be expanded without limit as service nodes are added.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 shows a schematic diagram of a data processing manner.
Fig. 2 shows a schematic diagram of a double cache full replacement mechanism.
Fig. 3 shows a schematic flow chart of a data processing manner of an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of a data processing manner of an embodiment of the present disclosure.
FIG. 5 illustrates a schematic structural diagram of an electronic device 500 suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
In some data processing methods, cluster-wide incremental data synchronization is computed in real time entirely by the master library (leader) elected by the cluster, which largely eliminates the database (DB) query bottleneck, but the memory problem remains severe.
As shown in fig. 1, when a data change occurs in the DB, a binlog message is issued. The binlog is the binary log of MySQL; it records all Data Definition Language (DDL) and Data Manipulation Language (DML) statements (but not data query statements) in the form of events, and also records the time consumed by executing each statement. Thus, whenever a data change (e.g., an insert, delete, or update) occurs in the DB, the master library (leader) cluster can be notified through a binlog message.
Next, the binlog message is received and parsed by a collection component to obtain the parsed information. Since the binlog message is a binary file, it requires further parsing. The collection component may be canal, but the disclosure is not limited thereto; it may be any suitable middleware for parsing binlog messages. canal is a Java middleware that provides incremental data subscription and consumption based on parsing of database incremental logs.
Then, the master library performs computation on the parsed information to obtain the changed data. Typically, the information parsed by canal is sent to the master library or master library cluster through a message publishing system (e.g., kafka). The master library cluster consumes the binlog messages, computes and assembles the advertisement data in real time, and pushes the computation results to the slave library (slave) cluster or downstream in real time through kafka. Thus, the master library cluster no longer provides online service capability; it acts instead as a real-time computing engine. It should be understood that the push system kafka described above is merely exemplary and is not intended to limit the present disclosure. In addition, the messages pushed by kafka are consumed gradually as a queue; for convenience of description, the position in the queue up to which messages have been consumed at the current time is referred to as the consumption site (consumption progress).
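To make the notion of a consumption site concrete, the following sketch consumes parsed binlog change messages from kafka and records the per-partition offset consumed so far. It is only an illustration: the topic name, consumer group, message format, and the use of the kafka-python client are assumptions and are not specified by the present disclosure.

```python
# Illustrative sketch only: consuming parsed binlog change messages from kafka
# and tracking the "consumption site" (per-partition offset).
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "ad_binlog_changes",                # hypothetical topic fed by the collection component
    bootstrap_servers="localhost:9092",
    group_id="master-library-cluster",  # hypothetical consumer group
    enable_auto_commit=False,           # commit manually so the site is explicit
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

consumption_site = {}  # (topic, partition) -> last consumed offset

for msg in consumer:
    change = msg.value  # e.g. {"ad_id": 123, "op": "update", "fields": {...}}
    # ... compute / assemble the changed advertisement data here ...
    consumption_site[(msg.topic, msg.partition)] = msg.offset
    consumer.commit()   # advance the consumption site (progress) in kafka
```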
Before the DB data changes, the data in the memory of the master library cluster is synchronized with the DB. When the data in the DB changes, the master library cluster is notified through a binlog message, computes and assembles data based on the change information to obtain the changed data, and its in-memory data becomes synchronized again with the changed data in the DB.
In addition, the slave library cluster receives the push messages from the master library cluster and processes them to update the data in its memory, so that the data in the memory of the slave library cluster is consistent with the changed data in the master library cluster. Through this synchronized pushing from the master library cluster to the slave library cluster, the slave library cluster stays synchronized and consistent with the data in the DB.
As shown in fig. 1, the master library cluster may periodically back up its memory file and, after the memory file has been backed up, notify the slave library cluster to load it. After receiving the notification from the master library cluster, the slave library cluster loads the memory file and performs a full load. For example, after the master library finishes backing up the memory file to the external storage TBS, it sends a message to the slave library cluster indicating that the memory file has been backed up. The slave library that receives the notification then begins loading the backed-up file into a cache. As further shown in fig. 2, one portion of the caches in the slave library cluster is used to provide online services, while another portion is used to store the memory files backed up by the master library cluster to external storage. After the file loading is completed, the cache loaded with the backup file is swapped with the online service cache, completing the full load. Because two caches exist in the slave library cluster during the full-load process, this is a double-cache mechanism.
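The double-cache swap can be pictured with the following minimal sketch, in which one cache serves online reads while the other loads the backed-up memory file, and the two are swapped once loading completes; the JSON file format and class layout are assumptions for illustration only.

```python
# Minimal sketch of a double-cache full-replacement mechanism.
import json
import threading

class DoubleCache:
    def __init__(self):
        self._online = {}          # cache currently serving online traffic
        self._loading = {}         # cache being filled from the backup file
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._online.get(key)

    def full_load(self, backup_path):
        # Load the master library's backed-up memory file into the spare cache.
        with open(backup_path, "r", encoding="utf-8") as f:
            self._loading = json.load(f)   # assumed JSON dump of the memory file
        with self._lock:
            # Swap: the freshly loaded cache becomes the online cache.
            self._online, self._loading = self._loading, {}
```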
The present disclosure provides a data processing method that moves the service machine memory to external distributed storage. Specifically, as shown in fig. 3, the method of the present disclosure includes step 101: incrementally loading data in a database through a first data node (e.g., a master library or a master library cluster), where the first data node computes advertisement data in real time and writes the in-memory data into an external storage; the memory of the first data node is used to maintain the advertisement data ids and index data, and the external storage is configured to maintain the full advertisement data. It should be understood that although advertisement data is used as an example in the present disclosure, the present disclosure is not limited thereto. The manner of incremental loading by the master library may be the same as in fig. 1. The first data node may be sharded and may include a plurality of shards and a plurality of replicas. The method of the present disclosure further includes step S102: a second data node (e.g., a proxy or the slave library cluster) receives the index data from the first data node and reads data from the external storage using the index data.
As described above, the master library service machine may maintain only the advertisement data ids and the index data and synchronize the index data to the slave library machines. The master library writes the full data to the external storage, and the slave library reads the relevant data from the external storage. In this way, the master library service machine maintains only the advertisement data ids and the index data, and the slave library reads data from the external storage rather than maintaining the full data, so the memory of the service machine is no longer a bottleneck. In addition, the external storage can be expanded horizontally, which eliminates the memory bottleneck problem.
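The division of labor described above can be sketched as follows, with a plain dictionary standing in for the external storage (e.g., Tbase); the md5-based index and the key layout are illustrative assumptions rather than details taken from the disclosure.

```python
# Sketch: the master keeps only id -> index in memory and writes full data
# externally; the slave syncs the index and reads full data from external storage.
import hashlib
import json

external_storage = {}  # stands in for Tbase: ad_id -> serialized full ad data

class MasterNode:
    """Keeps only ad ids and index data in memory; full data goes to external storage."""
    def __init__(self):
        self.index = {}  # ad_id -> md5 of the serialized data (index data)

    def write(self, ad_id, ad_data):
        blob = json.dumps(ad_data, sort_keys=True)
        external_storage[ad_id] = blob                      # full data lives externally
        self.index[ad_id] = hashlib.md5(blob.encode()).hexdigest()

class SlaveNode:
    """Receives the index from the master and reads full data from external storage."""
    def __init__(self):
        self.index = {}

    def sync_index(self, master_index):
        self.index = dict(master_index)

    def read(self, ad_id):
        if ad_id in self.index:          # the index tells the slave what exists
            return json.loads(external_storage[ad_id])
        return None
```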
The following detailed description should be read in conjunction with fig. 4. It is to be understood that the following specific examples are included merely for a better understanding of the present disclosure and are not intended to limit the present disclosure.
As shown in fig. 4, incremental information is issued by the database DB in response to a data change in the DB, where the incremental information is a binlog-based incremental message. Thus, the incremental data path of the present disclosure is no longer based on a timed query (e.g., every 10 s) but on MySQL binlog incremental messages. When a data change occurs in the DB, a binlog message is issued. The binlog is the binary log of MySQL; it records all DDL and DML statements (but not data query statements) in the form of events, and also records the time consumed by executing each statement. Thus, whenever a data change (e.g., an insert, delete, or update) occurs in the DB, the master library cluster can be notified through a binlog message.
The binlog message is received and parsed by the collection component to obtain the parsed information. Since the binlog message is a binary file, it requires further parsing. The collection component may be canal, but the disclosure is not limited thereto; it may be any suitable middleware for parsing binlog messages. canal is a Java middleware that provides incremental data subscription and consumption based on parsing of database incremental logs. This part of the path is the same as described above.
Then, the parsed information is processed by computation to obtain the changed data. Typically, the information parsed by canal is sent to the first data node (e.g., a master library or a master library cluster) through a message publishing system (e.g., kafka). The master library cluster consumes the binlog messages, computes the advertisement data in real time, and writes it into the external memory (e.g., Tbase). The master library memory maintains only the advertisement id and the md5 (index data) corresponding to the data; it no longer maintains the advertisement data itself and no longer provides remote procedure call (RPC) service. In addition, the master library may synchronize the index data to the slave library (proxy) cluster.
In the present disclosure, the master library cluster may be sharded, e.g., including shard 0, shard 1, shard 2, and shard 3. In a database environment, sharding splits the data into smaller partitions, which are called shards. Each shard contains multiple replicas. Upon a restart of a master library machine or some other operation, the roles of the master libraries (e.g., master library 1, master library 2, master library 3, and master library 4) can be taken over by the corresponding replicas. It should be understood that although fig. 4 illustrates a master library cluster including 4 shards with 2 replicas per shard, these numbers are merely exemplary, and other numbers of shards and replicas may be used.
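As an illustration of such sharding, the following sketch routes each advertisement id to one of four shards and lets a replica take over when the master library of a shard restarts; the hash-based routing rule and the naming are assumptions, not requirements of the disclosure.

```python
# Sketch of shard routing with replica failover, matching the 4-shard / 2-replica
# layout of fig. 4 (routing rule is a hypothetical hash-modulo scheme).
import zlib

NUM_SHARDS = 4

class Shard:
    def __init__(self, shard_id, replicas):
        self.shard_id = shard_id
        self.replicas = replicas      # e.g. ["master_1", "replica_1"]
        self.active = replicas[0]     # current master library for this shard

    def fail_over(self):
        # On a restart of the master library machine, a replica takes over its role.
        i = self.replicas.index(self.active)
        self.active = self.replicas[(i + 1) % len(self.replicas)]

shards = [Shard(i, [f"master_{i}", f"replica_{i}"]) for i in range(NUM_SHARDS)]

def route(ad_id: int) -> Shard:
    # Deterministically map an ad id to one of the shards.
    return shards[zlib.crc32(str(ad_id).encode()) % NUM_SHARDS]
```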
The slave library cluster may query the relevant data from the external storage (e.g., Tbase) using the index data. The slave library cluster may be responsible only for providing RPC services and no longer keeps the complete data file. Thus, the memory bottleneck of the master library and slave library clusters is resolved. The previous approach, which stored the complete data in the memories of the master library and the slave library, was limited because that memory cannot be expanded horizontally, creating a memory bottleneck. By writing the full data into an external storage such as Tbase, which supports horizontal expansion, the limitation that the memory cannot be scaled horizontally is removed. Accordingly, the online service capability can be improved.
In the pipeline of fig. 4 described above, there may be only incremental loading and no full loading. Full loading may be accomplished through the data module. Like the master library of the cluster, the data module consumes the binlog data, and its memory maintains only the ad id and the index data md5. Additionally, the master library in the data module may also include a replica. The data module may perform data backtracking by reading, from another external memory (e.g., a redis, not shown in fig. 4), the full data ids (e.g., the full set of ad ids being delivered) of the last time version; the data in that external memory is versioned with a timestamp, and the kafka consumption site corresponding to the last time version is also recorded. The data module then updates to obtain the full set of delivered ad ids of the new time version and sends it to the redis. Then, the fully loaded data is divided into a plurality of buckets (e.g., 100 to 200 buckets), and the data of each bucket is loaded incrementally together with the incremental data that the master library writes into Tbase. This eliminates the memory spike caused by a wholesale full replacement and also improves the time for the data to take effect.
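A minimal sketch of this bucketed full load, assuming a timestamp-versioned redis key layout and a caller-supplied incremental writer (both hypothetical), might look as follows.

```python
# Sketch: read the last time version's full ad-id set and kafka consumption site
# from redis, publish a new version, and split the full set into buckets that are
# fed through the same incremental path as binlog updates.
import json
import time
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def load_last_version():
    # Hypothetical key layout: one key per time version plus a pointer to the latest.
    ts = r.get("full_ad_ids:latest_version").decode()
    payload = json.loads(r.get(f"full_ad_ids:{ts}"))
    return payload["ad_ids"], payload["kafka_site"]   # ids + site to backtrack from

def publish_new_version(ad_ids, kafka_site):
    ts = str(int(time.time()))
    r.set(f"full_ad_ids:{ts}", json.dumps({"ad_ids": ad_ids, "kafka_site": kafka_site}))
    r.set("full_ad_ids:latest_version", ts)

def bucketed_full_load(ad_ids, write_incrementally, num_buckets=100):
    buckets = [ad_ids[i::num_buckets] for i in range(num_buckets)]
    for bucket in buckets:
        # Each bucket is loaded like ordinary incremental data, so there is no
        # memory spike from replacing everything at once.
        write_incrementally(bucket)
```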
In some embodiments, a difference/compensation component may exist between the data module and the master library cluster to find differences between them and to compensate the data when a difference is found.
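Assuming that both sides expose their ad_id-to-md5 index maps, such a difference/compensation step could be sketched as below; the callback names are hypothetical.

```python
# Sketch of a difference/compensation pass between the data module and the
# master library cluster, driven by their id -> md5 index maps.
def find_differences(data_module_index: dict, master_index: dict) -> set:
    all_ids = set(data_module_index) | set(master_index)
    return {ad_id for ad_id in all_ids
            if data_module_index.get(ad_id) != master_index.get(ad_id)}

def compensate(diff_ids, reload_from_db, write_to_external_storage):
    # reload_from_db / write_to_external_storage are hypothetical callbacks.
    for ad_id in diff_ids:
        write_to_external_storage(ad_id, reload_from_db(ad_id))
```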
The data module of the present disclosure may load only the data related to the advertisement delivery state, which is used to bring advertisements online and offline in real time; other data is not loaded. Because the data module maintains only the ad ids and the state data, memory is not a bottleneck for the data module. After the full load, the data module divides the advertisement data into buckets (e.g., 100 to 200 buckets) and refreshes the data of each bucket into the external memory Tbase in turn. By adopting this bucketing approach, the data module evenly disperses the full load into the incremental load and realizes the full load smoothly as a streaming data update, which eliminates the memory spikes of a full load and also improves the time for the data to take effect.
Thus, the master library cluster is responsible for writing data into the external memory Tbase and the slave library cluster is responsible for reading data from it, so the read and write clusters are separated. Furthermore, by employing a proxy cluster, traffic is isolated and the shards of the data storage are masked, so access to the external memory can be imperceptible to the user. All of the above factors enable the data processing architecture of the present disclosure to support horizontal expansion and remove the limitation of the service machine memory bottleneck. As the volume of data services grows, the disclosed method and apparatus can be expanded horizontally by increasing the number of data service nodes, thereby eliminating the memory bottleneck problem.
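The shard-masking role of the proxy can be sketched as follows, with in-memory dictionaries standing in for the external-storage shards; the routing rule and method names are illustrative assumptions rather than details of the disclosure.

```python
# Sketch: a proxy layer hides data-storage sharding from callers; the RPC client
# asks the proxy for an ad id and the proxy decides which shard to touch.
import zlib

class StorageProxy:
    def __init__(self, shard_clients):
        self.shard_clients = shard_clients     # one dict per storage shard (stand-ins for Tbase)

    def _shard_for(self, ad_id: int) -> dict:
        return self.shard_clients[zlib.crc32(str(ad_id).encode()) % len(self.shard_clients)]

    def put_ad(self, ad_id: int, data: dict) -> None:
        self._shard_for(ad_id)[ad_id] = data   # write path (master side)

    def get_ad(self, ad_id: int):
        # Read path (RPC side): callers never see which shard holds the data.
        return self._shard_for(ad_id).get(ad_id)

# Usage: sharding stays invisible to the caller.
proxy = StorageProxy([{} for _ in range(4)])
proxy.put_ad(42, {"ad_id": 42, "state": "online"})
print(proxy.get_ad(42))   # -> {'ad_id': 42, 'state': 'online'}
```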
The present disclosure also provides a data processing apparatus, including: a first data node (e.g., a master library or a master library cluster) configured to incrementally load data in a database, wherein the first data node writes in-memory data into the external memory, the memory of the first data node is used to maintain data ids and index data, and the external memory is configured to maintain the full data; and a second data node that receives the index data from the first data node and reads data from the external memory.
In some embodiments, the data processing apparatus further includes a data module for fully loading the database, evenly dispersing the fully loaded data into a plurality of buckets, and loading the data of each bucket of the plurality of buckets, together with the incrementally loaded data of the first data node, through the external memory.
Furthermore, the present disclosure also provides an electronic device, comprising: at least one memory and at least one processor; the memory is used for storing program codes, and the processor is used for calling the program codes stored in the memory to execute the data processing method.
In addition, the present disclosure also provides a computer storage medium storing program codes for executing the above-described data processing method.
In some embodiments, the present disclosure solves the problem that memory cannot be expanded horizontally by storing data in the external storage: the master library cluster and the external storage are sharded, the read and write clusters are separated, the master library writes data to the external storage, and the slave library reads data from the external storage, which eliminates the memory bottleneck of the data service. In addition, by adding a proxy cluster (the slave library cluster), the shards of the data storage are masked so that users do not perceive them.
Referring now to FIG. 5, a block diagram of an electronic device 500 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 502 or a program loaded from a storage means 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 509, or installed from the storage device 508, or installed from the ROM 502. When executed by the processing device 501, the computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a data processing method including: incrementally loading data in a database through a first data node, wherein the first data node is configured to write in-memory data into an external storage, the memory of the first data node is used to maintain data ids and index data, and the external storage is configured to maintain the full data; and receiving, by a second data node, the index data from the first data node and reading data from the external storage.
According to one or more embodiments of the present disclosure, the method further includes: fully loading the database through a data module configured to evenly distribute the fully loaded data into a plurality of buckets, wherein the data of each bucket of the plurality of buckets is loaded through the external storage together with the incrementally loaded data of the first data node.
According to one or more embodiments of the present disclosure, incrementally loading data in the database through the first data node includes: issuing incremental information by the database in response to a data change of the database; receiving the incremental information through a collection component and parsing it to obtain the parsed information; and sending the parsed information to the first data node through a message publishing system, so that the first data node performs incremental loading based on the parsed information.
According to one or more embodiments of the present disclosure, the message publishing system further sends the parsed information to the data module.
According to one or more embodiments of the present disclosure, fully loading the database through the data module includes: reading the full data of a first time version from another external memory, backtracking the data from the consumption site in the message publishing system corresponding to the first time version, and updating to obtain the full data ids of a second time version.
According to one or more embodiments of the present disclosure, the method further includes: determining, by a difference compensation module, a difference between the data module and the first data node, and compensating the data when a difference exists.
According to one or more embodiments of the present disclosure, the first data node includes a plurality of shards and a plurality of replicas.
According to one or more embodiments of the present disclosure, there is also provided a data processing apparatus including: a first data node configured to incrementally load data in a database, wherein the first data node writes in-memory data into an external memory, the memory of the first data node is used to maintain data ids and index data, and the external memory is configured to maintain the full data; and a second data node that receives the index data from the first data node and reads data from the external memory.
According to one or more embodiments of the present disclosure, the apparatus further includes: a data module for fully loading the database, evenly dispersing the fully loaded data into a plurality of buckets, and loading the data of each bucket of the plurality of buckets, together with the incrementally loaded data of the first data node, through the external memory.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: at least one memory and at least one processor; the memory is used for storing program codes, and the processor is used for calling the program codes stored in the memory to execute the data processing method.
According to one or more embodiments of the present disclosure, there is provided a computer storage medium storing program code for executing the above-described data processing method.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (11)

1. A data processing method, comprising:
incrementally loading data in a database through a first data node, wherein the first data node is configured to write in-memory data into an external storage, the memory of the first data node is used to maintain data ids and index data, and the external storage is configured to maintain full data; and
receiving, by a second data node, the index data from the first data node and reading data from the external storage.
2. The data processing method of claim 1, further comprising:
fully loading the database through a data module configured to evenly distribute the fully loaded data into a plurality of buckets, wherein the data of each bucket of the plurality of buckets is loaded through the external storage together with the incrementally loaded data of the first data node.
3. The data processing method of claim 2, wherein incrementally loading data in the database through the first data node comprises:
issuing incremental information by the database in response to a data change of the database;
receiving the incremental information through a collection component and parsing it to obtain parsed information; and
sending the parsed information to the first data node through a message publishing system, so that the first data node performs incremental loading based on the parsed information.
4. The data processing method of claim 3, wherein the message publishing system further sends the parsed information to the data module.
5. The data processing method of claim 4, wherein the full loading of the database by the data module comprises:
reading the full data of a first time version from another external memory, backtracking the data from the consumption site in the message publishing system corresponding to the first time version, and updating to obtain the full data ids of a second time version.
6. The data processing method of claim 2, further comprising:
determining, by a difference compensation module, a difference between the data module and the first data node, and compensating the data when a difference exists.
7. The data processing method of claim 1, wherein the first data node comprises a plurality of shards and a plurality of replicas.
8. A data processing apparatus, comprising:
a first data node configured to incrementally load data in a database, wherein the first data node writes in-memory data into an external memory, the memory of the first data node is used to maintain data ids and index data, and the external memory is configured to maintain full data; and
a second data node that receives the index data from the first data node and reads data from the external memory.
9. The data processing apparatus of claim 8, further comprising:
a data module for fully loading the database, evenly dispersing the fully loaded data into a plurality of buckets, and loading the data of each bucket of the plurality of buckets, together with the incrementally loaded data of the first data node, through the external memory.
10. An electronic device, characterized in that the electronic device comprises:
at least one memory and at least one processor;
wherein the memory is configured to store program code and the processor is configured to call the program code stored in the memory to perform the data processing method of any of claims 1 to 7.
11. A computer storage medium characterized in that the computer storage medium stores a program code for executing the data processing method of any one of claims 1 to 7.
CN201910960517.8A 2019-10-10 2019-10-10 Data processing method, device, electronic equipment and storage medium Active CN110704000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910960517.8A CN110704000B (en) 2019-10-10 2019-10-10 Data processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910960517.8A CN110704000B (en) 2019-10-10 2019-10-10 Data processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110704000A true CN110704000A (en) 2020-01-17
CN110704000B CN110704000B (en) 2023-05-30

Family

ID=69200115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910960517.8A Active CN110704000B (en) 2019-10-10 2019-10-10 Data processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110704000B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016091069A1 (en) * 2014-12-12 2016-06-16 北京奇虎科技有限公司 Data operation method and device
US20160196072A1 (en) * 2015-01-05 2016-07-07 International Business Machines Corporation Random read performance of optical media library
US20170052706A1 (en) * 2015-08-21 2017-02-23 International Business Machines Corporation Scalable and efficient access to and management of data and resources in a tiered data storage system
US20170083539A1 (en) * 2015-09-22 2017-03-23 International Business Machines Corporation Fast Recovery Using Self-Describing Replica Files In A Distributed Storage System
US20180025015A1 (en) * 2016-07-22 2018-01-25 International Business Machines Corporation Estimating mount time completion in file systems
CN110058987A (en) * 2018-01-18 2019-07-26 伊姆西Ip控股有限责任公司 Method, equipment and computer-readable medium for being tracked to computing system
CN108762822A (en) * 2018-03-23 2018-11-06 中国银联股份有限公司 A kind of data load method and device
CN109145060A (en) * 2018-07-20 2019-01-04 腾讯科技(深圳)有限公司 Data processing method and device
CN109086409A (en) * 2018-08-02 2018-12-25 泰康保险集团股份有限公司 Micro services data processing method, device, electronic equipment and computer-readable medium
CN109753531A (en) * 2018-12-26 2019-05-14 深圳市麦谷科技有限公司 A kind of big data statistical method, system, computer equipment and storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
IGOR MEKTEROVIC等: "Delta View Generation for Incremental Loading of Large Dimensions in a Data Warehouse", 《RESEARCHGATE》 *
UTHAYASANKAR SIVARAJAH等: "Critical analysis of Big Data challenges and analytical methods", 《JOURNAL OF BUSINESS RESEARCH》 *
付印金等: "面向大数据备份的应用感知并行重删存储系统", 计算机研究与发展 *
何小川: "MySQL数据库主从复制的实现", 《广东通信技术》 *
李德文等: "一种分布式实时数据系统中的基于动态索引策略的存取定位机制", 《工业控制计算机》 *
杨明珉等: "MySQL集群到Oracle数据库的数据同步方法", 《计算机系统应用》 *
谭鹏等: "分布式数据访问层中间件的研究与实现", 电脑知识与技术 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111464610A (en) * 2020-03-30 2020-07-28 中科边缘智慧信息科技(苏州)有限公司 Data unit synchronization method in maneuvering environment
CN111580850A (en) * 2020-05-12 2020-08-25 重庆科技学院 Increment upgrading method and device, computer equipment and storage medium
CN111580850B (en) * 2020-05-12 2023-03-31 重庆科技学院 Increment upgrading method and device, computer equipment and storage medium
CN111949833A (en) * 2020-08-17 2020-11-17 北京字节跳动网络技术有限公司 Index construction method, data processing method, device, electronic equipment and medium
CN112860720A (en) * 2021-03-09 2021-05-28 中国电子系统技术有限公司 Storage capacity updating method and device
CN113515392A (en) * 2021-05-17 2021-10-19 腾讯科技(深圳)有限公司 RPC calling method, device, equipment and storage medium
CN113515392B (en) * 2021-05-17 2024-02-09 腾讯科技(深圳)有限公司 RPC calling method, device, equipment and storage medium
CN114398379A (en) * 2021-11-29 2022-04-26 平安科技(深圳)有限公司 Data updating method, device, equipment and medium
CN114398379B (en) * 2021-11-29 2024-03-01 平安科技(深圳)有限公司 Data updating method, device, equipment and medium

Also Published As

Publication number Publication date
CN110704000B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN110704000B (en) Data processing method, device, electronic equipment and storage medium
CN108519914B (en) Big data calculation method and system and computer equipment
CN110609872B (en) Method and apparatus for synchronizing node data
US20160323367A1 (en) Massively-scalable, asynchronous backend cloud computing architecture
EP2948875B1 (en) Method and system for using a recursive event listener on a node in hierarchical data structure
CN108920698A (en) A kind of method of data synchronization, device, system, medium and electronic equipment
CN110751275A (en) Graph training system, data access method and device, electronic device and storage medium
CN110019062A (en) Method of data synchronization and system
CN110909521A (en) Synchronous processing method and device for online document information and electronic equipment
CN111309747A (en) Data synchronization method, system and device
CN112035529A (en) Caching method and device, electronic equipment and computer readable storage medium
CN111338834A (en) Data storage method and device
CN110704401A (en) Data processing method and device, electronic equipment and storage medium
CN110545313B (en) Message push control method and device and electronic equipment
CN110716984B (en) Data processing method, device, electronic equipment and storage medium
CN112989773B (en) Method, apparatus, device and computer readable medium for synchronizing update data
US9749426B2 (en) Method, system, and apparatus for agent-based architecture for integrated mobile applications
CN110727694B (en) Data processing method, device, electronic equipment and storage medium
CN113742376A (en) Data synchronization method, first server and data synchronization system
CN112100159A (en) Data processing method and device, electronic equipment and computer readable medium
CN111581930A (en) Online form data processing method and device, electronic equipment and readable medium
CN115314718B (en) Live broadcast data processing method, device, equipment and medium
CN110851192A (en) Method and device for responding to configuration of degraded switch
CN115587090A (en) Data storage method, device, equipment and medium based on Doris
CN117407407B (en) Method, device, equipment and computer medium for updating multi-heterogeneous data source data set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant