CN110888929B

CN110888929B - Data processing method, data processing device, data node and storage medium

Info

Publication number: CN110888929B
Application number: CN201911239023.7A
Authority: CN
Inventors: 夏锦阳; 张斌
Original assignee: Miaozhen Information Technology Co Ltd
Current assignee: Miaozhen Information Technology Co Ltd
Priority date: 2019-12-06
Filing date: 2019-12-06
Publication date: 2022-03-29
Anticipated expiration: 2039-12-06
Also published as: CN110888929A

Abstract

The embodiment of the invention relates to the technical field of unstructured databases, and provides a data processing method, a device, a data node and a storage medium, wherein the method comprises the following steps: receiving a data processing task sent by a client, wherein the data processing task comprises at least one data processing request, and each data processing request comprises an identifier of the data to be processed, a processing parameter and a processing type; processing, according to the processing parameter and the processing type of each data processing request, a first target cell in a pre-established temporary storage database that matches the identifier of the data to be processed of that request; and when the first target cells of all the data processing requests in the data processing task have been processed, writing the cells in the temporary storage database back to the distributed database and returning to the client a notification that the data processing task is completed. The embodiment of the invention can improve the efficiency with which a database without a batch transaction interface processes large-scale data.

Description

Data processing method, data processing device, data node and storage medium

Technical Field

The invention relates to the technical field of unstructured databases, in particular to a data processing method and device, a data node and a storage medium.

Background

In a distributed database system (DDBS), an application program can operate on the database transparently, while the data in the database are stored in different local databases that run on different machines, are supported by different operating systems, and are connected by different communication networks. A distributed database is logically a unified whole but is physically stored on different physical nodes. An application may access a database distributed over different geographic locations through a network connection.

However, when data in a distributed database system without a batch transaction interface needs to be processed on a large scale, if a real-time synchronization interface is called to perform reading, operation and write-back, efficiency is low and operation time is long.

Disclosure of Invention

The invention provides a data processing method, a data processing device, a data node and a storage medium, which can realize large-scale and efficient transaction operation for a database without a batch transaction interface and improve the efficiency of processing large-scale data by the database system on the premise of ensuring data safety.

Embodiments of the invention may be implemented as follows:

In a first aspect, an embodiment of the present invention provides a data processing method, which is applied to a data node in a distributed database system, where a distributed database is deployed on the data node, the distributed database is unstructured and does not have a large-scale data transaction processing function, the data node is in communication connection with a client, and the distributed database includes a plurality of data unit cells, where the method includes: receiving a data processing task sent by the client, where the data processing task comprises at least one data processing request, and each data processing request comprises an identifier of the data to be processed, a processing parameter and a processing type; processing, according to the processing parameter and the processing type of each data processing request, a first target cell in a pre-established temporary storage database that matches the identifier of the data to be processed of that request, where the temporary storage database comprises cells that are migrated from the distributed database in advance and are related to the data processing task, and the temporary storage database has a transaction processing function; and when the first target cells of all the data processing requests in the data processing task have been processed, writing the cells in the temporary storage database back to the distributed database and returning to the client a notification that the data processing task is completed.

In a second aspect, an embodiment of the present invention provides a data processing apparatus, which is applied to a data node in a distributed database system, where a distributed database is deployed on the data node, the distributed database is unstructured and does not have a large-scale data transaction processing function, the data node is in communication connection with a client, the distributed database includes multiple data unit cells, the apparatus includes a receiving module, a processing module, and a write-back module, where the receiving module is configured to receive a data processing task sent by the client; the data processing task comprises at least one data processing request, and each data processing request comprises an identifier, a processing parameter and a processing type of data to be processed; the processing module is used for processing a first target cell which is consistent with the identification of the data to be processed of the data processing request in a pre-established temporary storage database according to the processing parameters and the processing type of each data processing request, wherein the temporary storage database comprises cells which are migrated from a distributed database in advance and are related to data processing tasks, and the temporary storage database has a transaction processing function; and the write-back module is used for writing the cells in the temporary storage database back to the distributed database when the first target cells of all the data processing requests in the data processing task are processed, and returning the data processing task completion to the client.

In a third aspect, an embodiment of the present invention provides a data node, where the data node includes: one or more processors; memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a data processing method as in any one of the preceding embodiments.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the data processing method according to any one of the foregoing embodiments.

Compared with the prior art, the embodiment of the invention provides a data processing method, a data processing device, a data node and a storage medium, wherein cells related to the same data processing task are placed in a temporary storage database in advance, when a data processing request of the data processing task is received, the cells corresponding to the data processing request in the temporary storage database are directly processed, when all the data processing requests of the data processing task are processed, the data processed by the data processing task are uniformly written back to a distributed database, and because a uniform asynchronous write-back mode is adopted for the processed data in the data processing task, large-scale and efficient transaction operation is realized for the database without a batch transaction interface, and the efficiency of processing large-scale data by the database system is improved on the premise of ensuring data safety.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 shows an exemplary diagram of an application scenario provided by an embodiment of the present invention.

Fig. 2 is a block diagram illustrating a data node according to an embodiment of the present invention.

Fig. 3 is a flowchart illustrating a data processing method according to an embodiment of the present invention.

Fig. 4 is a flowchart illustrating another data processing method according to an embodiment of the present invention.

Fig. 5 is a flowchart illustrating another data processing method according to an embodiment of the present invention.

Fig. 6 shows a functional block diagram of a data processing apparatus according to an embodiment of the present invention.

Reference numerals: 10 - data node; 11 - memory; 12 - communication interface; 13 - processor; 14 - bus; 20 - client; 100 - data processing device; 110 - receiving module; 120 - processing module; 130 - write-back module; 140 - reading module.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

In the description of the present invention, it should be noted that if the terms "upper", "lower", "inside", "outside", etc. indicate an orientation or a positional relationship based on that shown in the drawings or that the product of the present invention is used as it is, this is only for convenience of description and simplification of the description, and it does not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.

Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.

It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.

Referring to Fig. 1, which shows an exemplary application scenario provided by an embodiment of the present invention: a distributed database system includes a plurality of data nodes 10, each data node 10 is in communication connection with a client 20, and the distributed database is deployed across the plurality of data nodes 10. The client 20 can issue a data processing task through any one of the data nodes 10 to process data unit cells in the distributed unstructured database. Unstructured databases include, but are not limited to, Cassandra, HBase, and Riak. Taking HBase as an example, each data cell may be a row of HBase that includes a row key (rowkey), a column family, a column identifier, a timestamp, and data. Table 1 illustrates how data is organized in HBase.

TABLE 1

Row key	Column family	Column identifier	Timestamp	Data
Key1	Cf1	C1	Date1	Data1
Key1	Cf2	C1	Date2	Data2
Key2	Cf1	C2	Date3	Data3

There are 3 rows in table 1, each row may be considered as a cell, and when data is added to table 1, the cell in which the data to be added is located may be determined by determining the row key, column family, and column identifier of the data to be added.
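The lookup described above can be illustrated with a minimal Python sketch. The names `CellId`, `Cell`, `table1` and `locate` are hypothetical and chosen only for illustration; the sketch simply models Table 1 as a dictionary keyed by the three-part identifier and is not an HBase API.

```python
from typing import NamedTuple, Optional


class CellId(NamedTuple):
    """Hypothetical identifier: row key + column family + column identifier."""
    row_key: str
    column_family: str
    column: str


class Cell(NamedTuple):
    cell_id: CellId
    timestamp: Optional[str]
    data: str


# The three rows of Table 1, indexed by their identifier.
table1 = {
    c.cell_id: c
    for c in [
        Cell(CellId("Key1", "Cf1", "C1"), "Date1", "Data1"),
        Cell(CellId("Key1", "Cf2", "C1"), "Date2", "Data2"),
        Cell(CellId("Key2", "Cf1", "C2"), "Date3", "Data3"),
    ]
}


def locate(row_key: str, column_family: str, column: str) -> Optional[Cell]:
    """Return the cell addressed by the three-part identifier, if present."""
    return table1.get(CellId(row_key, column_family, column))
```

A lookup with an identifier that is not yet present returns `None`, which corresponds to the case in which a new cell would have to be created.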

The data node 10 migrates the cells related to the data processing task from the distributed database to the temporary storage database in advance. After receiving the data processing task, for each data processing request in the task, the data node 10 processes the first target cell in the temporary storage database that matches the identifier of the data to be processed, according to the identifier, the processing type, and the processing parameters in the request. When all the data processing requests in the task have been processed, the data node 10 writes the cells in the temporary storage database back to the distributed database and then notifies the client that the data processing task is complete. The processing type may be adding data, modifying data, or deleting data.

Referring to fig. 2, fig. 2 is a block diagram illustrating a data node 10 according to an embodiment of the present invention. The data node 10 further comprises a memory 11, a communication interface 12, a processor 13 and a bus 14. The memory 11, the communication interface 12, and the processor 13 are connected by a bus 14.

The memory 11 is used for storing a program, such as the data processing device 100, which includes at least one software functional module that can be stored in the memory 11 in the form of software or firmware, and the processor 13 executes the program after receiving an execution instruction to implement the data processing method disclosed in the embodiments of the present invention.

The Memory 11 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Alternatively, the memory 11 may be a storage device built in the processor 13, or may be a storage device independent of the processor 13.

The communication connection with other data nodes 10, an exit gateway and a firewall device is realized through at least one communication interface 12 (which can be wired or wireless).

The bus 14 may be an ISA bus, PCI bus, EISA bus, or the like. Fig. 2 is represented by only one double-headed arrow, but does not represent only one bus or one type of bus.

The processor 13 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 13. The processor 13 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.

Next, an embodiment of the present invention will describe in detail a data processing method applied to the data node 10, please refer to fig. 3, where fig. 3 shows a flowchart of a data processing method provided in an embodiment of the present invention, where the method includes the following steps:

step S101, receiving a data processing task sent by a client; the data processing task comprises at least one data processing request, and each data processing request comprises an identifier, a processing parameter and a processing type of data to be processed.

In this embodiment, taking HBase as an example, the identifier of the data to be processed may be a combination of row keys, column families, and column identifiers, the processing type may include a new addition type, a modification type, and a deletion type, the processing types are different, and the corresponding processing parameters may also be different.

The data processing task may include a plurality of data processing requests. Each data processing request processes a different cell, while the processing types may be the same or different. For example, the same data processing task may include adding cell1 and cell2, modifying cell3, and deleting cell4, but the same data processing task cannot both add and modify cell1.
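The constraint that each request in a task targets a different cell can be sketched as a simple validation step. This is only an illustration under assumed names (`ProcessingType`, `Request`, `validate_task` are hypothetical); the source does not state how or where such a check is performed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List


class ProcessingType(Enum):
    ADD = "add"
    MODIFY = "modify"
    DELETE = "delete"


@dataclass(frozen=True)
class Request:
    cell_id: str                 # identifier of the data to be processed
    processing_type: ProcessingType
    parameter: Any = None        # processing parameter; None if unspecified


def validate_task(requests: List[Request]) -> None:
    """Reject a task that is empty or touches the same cell more than once."""
    if not requests:
        raise ValueError("a task must contain at least one request")
    seen = set()
    for req in requests:
        if req.cell_id in seen:
            raise ValueError(f"cell {req.cell_id} appears in more than one request")
        seen.add(req.cell_id)
```

A task that adds cell1 and cell2, modifies cell3 and deletes cell4 passes this check, whereas a task that both adds and modifies cell1 is rejected.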

Step S102, processing a first target cell which is consistent with the identification of the data to be processed of the data processing request in a pre-established temporary storage database according to the processing parameter and the processing type of each data processing request, wherein the temporary storage database comprises cells which are migrated from a distributed database in advance and are related to data processing tasks, and the temporary storage database has a transaction processing function.

In this embodiment, the data node 10 first determines a cell related to the data processing task from the distributed database according to the data processing task, and then migrates the cell related to the data processing task to the temporary storage database, so that when a data processing request in the data processing task is subsequently processed, the cell to be processed is directly obtained from the temporary storage database, and thus the cell to be processed can be found more quickly during processing, and the efficiency of data processing is improved.

Step S103, when the first target cells of all the data processing requests in the data processing task have been processed, writing the cells in the temporary storage database back to the distributed database, and returning to the client a notification that the data processing task is completed.

In this embodiment, the first target cells are cells whose data processing requests need to be processed, each data processing request in the data processing task corresponds to one first target cell, and when all the first target cells in the same data processing task are processed, the data node 10 uniformly writes back the cells in the temporary storage database to the distributed database, so that it can be ensured that a plurality of first target cells in the same data processing task are processed in batch in the distributed database, and the efficiency of data processing is further improved.

According to the method provided by the embodiment of the invention, the cells related to the same data processing task are placed in the temporary storage database in advance, when the data processing request of the data processing task is received, the cells corresponding to the data processing request in the temporary storage database are directly processed, and when all the data processing requests of the data processing task are processed, the data processed by the data processing task are uniformly written back to the distributed database.
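The overall flow of steps S101 to S103 can be sketched in a few lines of Python. The sketch makes several simplifying assumptions that are not in the source: both databases are plain dictionaries mapping a cell identifier to its data, a request is a `(cell_id, processing_type, parameter)` tuple, and deletion is modelled as direct removal rather than as a delete flag. The function name `run_task` is hypothetical.

```python
def run_task(distributed: dict, staging: dict, requests: list) -> str:
    """Process every request in the staging store, then write back once.

    `requests` is a list of (cell_id, processing_type, parameter) tuples;
    both stores are plain dicts mapping cell_id -> data (an assumption).
    """
    # Pre-migration: move the cells the task touches into the staging store.
    for cell_id, _, _ in requests:
        if cell_id in distributed:
            staging[cell_id] = distributed.pop(cell_id)

    # Step S102: apply each request to its first target cell in staging.
    for cell_id, processing_type, parameter in requests:
        if processing_type in ("add", "modify"):
            staging[cell_id] = parameter
        elif processing_type == "delete":
            staging.pop(cell_id, None)  # simplification: no delete flag here
        else:
            raise ValueError(f"unknown processing type: {processing_type}")

    # Step S103: one unified write-back, then report completion.
    distributed.update(staging)
    staging.clear()
    return "task completed"
```

The point of the design is visible in the last three lines: the distributed store is written once per task rather than once per request.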

Referring to fig. 4, fig. 4 is a flowchart illustrating another data processing method according to an embodiment of the present invention. Step S102 includes the following substeps:

step S1021, updating the data of the corresponding first target cell according to the processing parameter and the processing type.

In this embodiment, when the processing type is a new type, the processing parameter includes data to be newly created, and the implementation manner of updating the data of the corresponding first target cell according to the processing parameter and the processing type may be:

First, the cell to be newly created is taken as the first target cell.

Then, the identifier of the data to be processed and the data to be newly created are added to the temporary storage database.

In this embodiment, taking HBase as an example, the identifier of the data to be processed may be a combination of a row key, a column family, and a column identifier. The temporary storage database uses the combination of row key, column family, and column identifier as its primary key, and uses the timestamp and the data as fields. As a specific implementation, the temporary storage database corresponding to Table 1 is shown in Table 2:

TABLE 2

Row key + column family + column identifier	Timestamp	Data
Key1_Cf1_C1	Date1	Data1
Key1_Cf2_C1	Date2	Data2
Key2_Cf1_C2	Date3	Data3

It should be noted that when a first target cell is newly added to the temporary storage database, its timestamp field is not assigned. The timestamp is assigned only when the newly added first target cell is written into the distributed database, and its value may be the system time at the moment of that write.

For example, suppose a first target cell with identifier Key3_Cf2_C2 and data Data4 is newly added to the temporary storage database shown in Table 2. The temporary storage database after this addition is shown in Table 3:

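TABLE 3

Row key + column family + column identifier	Timestamp	Data
Key1_Cf1_C1	Date1	Data1
Key1_Cf2_C1	Date2	Data2
Key2_Cf1_C2	Date3	Data3
Key3_Cf2_C2	(unassigned)	Data4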
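The layout of Table 2 and the addition of the cell Key3_Cf2_C2 can be sketched with Python's built-in `sqlite3` module standing in for the temporary storage database. The choice of SQLite, the table name `staging`, the column names, and the helper names `add_cell` and `timestamp_for_write_back` are assumptions made for illustration only.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE staging (
        cell_id   TEXT PRIMARY KEY,  -- row key + column family + column identifier
        timestamp TEXT,              -- left NULL for newly added cells
        data      TEXT
    )
    """
)
# The three rows of Table 2.
conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [
        ("Key1_Cf1_C1", "Date1", "Data1"),
        ("Key1_Cf2_C1", "Date2", "Data2"),
        ("Key2_Cf1_C2", "Date3", "Data3"),
    ],
)


def add_cell(cell_id: str, data: str) -> None:
    """Add a new first target cell; its timestamp stays unassigned."""
    conn.execute(
        "INSERT INTO staging (cell_id, timestamp, data) VALUES (?, NULL, ?)",
        (cell_id, data),
    )


def timestamp_for_write_back() -> float:
    """System time taken at the moment a cell is written to the distributed database."""
    return time.time()


add_cell("Key3_Cf2_C2", "Data4")
```

Deferring the timestamp in this way means the staging store never has to keep its clock consistent with the distributed database.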

In this embodiment, when the processing type is the modification type, the processing parameter includes modified data, and the implementation manner of updating the data of the corresponding first target cell according to the processing parameter and the processing type may be:

The data of the first target cell is replaced with the modified data.

In this embodiment, when the first target cell is modified, only its data is modified; the timestamp field in the temporary storage database is left unchanged. This is because the system time must be re-acquired when the first target cell is written back from the temporary storage database to the distributed database, and that system time is what is finally assigned to the timestamp of the first target cell.

In this embodiment, when the processing type is the delete type, the implementation manner of updating the data of the corresponding first target cell according to the processing parameter and the processing type is at least any one of the following manners:

Mode 1: the processing parameter is null, that is, no processing parameter is specified. In this case, a delete flag is set for the first target cell to mark the data of the first target cell as invalid.

Mode 2: the processing parameter is a preset range of cells to be deleted. In this case, delete flags are set for all the cells within the preset range following the first target cell to mark the data of those cells as invalid.

It should be noted that only the delete flag is set here; the cell is not directly deleted from the temporary storage database. If the distributed database system becomes abnormal, data that would otherwise be lost when a deletion operation is interrupted before completion can therefore be recovered, which improves the reliability of the distributed database system.
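The two deletion modes can be sketched as follows. The source does not specify how the preset range is encoded; this sketch assumes, purely for illustration, that the range is given as an inclusive end identifier and that cells are ordered by identifier. The function name `mark_deleted` and the `deleted` field are hypothetical.

```python
from typing import Dict, Optional


def mark_deleted(staging: Dict[str, dict], target_id: str,
                 range_end: Optional[str] = None) -> None:
    """Set delete flags instead of removing cells from the staging store.

    Mode 1 (range_end is None): flag only the first target cell.
    Mode 2: flag every cell whose identifier lies between target_id and
    range_end inclusive, in sorted order (this range encoding is an assumption).
    """
    if range_end is None:
        staging[target_id]["deleted"] = True
        return
    for cell_id in sorted(staging):
        if target_id <= cell_id <= range_end:
            staging[cell_id]["deleted"] = True
```

Because flagged cells remain in the staging store, their data is still available if the operation has to be undone.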

Sub-step S1022, setting the operation type of the first target cell to the processing type, and setting the synchronization state of the first target cell.

To facilitate writing the cells in the temporary storage database back to the distributed database, in the embodiment of the present invention each cell in the temporary storage database additionally carries an operation type and a synchronization state. The operation type is the processing type in the processing request and may be the new addition type, the modification type, or the deletion type. The synchronization state indicates the stage of processing that the data in the cell has reached; for example, a synchronization state of 0 indicates that the data of the cell is being processed by a data processing request, and a synchronization state of 1 indicates that a copy of the cell's data exists in both the distributed database and the temporary storage database and that the copy in the temporary storage database is the latest. Setting the synchronization state of the first target cell may mean setting the synchronization state to 1.
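Sub-steps S1021 and S1022 can be sketched together as one function that updates a staged cell and then records its operation type and synchronization state. The dictionary layout and the field names `operation_type`, `sync_state` and `deleted` are assumptions for illustration; the values 0 and 1 follow the example meanings given above.

```python
from typing import Optional


def process_request(staging: dict, cell_id: str, processing_type: str,
                    data: Optional[str] = None) -> None:
    """Apply one request to its first target cell in the staging store."""
    if processing_type not in ("add", "modify", "delete"):
        raise ValueError(f"unknown processing type: {processing_type}")

    # A cell to be newly created starts with an unassigned timestamp.
    cell = staging.setdefault(
        cell_id, {"timestamp": None, "data": None, "deleted": False}
    )
    cell["sync_state"] = 0  # the cell is being processed by a request

    # Sub-step S1021: update the cell's data according to the processing type.
    if processing_type in ("add", "modify"):
        cell["data"] = data       # the timestamp is deliberately left untouched
    else:
        cell["deleted"] = True    # flag only; the data is kept for recovery

    # Sub-step S1022: record the operation type and set the synchronization state.
    cell["operation_type"] = processing_type
    cell["sync_state"] = 1
```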

In this embodiment, migrating the cells related to the data processing task from the distributed database to the temporary storage database is performed in two steps: first, the cells related to the data processing task are copied from the distributed database to the temporary storage database; then, those cells are deleted from the distributed database. To avoid errors when a client reads such a cell during migration, the operation type of the cell is set to null and its synchronization state is set to 1 while the cell is being copied into the temporary storage database. If the cell needs to be read at this point, the data of the cell is first read from the temporary storage database and then from the distributed database; the data from the distributed database is overwritten with the data from the temporary storage database, and the overwritten result is returned to the client. For example, suppose the data of cell1 in the temporary storage database includes a website whose value is a, with a synchronization state of 1, and the data of cell1 in the distributed database includes a website whose value is b and a user name whose value is m. When the client issues a read request for cell1, the data node 10 first reads cell1 from the temporary storage database (website a), then reads cell1 from the distributed database (website b, user name m). Next, the data node 10 overwrites the website value b from the distributed database with the website value a from the temporary storage database. Finally, the data returned to the client is: website a, user name m.
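The cell1 example can be reproduced with a short Python sketch in which each copy of the cell is a dictionary of fields. The function name `read_during_migration` and the field names are hypothetical; the overlay order (staged values win) follows the description above.

```python
def read_during_migration(staged: dict, stored: dict) -> dict:
    """Overlay the staged fields on the copy from the distributed database."""
    merged = dict(stored)   # start from the distributed copy
    merged.update(staged)   # staged values are the latest and win
    return merged


staged_cell1 = {"website": "a"}                     # temporary storage database
stored_cell1 = {"website": "b", "user_name": "m"}   # distributed database
result = read_during_migration(staged_cell1, stored_cell1)
```

Here `result` holds website a and user name m, and neither input dictionary is modified.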

With continued reference to fig. 4, step S103 includes the following sub-steps:

Sub-step S1031, writing the cells in the temporary storage database whose operation type is the new addition type or the modification type back to the distributed database.

Sub-step S1032, clearing all cells in the temporary storage database.

In this embodiment, only the data of the cells in the temporary storage database is cleared; the structure of the temporary storage database is retained so that subsequent data processing tasks can reuse it. For example, the temporary storage database after all cells are cleared is shown in Table 4:

TABLE 4

Row key + column family + column identifier	Timestamp	Data

According to the method provided by the embodiment of the invention, an operation type field and a synchronization state field are added to the temporary storage database. Using the operation type field, only the cells of the new addition type and the modification type in the temporary storage database are written back to the distributed database, so that unnecessary write-back operations are avoided. The synchronization state field ensures that the data of a cell can be read correctly while the cell is being migrated from the distributed database to the temporary storage database, which safeguards both the reliability and the access efficiency of the data in the distributed database.
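Sub-steps S1031 and S1032 can be sketched as a single write-back routine. The stores are modelled as dictionaries, the field names are hypothetical, and the timestamp source is passed in as a callable so that the system time is taken at write time; none of these details are prescribed by the source.

```python
def write_back(staging: dict, distributed: dict, now) -> int:
    """Write added/modified cells back, then clear the staging store.

    Each staged cell is a dict with "data" and "operation_type" keys
    (hypothetical field names); `now` is a callable returning the timestamp.
    """
    written = 0
    for cell_id, cell in staging.items():
        # Sub-step S1031: only new and modified cells are written back.
        if cell["operation_type"] in ("add", "modify"):
            distributed[cell_id] = {"timestamp": now(), "data": cell["data"]}
            written += 1
    # Sub-step S1032: drop the contents but keep the store itself for reuse.
    staging.clear()
    return written
```

Cells whose operation type is the deletion type are skipped, which is what makes the deletion take effect in the distributed store after migration.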

In order to ensure that the data of the cells in the distributed database can be read correctly and efficiently in the whole data processing task process, an embodiment of the present invention further provides a data processing method, please refer to fig. 5, where fig. 5 shows a flowchart of another data processing method provided by an embodiment of the present invention, the method includes the following steps:

step S201, receiving a data reading request sent by a client, where the reading request includes an identifier of data to be read.

In this embodiment, taking HBase as an example, the identifier of the data to be read may be a combination of a row key, a column family, and a column identifier of the data to be processed.

Step S202, if the temporary storage database does not have a second target cell consistent with the identifier of the data to be read, finding out the second target cell consistent with the identifier of the data to be read from the distributed database and returning the data of the second target cell to the client.

In this embodiment, the second target cell is a cell to which data to be read belongs, and the temporary storage database does not have the second target cell, which means that the second target cell is not a cell that needs to be processed by the current data processing task, and at this time, the second target cell is directly found from the distributed database, and the data of the second target cell is returned to the client.

Step S203, if a second target cell consistent with the identifier of the data to be read exists in the temporary storage database and the synchronization state of the second target cell is not set, merging the second target cell and a third target cell consistent with the identifier of the data to be read in the distributed database, and returning the merged data to the client.

In this embodiment, when the second target cell exists in the temporary storage database, it means that the second target cell is a cell that needs to be processed by the current data processing task, and since the synchronization state of the second target cell in the temporary storage database is not set, only a part of data of the cell represented by the identifier of the data to be read may be stored in the temporary storage database, and all data of the cell represented by the identifier of the data to be read may be required by the client, therefore, the data of the second target cell represented by the identifier of the data to be read in the temporary storage database and the data of the third target cell represented by the identifier of the data to be read in the distributed database need to be merged and then returned to the client.

Step S204, if a second target cell consistent with the identification of the data to be read exists in the temporary storage database and the synchronization state of the second target cell is set, covering the data corresponding to the third target cell consistent with the identification of the data to be read in the distributed database by using the data of the second target cell, and then returning the covered data of the third target cell to the client.

In this embodiment, if the synchronization state of the second target cell in the temporary storage database is set, duplicate data may exist in the temporary storage database and the distributed database, and the data in the temporary storage database is the latest. Therefore, the data of the third target cell represented by the identifier of the data to be read in the distributed database needs to be overwritten with the data of the second target cell represented by that identifier in the temporary storage database, and the overwritten data of the third target cell is returned to the client.

It should be noted that the temporary storage database may be, but is not limited to, an SQL database or a Redis database.

In the embodiment of the invention, the database supporting the transaction processing is used as the temporary storage database, so that the distributed database not supporting the transaction processing can realize the transaction processing of the data.
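What a transaction-capable temporary storage database contributes can be sketched with Python's built-in `sqlite3` module, used here only as a stand-in for whichever SQL database is chosen. The table layout, the tuple form of a request, and the function name `apply_task` are assumptions for illustration: either every request of a task is applied, or none is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (cell_id TEXT PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO staging VALUES ('cell3', 'old')")
conn.commit()


def apply_task(requests) -> bool:
    """Apply all (cell_id, processing_type, data) requests atomically."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for cell_id, processing_type, data in requests:
                if processing_type == "add":
                    conn.execute(
                        "INSERT INTO staging VALUES (?, ?)", (cell_id, data)
                    )
                elif processing_type == "modify":
                    conn.execute(
                        "UPDATE staging SET data = ? WHERE cell_id = ?",
                        (data, cell_id),
                    )
                else:
                    raise ValueError(f"unsupported type: {processing_type}")
        return True
    except (sqlite3.Error, ValueError):
        return False
```

If a later request in the task fails, for instance because it tries to add a cell that already exists, the earlier modifications made by the same task are rolled back as well.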

In order to perform the corresponding steps in the above-described embodiments and various possible implementations, an implementation of the data processing apparatus is given below. Referring to fig. 6, fig. 6 is a functional block diagram of a data processing apparatus 100 according to an embodiment of the present invention. It should be noted that the basic principle and the resulting technical effect of the data processing apparatus 100 provided in the present embodiment are the same as those of the above embodiments; for brevity, anything not mentioned in this embodiment can be found in the corresponding content of the above embodiments. The data processing apparatus 100 includes a receiving module 110, a processing module 120, a write-back module 130, and a reading module 140.

A receiving module 110, configured to receive a data processing task sent by a client; the data processing task comprises at least one data processing request, and each data processing request comprises an identifier, a processing parameter and a processing type of data to be processed.

The processing module 120 is configured to process a first target cell in a pre-established temporary storage database, which is consistent with an identifier of data to be processed of each data processing request, according to a processing parameter and a processing type of each data processing request, where the temporary storage database includes cells that are migrated from a distributed database in advance and are related to a data processing task, and the temporary storage database has a transaction processing function.

Specifically, a corresponding operation type and a synchronization state are added to each cell in the temporary storage database. When processing, according to the processing parameter and the processing type of each data processing request, the first target cell in the pre-established temporary storage database that is consistent with the identifier of the data to be processed of the data processing request, the processing module 120 is specifically configured to: update the data of the corresponding first target cell according to the processing parameter and the processing type; and set the operation type of the first target cell to the processing type, and set the synchronization state of the first target cell.

Specifically, when the processing type is the new type, the processing parameter includes data to be newly created, and the processing module 120 is specifically configured to, when updating the data of the corresponding first target cell according to the processing parameter and the processing type: take a cell to be newly built as the first target cell; and add the identifier of the cell to be newly built and the data to be newly built to the temporary storage database.

Specifically, when the processing type is the modification type, the processing parameter includes modified data, and the processing module 120 is specifically configured to, when updating the corresponding data of the first target cell according to the processing parameter and the processing type: and modifying the data of the first target cell into the modified data.

Specifically, when the processing type is the deletion type, the processing module 120 is specifically configured to, when updating the data of the corresponding first target cell according to the processing parameter and the processing type: and setting a deletion mark for the first target cell to identify that the data of the first target cell is invalid.
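The three processing types handled by the processing module 120 can be sketched as follows. This is an illustrative sketch only: the temporary storage database is modelled as a Python dictionary, and the names `StagedCell`, `process_request`, and the string values `"new"`, `"modify"`, and `"delete"` are hypothetical. The cell identifier is assumed to be a tuple of row key, column family, and column identifier.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

CellId = Tuple[str, str, str]  # (row key, column family, column identifier)

@dataclass
class StagedCell:
    data: Any = None
    op_type: str = ""      # operation type added to each staged cell
    synced: bool = False   # synchronization state; True means "set"
    deleted: bool = False  # deletion mark; True means the data is invalid

def process_request(staging: Dict[CellId, StagedCell], cell_id: CellId,
                    op_type: str, data: Any = None) -> StagedCell:
    if op_type == "new":
        # The cell to be newly built becomes the first target cell.
        cell = StagedCell(data=data)
        staging[cell_id] = cell
    elif op_type == "modify":
        cell = staging[cell_id]
        cell.data = data
    elif op_type == "delete":
        # Only mark the cell; its data is kept but treated as invalid.
        cell = staging[cell_id]
        cell.deleted = True
    else:
        raise ValueError(f"unknown processing type: {op_type}")
    # Record the operation type and set the synchronization state.
    cell.op_type = op_type
    cell.synced = True
    return cell

# Cells migrated in advance from the distributed database (sample data).
staging = {
    ("row1", "cf", "a"): StagedCell(data="old"),
    ("row2", "cf", "b"): StagedCell(data="x"),
}
process_request(staging, ("row3", "cf", "c"), "new", "fresh")
process_request(staging, ("row1", "cf", "a"), "modify", "updated")
process_request(staging, ("row2", "cf", "b"), "delete")
```

Each request in the usage lines touches a different cell, matching the condition that the cells processed by the requests of one task are different.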

The write-back module 130 is configured to, when the first target cells of all the data processing requests in the data processing task have been processed, write back the cells in the temporary storage database to the distributed database, and return a notification to the client that the data processing task is complete.

Specifically, the write-back module 130 is configured to: write back the cells whose operation type is the new type or the modification type in the temporary storage database to the distributed database; and clear all cells in the temporary storage database.
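A minimal sketch of this write-back step is shown below, with both stores modelled as Python dictionaries. The function name `write_back`, the dictionary layout of a staged cell, and the operation type strings are assumptions for illustration. This passage does not state how cells marked for deletion are applied to the distributed database, so the sketch leaves them untouched there.

```python
def write_back(staging: dict, distributed: dict) -> int:
    """Copy new/modified cells to the distributed store, then clear staging."""
    written = 0
    for cell_id, cell in staging.items():
        if cell.get("op_type") in ("new", "modify"):
            distributed[cell_id] = cell["data"]
            written += 1
    # All staged cells are cleared, whatever their operation type.
    staging.clear()
    return written

staging = {
    ("r1", "cf", "a"): {"data": "v1", "op_type": "new"},
    ("r2", "cf", "b"): {"data": "v2", "op_type": "modify"},
    ("r3", "cf", "c"): {"data": "old", "op_type": "delete"},
}
distributed = {("r2", "cf", "b"): "v0", ("r3", "cf", "c"): "old"}
written = write_back(staging, distributed)
```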

The reading module 140 is configured to receive a data reading request sent by a client, where the reading request includes an identifier of data to be read; if the temporary storage database does not have a second target cell consistent with the identifier of the data to be read, finding out the second target cell consistent with the identifier of the data to be read from the distributed database and returning the data of the second target cell to the client; if a second target cell consistent with the identification of the data to be read exists in the temporary storage database and the synchronization state of the second target cell is not set, merging the second target cell and a third target cell consistent with the identification of the data to be read in the distributed database, and returning the merged data to the client; and if a second target cell consistent with the identifier of the data to be read exists in the temporary storage database and the synchronization state of the second target cell is set, covering the data corresponding to the third target cell consistent with the identifier of the data to be read in the distributed database by using the data of the second target cell, and returning the covered data of the third target cell to the client.
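The three read cases handled by the reading module 140 can be sketched as follows. The sketch makes several assumptions that are not stated in the embodiment: cell data is modelled as a dictionary of fields, "merging" means a field-level union in which the staged values take precedence, and "covering" returns the staged data without modifying the distributed store. The function name `read_cell` is hypothetical.

```python
def read_cell(cell_id, staging: dict, distributed: dict):
    staged = staging.get(cell_id)
    if staged is None:
        # Case 1: not staged, so read directly from the distributed database.
        return distributed.get(cell_id)
    base = distributed.get(cell_id) or {}
    if not staged["synced"]:
        # Case 2: staged copy may be partial, so merge it over the stored data.
        merged = dict(base)
        merged.update(staged["data"])
        return merged
    # Case 3: staged copy is the latest, so it covers the stored data.
    return dict(staged["data"])

distributed = {
    "c1": {"x": 1, "y": 2},
    "c2": {"x": 1, "y": 2},
    "c3": {"x": 1, "y": 2},
}
staging = {
    "c2": {"data": {"y": 20}, "synced": False},
    "c3": {"data": {"x": 100}, "synced": True},
}
r1 = read_cell("c1", staging, distributed)
r2 = read_cell("c2", staging, distributed)
r3 = read_cell("c3", staging, distributed)
```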

An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the data processing method according to any one of the foregoing embodiments.

In summary, embodiments of the present invention provide a data processing method, an apparatus, a data node, and a storage medium, which are applied to a data node in a distributed database system. A distributed database is deployed on the data node; the distributed database is an unstructured database without a large-scale data transaction processing function and includes a plurality of data unit cells; and the data node is in communication connection with a client. The method includes: receiving a data processing task sent by the client, where the data processing task comprises at least one data processing request, and each data processing request comprises an identifier, a processing parameter and a processing type of data to be processed; processing, according to the processing parameter and the processing type of each data processing request, a first target cell in a pre-established temporary storage database that is consistent with the identifier of the data to be processed of the data processing request, where the temporary storage database comprises cells that are migrated from the distributed database in advance and are related to the data processing task, and the temporary storage database has a transaction processing function; and, when the first target cells of all the data processing requests in the data processing task have been processed, writing the cells in the temporary storage database back to the distributed database and returning a notification to the client that the data processing task is complete.
Compared with the prior art, the embodiment of the invention places the cells related to the same data processing task in the temporary storage database in advance. When a data processing request of the data processing task is received, the corresponding cell in the temporary storage database is processed directly, and when all the data processing requests of the data processing task have been processed, the data processed by the task is written back to the distributed database in a unified manner. Because the processed data of a data processing task is written back in a unified, asynchronous manner, large-scale and efficient transaction operations can be realized for a database without a batch transaction interface, and the efficiency of the database system in processing large-scale data is improved while data safety is ensured.

The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A data processing method is applied to a data node in a distributed database system, a distributed database is deployed on the data node, the distributed database is an unstructured database without a large-scale data transaction processing function, the data node is in communication connection with a client, the distributed database comprises a plurality of data unit cells, and the method comprises the following steps:

receiving a data processing task sent by the client; the data processing task comprises at least one data processing request, each data processing request comprises an identifier, a processing parameter and a processing type of data to be processed, and the identifier of the data to be processed is a combination of a row key, a column family and a column identifier;

processing a first target cell which is consistent with an identifier of data to be processed of each data processing request in a pre-established temporary storage database according to a processing parameter and a processing type of each data processing request, wherein the temporary storage database comprises cells which are migrated from the distributed database in advance and are related to the data processing tasks, the temporary storage database has a transaction processing function, the data processing tasks comprise at least one data processing request, and the cells processed by each data processing request are different;

when the first target cells of all the data processing requests in the data processing task have been processed, writing back the cells in the temporary storage database to the distributed database, and returning a notification to the client that the data processing task is complete.

2. The data processing method according to claim 1, wherein each cell in the temporary storage database is newly added with a corresponding operation type and a synchronization state, and the step of processing the first target cell in the temporary storage database, which is pre-established and is consistent with the identifier of the data to be processed of the data processing request, according to the processing parameter and the processing type of each data processing request comprises:

updating the data of the corresponding first target cell according to the processing parameters and the processing type;

setting the operation type of the first target cell as the processing type, and setting the synchronization state of the first target cell.

3. The data processing method according to claim 2, wherein when the processing type is a new type, the processing parameter includes data to be newly created, and the step of updating the corresponding data of the first target cell according to the processing parameter and the processing type includes:

taking a cell to be newly built as the first target cell;

and newly adding the row key, the column identifier and the data to be newly built of the cell to be newly built into the temporary storage database.

4. The data processing method according to claim 2, wherein when the processing type is a modification type, the processing parameter includes modified data, and the step of updating the corresponding data of the first target cell according to the processing parameter and the processing type includes:

and modifying the data of the first target cell into the modified data.

5. The data processing method according to claim 2, wherein when the processing type is a delete type, the step of updating the data of the corresponding first target cell according to the processing parameter and the processing type includes:

and setting a deletion mark for the first target cell to identify that the data of the first target cell is invalid.

6. The data processing method of claim 1, wherein the step of writing the cells in the temporary storage database back to the distributed database comprises:

writing back the cell with the operation type of the new type or the modified type in the temporary storage database to the distributed database;

and clearing all cells in the temporary storage database.

7. The data processing method of claim 1, wherein the method further comprises:

receiving a data reading request sent by the client, wherein the reading request comprises an identifier of data to be read;

if a second target cell consistent with the identifier of the data to be read does not exist in the temporary storage database, finding a second target cell consistent with the identifier of the data to be read from the distributed database and returning the data of the second target cell to the client;

if a second target cell consistent with the identifier of the data to be read exists in the temporary storage database and the synchronization state of the second target cell is not set, merging the second target cell and a third target cell consistent with the identifier of the data to be read in the distributed database, and returning the merged data to the client;

if a second target cell consistent with the identifier of the data to be read exists in the temporary storage database and the synchronization state of the second target cell is set, covering data corresponding to a third target cell consistent with the identifier of the data to be read in the distributed database by using the data of the second target cell, and returning the covered data of the third target cell to the client.

8. A data processing apparatus, which is applied to a data node in a distributed database system, where a distributed database is deployed on the data node, the distributed database is an unstructured database without a large-scale transaction processing function, the data node is in communication connection with a client, the distributed database includes a plurality of data unit cells, and the apparatus includes:

the receiving module is used for receiving the data processing task sent by the client; the data processing task comprises at least one data processing request, each data processing request comprises an identifier, a processing parameter and a processing type of data to be processed, and the identifier of the data to be processed is a combination of a row key, a column family and a column identifier;

the processing module is used for processing a first target cell which is consistent with the identification of the data to be processed of each data processing request in a pre-established temporary storage database according to the processing parameters and the processing type of each data processing request, wherein the temporary storage database comprises cells which are migrated from the distributed database in advance and are related to the data processing tasks, the temporary storage database has a transaction processing function, the data processing tasks comprise at least one data processing request, and the cells processed by each data processing request are different;

and the write-back module is used for writing the cells in the temporary storage database back to the distributed database and returning the completion of the data processing task to the client when the first target cells of all the data processing requests in the data processing task are processed completely.

9. A data node, characterized in that the data node comprises:

one or more processors;

memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a data processing method as claimed in any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 7.