CN114896276A - Data storage method and device, electronic equipment and distributed storage system - Google Patents

Data storage method and device, electronic equipment and distributed storage system

Info

Publication number
CN114896276A
Authority
CN
China
Prior art keywords
data
processed
stored
metadata information
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210307614.9A
Other languages
Chinese (zh)
Inventor
张战防
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN202210307614.9A
Publication of CN114896276A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of databases, and in particular to a data storage method, a data storage device, electronic equipment and a distributed storage system. The method is applied to a computing node of the distributed storage system and comprises: acquiring data to be processed and metadata information corresponding to the data to be processed, wherein the metadata information comprises information on the position to which the data to be processed is to be written; writing the data to be processed into a memory and generating data to be stored in the memory based on the data to be processed and the metadata information; and writing the data to be stored into a storage node so that the storage node stores the data to be processed based on the metadata information in the data to be stored. Computation and storage are separated: the data to be stored is generated in the computing node from the data to be processed and the metadata information and is transmitted directly from memory over the network, so that the storage node can subsequently store the data to be processed directly in its memory. This avoids additional IO operations and improves data storage efficiency.

Description

Data storage method and device, electronic equipment and distributed storage system
Technical Field
The invention relates to the technical field of databases, in particular to a data storage method, a data storage device, electronic equipment and a distributed storage system.
Background
To address rapid elastic scaling and maximum resource utilization in distributed storage systems such as Massively Parallel Processing (MPP) databases, read-write separation technology has been introduced in the field of distributed storage systems. Taking the MPP database as an example, current MPP read-write separation mostly adopts a structure in which hot data is stored on a local disk (cache), and cold or historical data is stored on a public storage server over the network.
Although this structure solves the problem of rapid elastic scaling, data is transmitted multiple times and consumes IO at each step, which amplifies the IO of the database and ultimately seriously degrades the performance of writing data into the database.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data storage method, an apparatus, an electronic device, and a distributed storage system, so as to solve the problem of low efficiency of data storage.
According to a first aspect, an embodiment of the present invention provides a data storage method, which is applied to a computing node of a distributed storage system, and the method includes:
acquiring data to be processed and metadata information corresponding to the data to be processed, wherein the metadata information comprises position information to be written in by the data to be processed;
writing the data to be processed into a memory and generating data to be stored in the memory based on the data to be processed and the metadata information;
and writing the data to be stored into a storage node so that the storage node stores the data to be processed based on metadata information in the data to be stored.
According to the data storage method provided by the embodiment of the invention, calculation and storage are separated, the data to be stored is generated in the calculation node by using the data to be processed and the metadata information, and the data to be stored is directly written into the storage node.
With reference to the first aspect, in a first implementation manner of the first aspect, the generating, in the memory, data to be stored based on the data to be processed and the metadata information includes:
and splicing the data to be processed and the metadata information in the memory to generate the data to be stored.
According to the data storage method provided by the embodiment of the invention, the data to be stored is generated by directly splicing the data to be processed and the metadata information, and the method is simple and easy to implement.
With reference to the first aspect or the first embodiment of the first aspect, in a second embodiment of the first aspect, the method further comprises:
broadcasting the data to be stored so that a copy node of the computing node backs up the data to be processed based on metadata information in the data to be stored.
According to the data storage method provided by the embodiment of the invention, the data to be stored is directly broadcasted, and as the data to be processed is included in the data to be stored, the subsequent copy node can realize the backup of the data to be processed by directly utilizing the received data to be stored without carrying out IO operation with the computing node, so that the frequency of the IO operation is reduced, and the efficiency of data backup is improved.
According to a second aspect, an embodiment of the present invention further provides a data storage method, which is applied to a storage node of a distributed storage system, where the method includes:
receiving data to be stored sent by a computing node, wherein the data to be stored comprises data to be processed and metadata information of the data to be processed, and the metadata information comprises position information to be written in by the data to be processed;
analyzing the data to be stored, and determining the metadata information and the data to be processed;
and storing the data to be processed based on the metadata information.
According to the data storage method provided by the embodiment of the invention, as the data to be stored comprises the data to be processed, the data to be processed can be obtained by directly utilizing the received data to be stored for the storage node, IO communication with the computing node is not needed, and the data storage efficiency is improved.
With reference to the second aspect, in a first implementation manner of the second aspect, the storing the to-be-processed data based on the metadata information includes:
generating a data page corresponding to the data to be processed based on the metadata information and the data to be processed;
and storing the data page and the data to be processed.
According to the data storage method provided by the embodiment of the invention, the data page is generated in the storage node instead of the computing node, so that when the data page is later used for data query it can be processed directly in the storage node without accessing the computing node, which improves the efficiency of data query.
According to a third aspect, an embodiment of the present invention further provides a data storage apparatus, which is applied to a computing node of a distributed storage system, and the apparatus includes:
a first acquisition module, used for acquiring data to be processed and metadata information corresponding to the data to be processed, wherein the metadata information comprises information on the position to which the data to be processed is to be written;
the first writing module is used for writing the data to be processed into a memory and generating data to be stored in the memory based on the data to be processed and the metadata information;
and the second writing module is used for writing the data to be stored into a storage node so that the storage node stores the data to be processed based on the metadata information in the data to be stored.
According to a fourth aspect, an embodiment of the present invention further provides a data storage apparatus for a distributed storage system, which is applied to a storage node, and the apparatus includes:
a receiving module, used for receiving data to be stored sent by a computing node, wherein the data to be stored comprises data to be processed and metadata information of the data to be processed, and the metadata information comprises information on the position to which the data to be processed is to be written;
the analysis module is used for analyzing the data to be stored and determining the metadata information and the data to be processed;
and the storage module is used for storing the data to be processed based on the metadata information.
According to a fifth aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor that are communicatively connected with each other, wherein the memory stores computer instructions, and the processor executes the computer instructions to perform the data storage method described in the first aspect or any one of its implementation manners, or in the second aspect or any one of its implementation manners.
According to a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to perform the data storage method described in the first aspect or any one of its implementation manners, or in the second aspect or any one of its implementation manners.
According to a seventh aspect, an embodiment of the present invention further provides a distributed storage system, including:
a computing node, configured to perform the data storage method according to the first aspect or any one of the implementation manners of the first aspect;
and the storage node is connected with the computing node and used for executing the data storage method of the second aspect or any one implementation mode of the second aspect.
It should be noted that, for corresponding beneficial effects of the data storage device, the electronic device, the computer-readable storage medium, and the distributed storage system provided in the embodiments of the present invention, please refer to the description of the corresponding beneficial effects of the data storage method above, which is not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 shows a schematic diagram of a data store;
FIG. 2 is a schematic structural diagram of a distributed storage system according to an embodiment of the present invention;
FIG. 3 is a flow diagram of a data storage method according to an embodiment of the invention;
FIG. 4 is a block diagram of data to be stored according to an embodiment of the invention;
FIG. 5 is a flow chart of a data storage method according to an embodiment of the present invention;
FIG. 6 is a flow chart of a data storage method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data store according to an embodiment of the present invention;
FIG. 8 is a block diagram of a data storage device according to an embodiment of the present invention;
FIG. 9 is a block diagram of a data storage device according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For distributed storage systems, the scalability of the database is optimized by separating computing resources from storage resources. In this mode, data is stored in a public storage area, and local storage on a node serves only as a cache for part of the frequently used data to accelerate queries, so that nodes are easily added and removed. Taking the MPP database as an example, when data is stored it is first segmented and then enters the caches (Depot) of different nodes; the data is transferred among the nodes so that it reaches the nodes subscribing to the segments; the segment data of each node is flushed into the common storage area; and the transaction is then committed. Here, the Depot is located in the computing node and stores copies of frequently queried data and the part of the segment catalog subscribed to by the node; the common storage area (Communal Storage) stores the complete data of the database and shares the data among the database nodes.
In the data storage process, data first passes through the Depot of the computing node, is processed by the database and stored on a local disk, and is then transmitted to the shared storage. Multiple IO operations occur in this process, as shown in fig. 1:
(1) first, after the SQL command for writing data enters the Execution Engine, some necessary checks are performed; no disk writes are involved in the checking process;
(2) after the checks are finished, the Storage Engine starts to write data; in this process, the write bypasses the local disk cache (Depot) and goes directly to the remote shared storage, which incurs one network IO operation and one shared-storage write IO operation;
(3) when the data is written into the shared storage, the metadata information (Metadata) is also written into the local disk cache to prevent its loss in the event of power failure or other abnormalities; the Metadata mainly records the transaction type and other related information, and one local write IO operation is incurred here;
(4) to ensure that a copy node (replica node) can read the newly inserted data, the metadata information (Metadata) needs to be broadcast to notify the copy node to update the data in its disk cache, which incurs one network IO operation;
(5) after the copy node reads the metadata information, it fetches the newly inserted data from the public storage, which incurs one network IO operation and one shared-storage read IO operation.
For the above flow, steps (1), (2), (3) and (4) of the data writing process are performed synchronously, and the write task is finished only after all four steps have completed, so the total latency of data storage equals the sum of the time taken by these four steps. This data storage method therefore suffers from repeated IO processing, and its data storage efficiency is low.
Further, as shown in fig. 1, after receiving the metadata information, the replica node needs to go to a remote public storage for reading to acquire the latest data, which results in that the replica node acquires data slowly.
Based on this, in order to solve the problem of low data storage efficiency caused by the existence of multiple IO processing problems, embodiments of the present invention provide a data storage method, which is applied to a compute node, generates data to be stored by using metadata information and data to be processed in a memory of the compute node, writes the data to be stored in a storage node, and the storage node can directly store the data to be processed by using the data to be stored, thereby avoiding multiple IO operations. The specific process will be described in detail below.
Further, to solve the problem that the replica node acquires data slowly, in the data storage method provided by the embodiment of the invention the computing node broadcasts the data to be stored. Because the data to be stored includes the data to be processed, a copy node that receives it can back up the data to be processed directly without performing IO operations with the computing node, which speeds up data acquisition by the copy node.
An embodiment of the present invention further provides a distributed storage system. As shown in fig. 2, the system includes a computing node 10 and a storage node 20; the specific numbers of computing nodes 10 and storage nodes 20 included in the distributed storage system are not limited herein and are set according to actual requirements. The computing node 10 executes the data storage method in the embodiment of the present invention: it generates data to be stored based on data to be processed and metadata information, and writes the data to be stored into the storage node 20. The storage node 20 stores the received data to be stored by executing the corresponding data storage method in the embodiment of the present invention.
For the computing node and the storage node, the corresponding data storage method will be described in detail below.
According to an embodiment of the present invention, a data storage method embodiment is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
In this embodiment, a data storage method is provided, which may be used in a computing node of the distributed storage system, and fig. 3 is a flowchart of the data storage method according to the embodiment of the present invention, as shown in fig. 3, where the flowchart includes the following steps:
S11, acquiring the data to be processed and the metadata information corresponding to the data to be processed.
Wherein the metadata information includes location information to which the data to be processed needs to be written.
For the computing node, the obtained to-be-processed data may be sent to the computing node by other devices, or may be obtained by the computing node through other computing processes, and so on. The acquisition mode of the data to be processed is not limited at all, and the data to be processed can be set according to actual requirements.
In addition to the location information to which the data to be processed needs to be written, the metadata information may include, for example, the transaction type of the data to be processed, storage information, and the like.
S12, writing the data to be processed into the memory and generating the data to be stored in the memory based on the data to be processed and the metadata information.
After acquiring the data to be processed, the compute node writes the data to be processed into the memory, for example, the compute node writes the data to be processed into the cache pool by using the storage engine. And meanwhile, in the calculation stage, the data to be stored is generated in the memory by using the data to be processed and the metadata information. The data to be stored may be formed by splicing the data to be processed and the metadata information, or may be formed by fusing the data to be processed and the metadata information, and the like.
For example, as shown in fig. 4, the data to be stored includes metadata information and data to be processed, where the metadata information includes Type, Space ID, and Page Num. Specifically, Type records the type of the transaction; Space ID indicates the ID of a table space; and Page Num indicates the offset of a data page.
Since the metadata information in the earlier flow carries no data, in this embodiment the data to be processed is combined with the metadata information to form the data to be stored, which may also be referred to as a database redo log (Redo log). It should be noted that this redo log is newly created and is not the redo log contained in the database itself; it contains the data to be transmitted.
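As a minimal sketch of how such a record could be assembled in memory, the following Python example splices the three metadata fields named above onto the data to be processed. The byte layout (field widths, little-endian order, and the extra payload-length field) is a hypothetical choice made only for illustration; the source does not specify an encoding.

```python
import struct

# Hypothetical layout, not taken from the source:
# Type (1 byte), Space ID (4 bytes), Page Num (4 bytes), payload length (4 bytes).
HEADER = struct.Struct("<BIII")


def build_record(txn_type: int, space_id: int, page_num: int, payload: bytes) -> bytes:
    """Splice metadata information and data to be processed into one record."""
    header = HEADER.pack(txn_type, space_id, page_num, len(payload))
    return header + payload


# Example values are invented for the illustration.
record = build_record(txn_type=1, space_id=7, page_num=42, payload=b"row-bytes")
```

Carrying the payload inside the record is what lets a receiver act on it without a second read from the sender.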
S13, writing the data to be stored into the storage node, so that the storage node stores the data to be processed based on the metadata information in the data to be stored.
The computing node writes the obtained data to be stored into the storage node, and the data to be stored comprises the data to be processed and the metadata information, so that the subsequent storage node can analyze the data to be processed from the data to be stored and store the data to be processed. For example, the storage node generates a data page of the data to be processed based on the metadata information, and then stores the data page and the data to be processed.
In the data storage process described in fig. 1, storage is separated from computation, and generation of database metadata and data pages is completed at the compute node. Because generating and transmitting data pages causes disk IO and network IO, in the embodiment of the invention the data to be processed is sent directly to the storage node within the generated data to be stored, and the generation of data pages from the data to be stored is moved to the storage node, which reduces IO consumption.
According to the data storage method provided by the embodiment, calculation and storage are separated, the data to be stored is generated in the calculation node by using the data to be processed and the metadata information, and the data to be stored is directly written into the storage node.
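Steps S11 to S13 can be sketched as follows. This is an illustrative model only: `ComputeNode`, `StorageNodeStub`, the dictionary-based record and the in-process `write` call are all hypothetical stand-ins, and a real system would send the record over the network.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNodeStub:
    """Hypothetical stand-in for a remote storage node."""
    received: list = field(default_factory=list)

    def write(self, record: dict) -> None:
        self.received.append(record)


@dataclass
class ComputeNode:
    storage: StorageNodeStub
    buffer_pool: dict = field(default_factory=dict)  # memory only, no local disk write

    def store(self, data: bytes, metadata: dict) -> dict:
        # S11: the data to be processed and its metadata information are the inputs.
        # S12: write the data into memory and build the data to be stored in memory.
        key = (metadata["space_id"], metadata["page_num"])
        self.buffer_pool[key] = data
        record = {"metadata": dict(metadata), "data": data}
        # S13: write the data to be stored into the storage node.
        self.storage.write(record)
        return record
```

The point of the sketch is that the compute node touches only memory and the storage node, never a local disk.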
In this embodiment, a data storage method is provided, which may be used in a computing node of the above-mentioned distributed storage system, and fig. 5 is a flowchart of the data storage method according to the embodiment of the present invention, as shown in fig. 5, the flowchart includes the following steps:
S21, acquiring the data to be processed and the metadata information corresponding to the data to be processed.
Wherein the metadata information includes location information to which the data to be processed needs to be written.
Please refer to S11 in fig. 3 for details, which are not described herein.
S22, writing the data to be processed into the memory and generating the data to be stored in the memory based on the data to be processed and the metadata information.
And the computing node splices the data to be processed and the metadata information in the memory to generate the data to be stored. Namely, after the data to be processed is spliced to the metadata information, the data to be stored is formed. The data to be stored is generated by directly splicing the data to be processed and the metadata information, and the method is simple and easy to implement.
S23, writing the data to be stored into the storage node, so that the storage node stores the data to be processed based on the metadata information in the data to be stored.
Please refer to S13 in fig. 3 for details, which are not described herein.
S24, broadcasting the data to be stored so that the copy node of the computing node backs up the data to be processed based on the metadata information in the data to be stored.
After the data to be stored is written into the storage node by the computing node, the data to be stored is broadcasted, so that the backup of the data to be processed in the data to be stored is realized. The backup of the data to be processed is realized by using a copy node of the computing node, the copy node analyzes the data to be stored after receiving the data to be stored to obtain metadata information, and the data to be processed is backed up by using the metadata information.
The definition of the replica node is set in advance when the system is configured, and is specifically set according to actual requirements.
The data storage method provided by the embodiment directly broadcasts the data to be stored, and since the data to be processed is included in the data to be stored, the subsequent copy node directly utilizes the received data to be stored to realize the backup of the data to be processed, and does not need to perform IO operation with the computing node, thereby reducing the number of IO operations and improving the efficiency of data backup.
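The broadcast step can be sketched as below. `ReplicaNode`, `broadcast` and the dictionary record are hypothetical names used only for illustration; the loop stands in for a real network broadcast.

```python
class ReplicaNode:
    """Hypothetical copy node that backs up data from a received record."""

    def __init__(self) -> None:
        self.cache = {}  # (space_id, page_num) -> data to be processed

    def on_record(self, record: dict) -> None:
        # The record already carries the data, so no read-back from the
        # compute node or the shared storage is needed.
        meta = record["metadata"]
        self.cache[(meta["space_id"], meta["page_num"])] = record["data"]


def broadcast(record: dict, replicas: list) -> None:
    """Deliver the same data-to-be-stored record to every copy node."""
    for replica in replicas:
        replica.on_record(record)
```

In the flow of fig. 1, by contrast, `on_record` would receive only metadata and would have to fetch the data separately.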
In this embodiment, a data storage method is provided, which may be used in a storage node of the above-mentioned distributed storage system, and fig. 6 is a flowchart of the data storage method according to the embodiment of the present invention, as shown in fig. 6, the flowchart includes the following steps:
S31, receiving the data to be stored sent by the computing node.
The data to be stored comprises data to be processed and metadata information of the data to be processed, wherein the metadata information comprises position information which needs to be written in the data to be processed.
For the formation of the data to be stored, please refer to the above description, and further description is omitted here.
For the storage node, the storage node receives the data to be stored generated by the computing node, and stores the data to be stored.
S32, parsing the data to be stored, and determining the metadata information and the data to be processed.
As described above, the data to be stored includes the metadata information and the data to be processed, and the storage node can obtain the metadata information and the data to be processed by parsing the data to be stored. For example, in the data to be stored, the metadata information and the data to be processed are divided by different fields, so as to distinguish the metadata information and the data to be processed; alternatively, the data to be stored may be distinguished from each other by different identifiers, and so on.
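As one possible realization of the field-based division, the sketch below parses a record whose metadata occupies a fixed-width header. The layout (field widths, byte order, and the payload-length field) is assumed for illustration and is not specified in the source.

```python
import struct
from typing import NamedTuple, Tuple

# Hypothetical layout: Type (1 byte), Space ID (4), Page Num (4), payload length (4).
HEADER = struct.Struct("<BIII")


class Metadata(NamedTuple):
    txn_type: int
    space_id: int
    page_num: int


def parse_record(record: bytes) -> Tuple[Metadata, bytes]:
    """Split a data-to-be-stored record into metadata and data to be processed."""
    if len(record) < HEADER.size:
        raise ValueError("record shorter than header")
    txn_type, space_id, page_num, length = HEADER.unpack_from(record)
    payload = record[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return Metadata(txn_type, space_id, page_num), payload
```

A length field in the header is one simple way to mark where the metadata ends and the data begins; an identifier-based scheme, as the text also allows, would work equally well.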
S33, storing the data to be processed based on the metadata information.
After the storage node obtains the metadata information and the data to be processed through analysis, the storage node can store the data to be processed by utilizing the metadata information because the metadata information represents the attribute of the data to be processed and the place where the data to be processed needs to be written.
According to the data storage method provided by the embodiment, the data to be stored comprises the data to be processed, and for the storage node, the received data to be stored can be directly utilized to obtain the data to be processed, so that IO communication with the computing node is not needed, and the data storage efficiency is improved.
In some alternative embodiments, the S33 includes:
(1) Generating a data page corresponding to the data to be processed based on the metadata information and the data to be processed.
(2) And storing the data page and the data to be processed.
As described above, the computing node generates the data to be stored and sends it to the storage node. The storage node generates a data page from the metadata information and the data to be processed; the data page facilitates subsequent data queries, while the data to be processed must ultimately be stored in the storage node as a corresponding data block.
Because the data page is generated in the storage node rather than in the computing node, a subsequent data query can use the data page directly in the storage node without accessing the computing node, which improves query efficiency.
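A minimal sketch of sub-steps (1) and (2) on the storage node is given below. It is a toy model, not the patented implementation: the 4096-byte page size, the metadata keys `page_id` and `offset`, and the class and method names are all assumptions introduced for illustration, since the source does not describe the page format.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

PAGE_SIZE = 4096  # assumed page size; the source does not give one


@dataclass
class StorageNode:
    """Toy storage node that keeps data pages and raw data blocks in dictionaries."""
    pages: Dict[int, bytearray] = field(default_factory=dict)
    blocks: Dict[Tuple[int, int], bytes] = field(default_factory=dict)

    def store(self, metadata: dict, payload: bytes) -> None:
        page_id = metadata["page_id"]  # hypothetical location fields
        offset = metadata["offset"]
        if offset < 0 or offset + len(payload) > PAGE_SIZE:
            raise ValueError("record does not fit in the page")
        # (1) Generate (or update) the data page on the storage node itself.
        page = self.pages.setdefault(page_id, bytearray(PAGE_SIZE))
        page[offset:offset + len(payload)] = payload
        # (2) Store the data to be processed alongside the data page.
        self.blocks[(page_id, offset)] = payload

    def query(self, page_id: int, offset: int, length: int) -> bytes:
        """Serve a read from the locally built page, without the computing node."""
        return bytes(self.pages[page_id][offset:offset + length])
```

For example, after `node.store({"page_id": 1, "offset": 16}, b"hello")`, a later `node.query(1, 16, 5)` is answered entirely inside the storage node, which is the property the paragraph above relies on.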
As a specific application example of this embodiment, as shown in fig. 7, the method is applied to the distributed storage system shown in fig. 2 and includes:
(1) first, after the SQL command that writes the data to be processed enters the execution engine (Execution Engine), some necessary checks are performed; these checks involve no disk writes, and this part is left unchanged;
(2) after the checks are complete, the storage engine (Storage Engine) starts to write the data; at this stage the data is written only into memory, which facilitates subsequent queries by upper-layer services and avoids one network IO operation and one write IO operation to the remote shared storage;
(3) once the data has been written into memory, the data to be stored, generated from the data to be processed and the metadata information, can be written directly to the storage node; this avoids a local write operation and involves one network IO operation and one shared-storage IO operation;
(4) to ensure that the replica node can read the newly inserted data, the data to be stored is broadcast and the replica node is notified to update the data in its cache (disk cache); this involves one network IO operation;
(5) after the replica node reads the data to be stored, it reapplies the data to be processed directly to its cache according to the metadata information; no IO operation is involved at this point.
In the above flow, steps (1), (2) and (3) of the write process are performed synchronously, and the write task is complete only when all three steps have finished, so the total data storage latency equals the sum of the times taken by these three steps. Compared with the flow shown in fig. 1, this flow eliminates three IO operations and avoids transmitting data pages, thereby reducing data storage latency.
The replica node can update its data as soon as it receives the data to be stored, whereas the flow shown in fig. 1 requires the replica node to fetch the data again from the shared storage.
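The write path and the replica update described in this application example can be sketched end to end as below. This is a simplified, single-process model under stated assumptions: plain method calls stand in for the network and shared-storage IO, the metadata is reduced to a hypothetical `key` field, and the class names `ComputeNode`, `Storage` and `Replica` are illustrative rather than taken from the source.

```python
from typing import Dict, List, Tuple

Record = Tuple[dict, bytes]  # (metadata information, data to be processed)


class Replica:
    def __init__(self) -> None:
        self.cache: Dict[str, bytes] = {}

    def on_broadcast(self, record: Record) -> None:
        metadata, payload = record
        # Step (5): reapply the payload locally; no read from shared storage.
        self.cache[metadata["key"]] = payload


class Storage:
    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def write(self, record: Record) -> None:
        metadata, payload = record
        self.blocks[metadata["key"]] = payload


class ComputeNode:
    def __init__(self, storage: Storage, replicas: List[Replica]) -> None:
        self.memory: Dict[str, bytes] = {}
        self.storage = storage
        self.replicas = replicas
        self.io_log: List[str] = []  # records the IO-bearing steps only

    def insert(self, key: str, payload: bytes) -> None:
        metadata = {"key": key}            # hypothetical metadata information
        self.memory[key] = payload         # step (2): memory only, no IO
        record = (metadata, payload)       # the data to be stored
        self.storage.write(record)         # step (3): push down to storage
        self.io_log.append("network+storage write")
        for replica in self.replicas:      # step (4): broadcast to replicas
            replica.on_broadcast(record)
        self.io_log.append("broadcast")
```

Because the broadcast record carries the data to be processed itself, `Replica.on_broadcast` never has to call back into `Storage`, which mirrors the contrast with fig. 1 drawn above.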
In the data storage method provided by the embodiment of the invention, the data to be stored is constructed and pushed down to the storage node, and the storage node constructs the data page. This addresses the problem of database write amplification and improves the efficiency of inserting data into the database. Furthermore, the reconstructed data to be stored contains the data to be processed, so the replica node only needs to reapply that data; a second query to the storage node is avoided, and data synchronization efficiency is improved.
This embodiment further provides a data storage device, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the embodiments below are preferably implemented in software, an implementation in hardware or in a combination of software and hardware is also possible and contemplated.
The present embodiment provides a data storage apparatus, applied to a computing node of a distributed storage system, which, as shown in fig. 8, includes:
a first obtaining module 41, configured to obtain data to be processed and metadata information corresponding to the data to be processed, where the metadata information includes location information indicating where the data to be processed is to be written;
a first writing module 42, configured to write the data to be processed into a memory and to generate, in the memory, data to be stored based on the data to be processed and the metadata information;
a second writing module 43, configured to write the data to be stored into a storage node, so that the storage node stores the data to be processed based on the metadata information in the data to be stored.
In some alternative embodiments, the first writing module 42 includes:
a splicing unit, configured to splice the data to be processed and the metadata information in the memory to generate the data to be stored.
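One way the splicing unit might combine the two parts is shown below. The source only says that the data to be processed and the metadata information are spliced; the concrete layout here (a 4-byte big-endian length prefix, JSON-encoded metadata, then the raw data) and the function name `splice` are assumptions made purely for illustration.

```python
import json
import struct


def splice(metadata: dict, payload: bytes) -> bytes:
    """Build the data to be stored from metadata information and data to be processed.

    Assumed layout: [4-byte big-endian metadata length][UTF-8 JSON metadata][payload].
    """
    meta_bytes = json.dumps(metadata, sort_keys=True).encode("utf-8")
    # The length prefix lets the receiver find where the metadata field ends.
    return struct.pack(">I", len(meta_bytes)) + meta_bytes + payload
```

A length prefix is used in this sketch so that the payload can contain arbitrary bytes without escaping; a delimiter or tag-based format would also satisfy the description.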
In some optional embodiments, the data storage device further comprises:
a broadcasting module, configured to broadcast the data to be stored, so that a replica node of the computing node backs up the data to be processed based on the metadata information in the data to be stored.
The present embodiment provides a data storage apparatus, applied to a storage node of a distributed storage system, which, as shown in fig. 9, includes:
a receiving module 51, configured to receive data to be stored sent by a computing node, where the data to be stored includes data to be processed and metadata information of the data to be processed, and the metadata information includes location information indicating where the data to be processed is to be written;
a parsing module 52, configured to parse the data to be stored and determine the metadata information and the data to be processed;
a storage module 53, configured to store the data to be processed based on the metadata information.
In some alternative embodiments, the storage module 53 includes:
a generating unit, configured to generate a data page corresponding to the data to be processed based on the metadata information and the data to be processed;
a storage unit, configured to store the data page and the data to be processed.
The data storage devices in this embodiment are presented in the form of functional units, where a unit refers to an ASIC, a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above-described functionality.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, which has the data storage device shown in fig. 8 or fig. 9.
Referring to fig. 10, which is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention, the electronic device may include: at least one processor 601, such as a CPU (Central Processing Unit), at least one communication interface 603, a memory 604, and at least one communication bus 602. The communication bus 602 is used to enable connection and communication between these components. The communication interface 603 may include a display (Display) and a keyboard (Keyboard), and may optionally also include a standard wired interface and a standard wireless interface. The memory 604 may be a high-speed RAM (Random Access Memory) or a non-volatile memory, such as at least one disk memory. The memory 604 may optionally be at least one storage device located remotely from the processor 601. The processor 601 may be used in connection with the apparatus described in fig. 8 or fig. 9; the memory 604 stores an application program, and the processor 601 calls the program code stored in the memory 604 to perform any of the above-mentioned method steps.
The communication bus 602 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 602 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The memory 604 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 604 may also comprise a combination of the above types of memory.
The processor 601 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of a CPU and an NP.
The processor 601 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
Optionally, the memory 604 is also used for storing program instructions. Processor 601 may invoke program instructions to implement a data storage method as shown in any of the embodiments of the present application.
An embodiment of the present invention further provides a non-transitory computer storage medium storing computer-executable instructions which, when executed, can perform the data storage method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A data storage method, applied to a computing node in a distributed storage system, the method comprising:
acquiring data to be processed and metadata information corresponding to the data to be processed, wherein the metadata information comprises location information indicating where the data to be processed is to be written;
writing the data to be processed into a memory, and generating data to be stored in the memory based on the data to be processed and the metadata information;
and writing the data to be stored into a storage node, so that the storage node stores the data to be processed based on the metadata information in the data to be stored.
2. The method according to claim 1, wherein the generating data to be stored in the memory based on the data to be processed and the metadata information comprises:
splicing the data to be processed and the metadata information in the memory to generate the data to be stored.
3. The method according to claim 1 or 2, wherein the method further comprises:
broadcasting the data to be stored, so that a replica node of the computing node backs up the data to be processed based on the metadata information in the data to be stored.
4. A data storage method, applied to a storage node in a distributed storage system, the method comprising:
receiving data to be stored sent by a computing node, wherein the data to be stored comprises data to be processed and metadata information of the data to be processed, and the metadata information comprises location information indicating where the data to be processed is to be written;
parsing the data to be stored, and determining the metadata information and the data to be processed;
and storing the data to be processed based on the metadata information.
5. The method of claim 4, wherein storing the data to be processed based on the metadata information comprises:
generating a data page corresponding to the data to be processed based on the metadata information and the data to be processed;
and storing the data page and the data to be processed.
6. A data storage apparatus, applied to a computing node in a distributed storage system, the apparatus comprising:
a first obtaining module, configured to obtain data to be processed and metadata information corresponding to the data to be processed, wherein the metadata information comprises location information indicating where the data to be processed is to be written;
a first writing module, configured to write the data to be processed into a memory and to generate data to be stored in the memory based on the data to be processed and the metadata information;
and a second writing module, configured to write the data to be stored into a storage node, so that the storage node stores the data to be processed based on the metadata information in the data to be stored.
7. A data storage apparatus, applied to a storage node in a distributed storage system, the apparatus comprising:
a receiving module, configured to receive data to be stored sent by a computing node, wherein the data to be stored comprises data to be processed and metadata information of the data to be processed, and the metadata information comprises location information indicating where the data to be processed is to be written;
a parsing module, configured to parse the data to be stored and determine the metadata information and the data to be processed;
and a storage module, configured to store the data to be processed based on the metadata information.
8. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively coupled to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the data storage method of any one of claims 1-3, or of claim 4 or 5.
9. A computer-readable storage medium storing computer instructions for causing a computer to perform the data storage method of any one of claims 1 to 3, or of claim 4 or 5.
10. A distributed storage system, comprising:
a computing node for performing the data storage method of any of claims 1-3;
a storage node connected to the computing node for performing the data storage method of claim 4 or 5.
CN202210307614.9A 2022-03-25 2022-03-25 Data storage method and device, electronic equipment and distributed storage system Pending CN114896276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210307614.9A CN114896276A (en) 2022-03-25 2022-03-25 Data storage method and device, electronic equipment and distributed storage system

Publications (1)

Publication Number Publication Date
CN114896276A true CN114896276A (en) 2022-08-12

Family

ID=82715760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210307614.9A Pending CN114896276A (en) 2022-03-25 2022-03-25 Data storage method and device, electronic equipment and distributed storage system

Country Status (1)

Country Link
CN (1) CN114896276A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098045A (en) * 2022-08-23 2022-09-23 成都止观互娱科技有限公司 Data storage system and network data reading and writing method
CN115098045B (en) * 2022-08-23 2022-11-25 成都止观互娱科技有限公司 Data storage system and network data reading and writing method

Similar Documents

Publication Publication Date Title
CN110442560B (en) Log replay method, device, server and storage medium
US8751441B2 (en) System, method, and computer program product for determining SQL replication process
CN109542682B (en) Data backup method, device, equipment and storage medium
CN111651519B (en) Data synchronization method, data synchronization device, electronic equipment and storage medium
CN111324665A (en) Log playback method and device
CN111414362A (en) Data reading method, device, equipment and storage medium
CN113760846A (en) Data processing method and device
CN110019063B (en) Method for computing node data disaster recovery playback, terminal device and storage medium
CN112948409A (en) Data processing method and device, electronic equipment and storage medium
CN113806301A (en) Data synchronization method, device, server and storage medium
CN115114370B (en) Master-slave database synchronization method and device, electronic equipment and storage medium
CN115858488A (en) Parallel migration method and device based on data governance and readable medium
CN114896276A (en) Data storage method and device, electronic equipment and distributed storage system
CN109388651B (en) Data processing method and device
CN111753141B (en) Data management method and related equipment
CN109542860B (en) Service data management method based on HDFS and terminal equipment
US7949632B2 (en) Database-rearranging program, database-rearranging method, and database-rearranging apparatus
CN114064725A (en) Data processing method, device, equipment and storage medium
CN114116723A (en) Snapshot processing method and device and electronic equipment
CN109740027B (en) Data exchange method, device, server and storage medium
CN115114258A (en) Data copying method and device, electronic equipment and computer storage medium
CN113536047A (en) Graph database data deleting method, system, electronic equipment and storage medium
CN112527841A (en) Stream data merging processing method and device
CN112559457A (en) Data access method and device
CN117422556B (en) Derivative transaction system, device and computer medium based on replication state machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination