CN113950145B - Data processing method and device


Info

Publication number
CN113950145B
CN113950145B
Authority
CN
China
Prior art keywords
container
data
file
data file
data partition
Prior art date
Legal status
Active
Application number
CN202111557964.2A
Other languages
Chinese (zh)
Other versions
CN113950145A
Inventor
黄华
宋杰
江进
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111557964.2A
Publication of CN113950145A
Application granted
Publication of CN113950145B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W64/00: Locating users or terminals or network equipment for network management purposes, e.g. mobility management
    • H04W64/006: Locating users or terminals or network equipment for network management purposes, e.g. mobility management with additional information processing, e.g. for direction or speed determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data processing method and device. The method is applied to a server, where the server comprises a first container and a second container, the first container and the second container are respectively associated with a first data partition and a second data partition of the server, the first container is a query service container, and the second container is a merged computing container. The method comprises the following steps: the second container reads a first data file stored in the first data partition; the second container merges the first data file and the incremental data file of the first data file to obtain a second data file; the second container stores the second data file to the second data partition; and the first container switches the data partition associated with the first container from the first data partition to the second data partition.

Description

Data processing method and device
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a data processing method and apparatus.
Background
In order to avoid mutual interference among multiple processes, multiple containers can be created on the server, and each container is responsible for query processing of data files of different data partitions. Taking the example where the server includes the first container, the first container may be associated with a first data partition, where the first data partition stores the first data file. The first container may read data corresponding to the query request from the first data partition based on the query request of the user.
In addition, the first container may also update the first data file. Specifically, the first container may merge the incremental data file of the first data file with the first data file to obtain a second data file, and then store the second data file to the first data partition. Thereafter, the first container may perform data queries based on the second data file.
However, the merging process of the first data file occupies a large amount of computing resources and input/output (IO) resources, so that the computing resources and storage IO resources available for the query service are reduced, degrading the data query service quality. In addition, since the first data partition needs to store the first data file and the second data file at the same time, the utilization rate of the storage space resources is lower than 50%.
Disclosure of Invention
The embodiment of the disclosure provides a data processing method and device, which are beneficial to improving the data query service quality and the utilization rate of storage space resources.
In a first aspect, a data processing method is provided, where the method is applied to a server, where the server includes a first container and a second container, where the first container and the second container are respectively associated with a first data partition and a second data partition of the server, the first container is a query service container, and the second container is a merged computation container, where the method includes: the second container reads a first data file stored in the first data partition; the second container merges the first data file and the incremental data file of the first data file to obtain a second data file; the second container stores the second data file to the second data partition; the first container switches the data partition associated with the first container from the first data partition to the second data partition.
In a second aspect, a data processing apparatus is provided, where the apparatus is applied to a server, where the server includes a first container and a second container, where the first container and the second container are respectively associated with a first data partition and a second data partition of the server, the first container is a query service container, and the second container is a merged computing container, and the apparatus includes: a first reading unit configured to read a first data file stored in the first data partition using the second container; the first merging unit is used for merging the first data file and the incremental data file of the first data file by using the second container to obtain a second data file; a first storage unit configured to store the second data file to the second data partition using the second container; a first switching unit, configured to switch the data partition associated with the first container from the first data partition to the second data partition by using the first container.
In a third aspect, there is provided a data processing apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the computer program.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon executable code which, when executed, is capable of implementing the method of the first aspect.
In a fifth aspect, there is provided a computer program product comprising executable code which, when executed, is capable of implementing the method of the first aspect.
The server of the embodiments of the present disclosure may include a first container (i.e., a query service container) and a second container (i.e., a merge computation container), where the first container is responsible only for processing user query requests and the second container is responsible only for the merge computation of files. By separating the query service container from the merge computation container, the problem of computing resource conflicts can be avoided.
Secondly, the data partition corresponding to a container is not fixed; all data partitions on the server can be shared among the containers, so each container can switch its association with a data partition. After completing the merge of the files of the first data partition, the second container may store the merged second data file to the second data partition. The query service container may then switch its associated data partition from the first data partition to the second data partition, so that data queries can be made based on the second data file. Because the first data file and the second data file are stored in different data partitions, writing the second data file does not affect data reads during queries, and the conflict between read IO in the query process and write IO in the merge process is avoided.
In addition, since the first data partition does not need to store the first data file and the second data file at the same time, the first data partition does not need to reserve a storage space for writing the second data file, and the size of data in the first data partition may be equal to the storage capacity of the first data partition. Therefore, the scheme of the embodiment of the disclosure is beneficial to improving the utilization rate of the storage space resources.
Drawings
Fig. 1 is a schematic diagram of a merging process of data files provided by an embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram of a server including multiple containers according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a method for querying data according to an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of a method for separating a merged computing container from a query service container according to an embodiment of the present disclosure.
Fig. 5 is a schematic flow chart of a data processing method provided by the embodiment of the disclosure.
Fig. 6-9 are schematic diagrams of a data file updating method provided by an embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a data processing apparatus according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present disclosure.
The server in the disclosed embodiments may be used to store data files for user query. For example, the server may be used to store one or more files. When a user requests a data query, the server may retrieve data corresponding to the query request from the one or more files. The server may also be referred to as a storage server or a database server.
The servers of the embodiments of the present disclosure may also be referred to as physical machines, physical servers, hosts, and the like.
Data may be stored in a relational or a non-relational manner. A database using relational storage may be called a relational database; it generally employs a two-dimensional table structure and stores data in rows and columns. A database using non-relational storage may be referred to as a non-relational database (Not Only SQL, NoSQL). Data storage in NoSQL requires no fixed table structure and generally has no join operations. Instead of the traditional relational model, NoSQL stores data using models such as key-value storage, document stores, column stores, graph databases, and XML. Among these, key-value storage is the most widely used.
The Log-Structured Merge (LSM) tree is often applied to the design of key-value storage systems, so it is also very common in NoSQL systems and has become almost a standard solution. LSM trees are currently used in many products; for example, the underlying key-value data engine of the GeaBase graph database uses an LSM tree structure. Products that apply LSM trees directly or indirectly also include LevelDB, RocksDB, MongoDB, TiDB, HBase, and the like.
The server of the disclosed embodiments may include storage space resources and processor computing resources (computing resources for short). The storage space resources may include, for example, memory, hard disks or disks, etc. Taking a disk as an example, one or more files described above may be stored on the disk. The computing resources may include, for example, a Central Processing Unit (CPU) chip or processing chip.
Taking the key-value storage as an example, the user may send a query request to the server, where the query request is used to query the data corresponding to the target key. After receiving the query request, the server may analyze the target key by using the computing resource, for example, perform semantic analysis on the target key, and/or determine a file location where the target key is located. The server can read the data corresponding to the target key from the disk and return the data to the user.
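For illustration only, such a lookup might be sketched as follows; the index structure, function names, and file layout here are assumptions for the example and are not defined by this disclosure.

```python
# Hypothetical sketch of the key-value query flow described above.
# The index maps a key to (file name, offset, length); none of these
# names come from this disclosure.
import os

def handle_query(target_key, index, data_dir):
    """Return the bytes stored for target_key, or raise KeyError."""
    # Analyze the target key: determine the file location that holds it.
    file_name, offset, length = index[target_key]
    # Read the corresponding data from disk.
    with open(os.path.join(data_dir, file_name), "rb") as f:
        f.seek(offset)
        value = f.read(length)
    # Return the data to the user.
    return value
```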
To ensure the timeliness of the stored data, the server can also update it. Data can be updated in various ways: for example, the server may update the data in real time, or periodically at regular intervals. The period of a periodic update may be several hours or several days, which is not specifically limited in the embodiments of the present disclosure. A periodically updated server may also be referred to as a batch system server.
For example, before starting the query service, the server may generate a data file for users to query. New data (an incremental data file, or delta data) then flows back at regular intervals; for example, the server may download the incremental data file over a network. The server merges the incremental data file with the existing data file to generate a new data file, after which all query requests are served from the new data file. That is, after the data is updated, users can query the latest data set.
As shown in fig. 1, an incremental data file (DeltaDataFile) is a file that flows back in a given period; the server may download it to a local device (such as a disk or memory) over the network. The server then reads the incremental data file and the existing data file (BaseDataFile) from the disk into memory and merges them to obtain a new data file (NewDataFile), which it stores back to the disk. When the server processes users' query requests, the new data file serves as the query file.
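The merge in fig. 1 can be sketched minimally as follows, assuming both files fit in memory as key-value maps and that entries in the incremental file take precedence; real LSM-style engines merge sorted on-disk runs instead.

```python
# Simplified sketch of merging BaseDataFile with DeltaDataFile
# (fig. 1), assuming both are loaded into memory as dicts and the
# incremental entries take precedence on key collisions.
def merge_data_files(base, delta):
    new_data = dict(base)   # start from the existing data file
    new_data.update(delta)  # newer entries override older ones
    return new_data

base_data = {"k1": "v1", "k2": "v2"}
delta_data = {"k2": "v2_new", "k3": "v3"}
new_data = merge_data_files(base_data, delta_data)
# new_data == {"k1": "v1", "k2": "v2_new", "k3": "v3"}
```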
When multiple processes are running on a server, there can be interference between the multiple processes. For example, the plurality of processes include a first process and a second process, and since the first process and the second process share resources, such as computing resources, storage space resources, and memory resources, on the server, when the first process occupies a large amount of computing resources on the server due to computing needs, the computing resources that can be used for the second process are greatly reduced, thereby affecting the operation of the second process.
To avoid mutual interference among multiple processes, container technology was developed, which can isolate the computing resources and storage space resources on a server. Multiple containers may be created on the server, each corresponding to a portion of the storage space resources and a portion of the computing resources; that is, the containers divide the server's resources into mutually isolated computing resources and mutually isolated storage space resources. The process in each container runs only on the resources corresponding to that container and does not occupy the resources corresponding to other containers.
The server in the embodiments of the present disclosure may also use container technology, that is, the server may include a plurality of containers. The multiple containers may divide the storage space resources of the server into multiple data partitions (or multiple storage space resources), and each container may be responsible for file queries of a different data partition. It will be appreciated that there is a mapping relationship between the data partitions and the disks in the server, with each data partition mapping to a separate disk space on the server. The processes on multiple containers may be performed in parallel. For example, multiple containers may be queried for data simultaneously, which may improve query efficiency.
A server comprising a plurality of containers is described below in connection with fig. 2.
The server shown in fig. 2 includes 6 containers, denoted containers 0 to 5. The 6 containers divide the resources on the server into 6 shares, one share per container. Each container runs only on its own share and does not occupy the resources corresponding to other containers. For example, container 0 can only store data files on disk 0 and does not occupy storage space resources on other disks.
The sizes of the resources corresponding to the containers may be equal or unequal, which is not specifically limited in this disclosure. For example, the 6 containers may divide the server's resources into 6 equal shares, each container taking one sixth of the computing resources and one sixth of the storage space resources on the server. Alternatively, the 6 containers may apply to the server for resources according to actual needs, in which case the resources corresponding to each container may differ.
Taking equal resources per container as an example: if the server includes 12 disks, the storage space resources corresponding to each container include 2 disks; if the server includes 60 processor cores, the computing resources corresponding to each container may include 10 processor cores.
Among the plurality of containers shown in fig. 2, container 0 is associated with data partition 0, and data partition 0 corresponds to disk 0 on the server. Container 1 is associated with data partition 1, data partition 1 corresponding to disk 1 on the server, and so on. The container 5 is associated with a data partition 5, the data partition 5 corresponding to a disk 5 on a server. Each container is responsible for the processing of data corresponding to a respective data partition. That is, each container is responsible for the query service of only a portion of the data in the full database.
Each data partition contains a corresponding data file. The data partition 0 corresponds to the file0, the file0 is stored on the disk 0, and the container 0 is responsible for data query of the file 0. Similarly, the data partition 1 corresponds to a file 1, the file 1 is stored on the disk 1, and the container 1 is responsible for performing data query on the file 1. The data partition 2 corresponds to a file 2, the file 2 is stored on the disk 2, and the container 2 is responsible for data query of the file 2. The data partition 3 corresponds to a file 3, the file 3 is stored on the disk 3, and the container 3 is responsible for data query of the file 3. The data partition 4 corresponds to a file 4, the file 4 is stored on the disk 4, and the container 4 is responsible for data query of the file 4. The data partition 5 corresponds to a file 5, the file 5 is stored on the disk 5, and the container 5 is responsible for data query of the file 5.
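The one-to-one layout of fig. 2 can be pictured as a simple association table; the following is a hypothetical illustration only, not a data structure prescribed by this disclosure.

```python
# Hypothetical picture of the fig. 2 layout: container i serves data
# partition i, which maps to disk i and stores file i.
layout = {f"container{i}": {"partition": i, "disk": f"disk{i}", "file": f"file{i}"}
          for i in range(6)}

def partition_of(container):
    """Return the data partition a container currently serves."""
    return layout[container]["partition"]

print(partition_of("container3"))  # 3
```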
After receiving a query request for its data partition, any container serving that partition can read the corresponding data from the partition's data file according to the request.
The query process is described below taking container 0 as an example, in conjunction with fig. 3. Container 0 is responsible for query processing of the files stored in data partition 0. After receiving a user's query request, container 0 may analyze the request and read the corresponding data from data partition 0.
Further, container 0 may also update the files in data partition 0, similarly to the file update process shown in fig. 1. Container 0 may download incremental data file0 (DeltaDataFile0) from the new data source to local disk 0. Container 0 may read the existing data file (BaseDataFile0) from disk 0 into memory and merge DeltaDataFile0 with BaseDataFile0 to obtain the merged new data file (NewDataFile0).
After the data merge is complete, container 0 writes NewDataFile0 to disk 0, after which all data queries are based on NewDataFile0.
To ensure that container 0 can still handle users' query requests during the data merge, BaseDataFile0 is not deleted until NewDataFile0 has been successfully written to disk 0. That is, before the merge completes, if a user makes a data query, container 0 can still provide query services based on BaseDataFile0. After NewDataFile0 is written to disk 0, the data file corresponding to container 0 switches from BaseDataFile0 to NewDataFile0 at some point, after which container 0 provides query services based on NewDataFile0.
In addition, after NewDataFile0 is written to disk 0, container 0 may also delete the previous BaseDataFile0 and DeltaDataFile0 to free up storage space on disk 0.
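The full in-place update cycle of fig. 3 might look like the following sketch, where the file names follow the figure but the directory paths, tab-separated format, and helper functions are assumptions for illustration. Note how disk 0 must briefly hold both BaseDataFile0 and NewDataFile0, which is the root of the utilization problem discussed next.

```python
# Sketch of container 0's in-place update (fig. 3). Paths and the
# tab-separated file format are illustrative assumptions.
import os

def load(path):
    """Load a tab-separated key-value file into a dict."""
    with open(path) as f:
        return dict(line.rstrip("\n").split("\t", 1) for line in f)

def update_in_place(disk0):
    base = load(os.path.join(disk0, "BaseDataFile0"))
    delta = load(os.path.join(disk0, "DeltaDataFile0"))
    merged = {**base, **delta}  # delta overrides base
    # Write NewDataFile0 while BaseDataFile0 still serves queries,
    # so both files coexist on disk 0 at this point.
    with open(os.path.join(disk0, "NewDataFile0"), "w") as f:
        for k, v in sorted(merged.items()):
            f.write(f"{k}\t{v}\n")
    # Only after queries switch to NewDataFile0 are the old files
    # deleted to free disk 0's storage space.
    os.remove(os.path.join(disk0, "BaseDataFile0"))
    os.remove(os.path.join(disk0, "DeltaDataFile0"))
```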
However, when the query process and the merge calculation process are performed simultaneously, there are problems of calculation resource conflict, storage IO resource conflict, and low utilization rate of storage space resources. These three cases are described separately below.
1. Computing resource conflicts
Data merging requires a large amount of computing resources to process the merge of files. Since the total computing resources in a single container are fixed, when the merging process occupies more computing resources, fewer computing resources remain for handling users' query requests, so the processing capability for query requests decreases and user experience suffers.
2. Storage IO resource conflicts
Taking container 0 as an example, the merge process needs to read BaseDataFile0 from disk 0 and write NewDataFile0 back. Reading and writing this data involves a large number of IO operations. Meanwhile, the query process needs to read the data corresponding to users' query requests from disk 0. Preemption of disk IO resources by the merge process, particularly the write operations for NewDataFile0, causes a steep increase in the latency of the query process's read operations. Therefore, if container 0 is writing NewDataFile0 in the background while processing a user's query request, the IO latency for reading data from container 0 greatly increases, so the latency of the user's query request increases and user experience suffers. Note that the storage IO resource conflict referred to herein mainly means the conflict between read IO in the query process and write IO in the merge process.
3. Low utilization rate of storage space resources
The old file (e.g., BaseDataFile0) cannot be deleted until the switch from the old data file to the new one (e.g., from BaseDataFile0 to NewDataFile0) is completed. The old file (e.g., BaseDataFile0) and the new file (e.g., NewDataFile0) therefore need to be saved on disk 0 at the same time, in which case the size of the data on disk 0 cannot exceed 50% of the disk capacity. Thus, in the above scheme, the effective utilization of the disk is less than 50%. Similarly, the utilization of the disks corresponding to the other containers is below 50%, so the utilization of the server's entire storage space resources is below 50%.
In the related art, one scheme can solve the problem of computing resource conflicts, but the problems of storage IO resource conflicts and low disk utilization remain. This scheme is described below in conjunction with fig. 4, from the perspectives of the merge computation container and the query service container.
To avoid computing resource conflicts, the scheme shown in fig. 4 offloads the merge computation task to a separate server B. The query system shown in fig. 4 includes a server A and a server B, each containing one container. The container in server B is responsible only for the merge computation of data files, and the container in server A is responsible only for processing query requests. For convenience of description, the container in server A is called the query service container and the container in server B the merge computation container. The storage space resource corresponding to the query service container is disk 0 in server A, and the storage space resource corresponding to the merge computation container is disk 1 in server B.
The merge computation container may download the incremental data file0 of file0 from the new data source and merge the incremental data file0 with the existing data file0 to generate the new data file 0. The merge computation container may store new data file0 to disk 1. After the new data file is stored to disk 1, the merge computation container may notify the query service container to download new data file0 from disk 1, or the merge computation container may send the new data file directly to the query service container.
After the query service container completes downloading the new data file, the merge computation container may delete the old data file 0. The merge computing container will typically retain the newly generated data file as the BaseDataFile for the next data update. Of course, the merged computing container may not retain the newly generated data file, and the BaseDataFile may be obtained from the query service container each time a data update is performed.
The query service container may download a new data file0 from the merged computing container upon receiving a notification sent by the merged computing container. When the download of the new data file is completed, the query service container may switch the query file from the existing data file0 to the new data file 0. Thereafter, all query requests are based on new data file 0. After the switch of the data file is completed, the query service container may delete the existing data file 0.
By separating the merge operation from the query operation, this scheme reduces the preemption of computing resources during merging. However, the new data file generated by the merge still needs to be written back to the query service container, and this data writing again preempts storage IO resources, increasing query IO latency.
In addition, since the query service container still needs to store both existing data files and new data files, the effective usage of the disk is still less than 50%.
In addition, in the scheme shown in fig. 4, the new data file needs to be pulled from the merge computation container to the query service container. The pulling process occupies a large amount of network resources and may compete with the network resources used by users' query requests, affecting user experience.
From the above description, although this scheme can avoid computing resource conflicts, it still suffers from storage IO resource conflicts and low utilization of storage space resources. How to solve computing resource conflicts, storage IO resource conflicts, and low storage space utilization at the same time is therefore an urgent problem.
Based on this, the embodiments of the present disclosure provide a data processing method, which can solve the problems of computation resource conflict, storage IO resource conflict, and low utilization rate of storage space resources at the same time.
The server of the disclosed embodiments may include a query service container and a merge computation container, with different containers corresponding to different data partitions on the server. Moreover, the data partition corresponding to a container is not fixed; all data partitions on the server can be shared among the containers, so each container can switch its association with a data partition. Here, sharing means that the query service container can access the data files on any disk. For example, the data file corresponding to container 0 may be stored on disk 0 or on any other disk, but it is stored on only one disk at a time, not across multiple disks.
The merging calculation container is responsible for merging the data files, and after merging is completed, the merging calculation container can store the merged new data files in the data partitions associated with the merging calculation container. The query service container may then switch the associated data partition to a data partition storing the new data file.
The following description will be given taking an example in which the server includes a first container and a second container. The first container is a query service container and the second container is a merged computing container. The first container is only responsible for processing the user query request, the second container is only responsible for carrying out the merging calculation on the files, and the problem of calculation resource conflict can be avoided by separating the query service container from the merging calculation container.
Second, the first container is associated with a first data partition of the server and the second container with a second data partition. After completing the merge of the files of the first data partition, the second container may store the merged file to the second data partition. The query service container may then switch its associated data partition from the first data partition to the second data partition, so that data queries can be made based on the merged file. Because the merged file is stored in a different data partition from the old file, the data partition corresponding to the query service container sees only read requests and no write requests. Writing the new file therefore does not affect data reads during queries, and the storage IO resource conflict is avoided.
In addition, since the first data partition does not need to store the new data file and the old data file at the same time, the first data partition does not need to reserve a storage space for writing of the new data, and the size of data in the first data partition may be equal to the storage capacity of the first data partition. Therefore, the scheme is beneficial to improving the utilization rate of the storage space resources.
The following describes in detail aspects of embodiments of the present disclosure with reference to fig. 5. The method shown in fig. 5 is applicable to the server described above. The method shown in FIG. 5 includes steps S510 to S540.
In step S510, the second container reads the first data file stored in the first data partition.
The first container is associated with a first data partition in which a first data file is stored. Before the first data file is updated, the first container may obtain data corresponding to the query request from the first data file based on the query request.
The second container may retrieve the first data file from the first data partition when an update to the first data file is needed. Specifically, the second container may read the first data file from the first data partition into a memory corresponding to the second container.
It will be appreciated that the first data file may be all of the files stored in the first data partition. Each time a data update is performed, all files in the first data partition may be updated.
In step S520, the second container merges the first data file and the incremental data file of the first data file to obtain a second data file.
The second container may download the delta data file for the first data file from the data source. For example, the second container may first download the incremental data file to the second data partition, read the incremental data file from the second data partition into the memory when merging, and then merge the first data file and the incremental data file in the memory.
The second container stores the second data file to the second data partition at step S530.
After completion of the merging of the files, the second container may store the second data file directly to the second data partition. That is, the second container stores the second data file to a data partition corresponding to the second container, rather than a data partition corresponding to the query service container.
At step S540, the first container switches the data partition associated with the first container from the first data partition to the second data partition. In other words, the first container may switch the corresponding data file from the first data file to the second data file.
After the second container stores the second data file to the second data partition, the second container may notify the first container so that the first container performs switching of the data partitions. After completing the switching of the data partitions, the first container may process a query request of the user based on the second data file stored in the second data partition. For example, after the switching of the data partition is completed, the first container may receive a query request sent by a user, and based on the query request, the first container may obtain data corresponding to the query request from the second data partition.
It will be appreciated that the first container may still process the user's query request based on the first data file stored in the first data partition before the switch of data partitions is completed.
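Steps S510 to S540 can be put together in a short sketch; the class and function names below are assumptions for illustration, with directory paths standing in for data partitions, and do not define an API of this disclosure.

```python
# Hypothetical sketch of steps S510-S540: the second (merge) container
# reads, merges, and writes against partition paths, and the first
# (query) container only flips the partition it serves from.
import os

def load(path):
    with open(path) as f:
        return dict(line.rstrip("\n").split("\t", 1) for line in f)

def store(path, data):
    with open(path, "w") as f:
        for k, v in sorted(data.items()):
            f.write(f"{k}\t{v}\n")

class QueryContainer:
    def __init__(self, partition):
        self.partition = partition  # currently associated data partition

    def switch_partition(self, new_partition):
        # S540: subsequent queries read from the new partition.
        self.partition = new_partition

class MergeContainer:
    def __init__(self, spare_partition):
        self.partition = spare_partition  # its own (spare) partition

    def update(self, query_container, delta):
        old = query_container.partition
        base = load(os.path.join(old, "data_file"))                # S510
        merged = {**base, **delta}                                 # S520
        store(os.path.join(self.partition, "data_file"), merged)  # S530
        query_container.switch_partition(self.partition)           # S540
        # The merge container then takes over the old partition and
        # deletes the stale file, freeing it for the next round.
        self.partition = old
        os.remove(os.path.join(old, "data_file"))
```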
The server of the disclosed embodiments may include one merge computation container or several. If the server includes only one merge computation container, that container can update the data files corresponding to the query service containers in turn; this approach is more favorable for the utilization of storage space resources. If the server includes multiple merge computation containers, they can update the data files of different query service containers separately, that is, merge computations of data files can proceed simultaneously, which helps speed up file updates. The following description takes a server with one merge computation container as an example.
After completing the update of the first data file, the second container may switch its associated data partition from the second data partition to the first data partition. Because the first data partition still stores the now-stale first data file, the second container may also delete it to improve storage space utilization. Thereafter, the first data partition can serve as the data partition storing the next newly merged data file.
The server may include further containers in addition to the first and second containers described above; for example, the server may include a third container, which is a query service container associated with a third data partition of the server. The second container may also update the data files stored in the third data partition. That is, if the server includes multiple query service containers, the second container may update the data files corresponding to each of them.
Specifically, the second container may read a third data file stored in the third data partition and download a delta data file for the third data file from the new data source. Further, the second container may merge the third data file and the incremental data file of the third data file to obtain a fourth data file. The second container may store the fourth data file to the first data partition. After the fourth data file is stored in the first data partition, the second container may notify the third container to perform the switching of the data partitions. After the third container receives the notification, the data partition associated with the third container may be switched from the third data partition to the first data partition.
Of course, the second container may also switch the data partition associated with the second container from the first data partition to the third data partition. Thereafter, the third data partition may serve as the data partition for storing the next updated data file.
When the server includes a plurality of query service containers, there may be a plurality of sequences in which the merged computing container updates the data file, and this is not specifically limited in the embodiment of the present disclosure. For example, the merge computation container may sequentially update the data files corresponding to the plurality of query service containers in a preset order. For another example, the merge computation container may determine the update order of the data files according to the update speed of the data files corresponding to each query service container. For example, if the update speed of the data file corresponding to the first container is fast and the update speed of the data file corresponding to the third container is slow, the second container may update the data file corresponding to the first container twice and then update the data file corresponding to the third container.
The data processing method according to the embodiment of the present disclosure is described in detail below with reference to fig. 6 to 9, taking an example in which a server includes 5 query service containers and 1 merge computation container. Of course, the server may also include other numbers of query service containers and merged computing containers, which is not specifically limited in this disclosure.
The 5 query service containers are respectively containers 0-4. The query service container is only responsible for the query request of the data file on the corresponding disk, and does not need to be responsible for the merging processing of the files. The number of disks corresponding to each container may be one or more. For example, disk 0 may include one or more disks. The number of files stored on each disk may be one or more. For example, the number of files included in file0 may be one or more.
The container 0 may receive a query request of a user for the data file0 on the disk 0, and the container 0 may read data corresponding to the query request from the disk 0 based on the query request. The container 1 may receive a query request of a user for the data file 1 on the disk 1, and the container 1 may read data corresponding to the query request from the disk 1 based on the query request. The container 2 may receive a query request of a user for the data file 2 on the disk 2, and the container 2 may read data corresponding to the query request from the disk 2 based on the query request. The container 3 may receive a query request of a user for the data file 3 on the disk 3, and the container 3 may read data corresponding to the query request from the disk 3 based on the query request. The container 4 may receive a query request of a user for the data file 4 on the disk 4, and the container 4 may read data corresponding to the query request from the disk 4 based on the query request.
Disks 0 to 4 involve only data read operations, not data write operations, so storage IO resource conflicts can be avoided.
The merge computation container is responsible for the merge computation of the data files corresponding to containers 0-4. It is associated with disk 5 and may store merged data files to disk 5. Before the merge computation, disk 5 is a free disk, i.e., no data is stored on it. When performing merge computations, the merge computation container can update the data files corresponding to containers 0-4 in turn, in rotation. The merge computation process is described below in the update order container 0, container 1, container 2, container 3, container 4.
As shown in fig. 7, taking the update of file0 as an example, the merge computation container may download incremental data file0 from the new data source to disk 5, and may read file0 from disk 0. When performing the data merge, the merge computation container may read incremental data file0 from disk 5, merge file0 with incremental data file0 to obtain new file0, and write new file0 to disk 5. Thereafter, the merge computation container may also delete incremental data file0 on disk 5, leaving only new file0.
After the new file0 is stored to disk 5, the merge computation container may also notify container 0 to perform a disk switch. After receiving the switch notification sent by the merge computation container, the container 0 may switch the associated disk from disk 0 to disk 5. Container 0 switches the associated disk from disk 0 to disk 5, meaning that container 0 switches the corresponding data file from file0 to new file 0. Thereafter, the container 0 may perform data query based on the new file0 in the disk 5, and the container 0 may read the data queried by the user from the disk 5 and return the data to the user.
The merge computation container may also switch the associated disk from disk 5 to disk 0, as shown in FIG. 8. The disk switching of the container 0 and the disk switching of the merged computing container may be performed simultaneously or sequentially, which is not specifically limited in this embodiment of the disclosure.
As an example, the merge computation container is not disk switched at the same time as container 0. For example, the merge computation container may perform a disk switch after container 0 completes the disk switch. That is, the merge computation container may perform disk switching again on the premise of ensuring successful switching of the container 0.
After the merge calculation container is associated to the disk 0, the file0 in the disk 0 may be deleted, and the disk 0 is used as a new storage disk.
Further, the merge computation container may update file 1 stored in disk 1, as shown in FIG. 8. The merge computation container downloads incremental data file 1 from the new data source to disk 0, and additionally, the merge computation container may read file 1 from disk 1. When data merging is performed, the merge computing container may read delta data file 1 from disk 0. The merge computation container may then merge file 1 with incremental data file 1 to obtain new file 1. The merge computation container may write new file 1 to disk 0. Thereafter, the merge computing container may also delete delta data file 1 in disk 0, leaving only new file 1.
After the new file 1 is stored to disk 0, the merge computation container may also notify container 1 to perform a disk switch. After receiving the switch notification sent by the merge computation container, the container 1 may switch the associated disk from disk 1 to disk 0. Container 1 switches the associated disk from disk 1 to disk 0, which means that container 1 switches the corresponding data file from file 1 to new file 1.
The merge computation container may also switch the associated disk from disk 0 to disk 1. After the merged computing container is associated to the disk 1, the file 1 in the disk 1 may be deleted, and the disk 1 is used as a new storage disk.
The merge computation container may update file 2 stored in disk 2. The merge computation container downloads the delta data file 2 from the new data source to disk 1, and the merge computation container may read file 2 from disk 2. When data merging is performed, the merge computation container may read the delta data file 2 from disk 1. The merge computation container may then merge file 2 with incremental data file 2 to obtain new file 2. The merge computation container may write a new file 2 to disk 1. Thereafter, the merge computing container may also delete delta data file 2 in disk 1, leaving only new file 2.
After the new file 2 is stored to the disk 1, the merge computation container may also notify the container 2 to perform disk switching. After receiving the switch notification sent by the merge computation container, the container 2 may switch the associated disk from the disk 2 to the disk 1. Container 2 switches the associated disk from disk 2 to disk 1, which means that container 2 switches the corresponding data file from file 2 to new file 2.
The merge computation container may also switch the associated disk from disk 1 to disk 2. After the merged computing container is associated to the disk 2, the file 2 in the disk 2 may be deleted, and the disk 2 is used as a new storage disk.
The merge computation container may update the file 3 stored in disk 3. The merge computation container downloads the delta data file 3 from the new data source to disk 2, and the merge computation container may read file 3 from disk 3. When data merging is performed, the merge computation container may read the delta data file 3 from the disk 2. The merge computation container may then merge file 3 with incremental data file 3 to obtain new file 3. The merge computation container may write a new file 3 to disk 2. Thereafter, the merge computation container may also delete the delta data file 3 in disk 2, leaving only the new file 3.
After the new file 3 is stored to the disk 2, the merge computation container may also notify the container 3 to perform a disk switch. After receiving the switch notification sent by the merged computing container, the container 3 may switch the associated disk from the disk 3 to the disk 2. The container 3 switches the associated disk from disk 3 to disk 2, which means that the container 3 switches the corresponding data file from file 3 to the new file 3.
The merge computation container may also switch the associated disk from disk 2 to disk 3. After the merged computing container is associated to the disk 3, the file 3 in the disk 3 may be deleted, and the disk 3 may be used as a new storage disk.
The merge computation container may update the file 4 stored in disk 4. The merge computation container downloads the delta data file 4 from the new data source to disk 3, and the merge computation container may read file 4 from disk 4. When data merging is performed, the merge computation container may read the delta data file 4 from the disk 3. The merge computation container may then merge file 4 with delta data file 4 to obtain new file 4. The merge computation container may write a new file 4 to disk 3. Thereafter, the merge computation container may also delete the delta data files 4 in disk 3, leaving only the new files 4.
After the new file 4 is stored to the disk 3, the merge computation container may also notify the container 4 to perform a disk switch. After receiving the switch notification sent by the merged computing container, the container 4 may switch the associated disk from the disk 4 to the disk 3. The container 4 switches the associated disk from disk 4 to disk 3, which means that the container 4 switches the corresponding data file from file 4 to the new file 4.
The merge computation container may also switch the associated disk from disk 3 to disk 4. After the merged computing container is associated to the disk 4, the file 4 in the disk 4 may be deleted, and the disk 4 may be used as a new storage disk.
After the above file updates, the layout of the file corresponding to each container is as shown in fig. 9: container 0 is associated with disk 5, container 1 with disk 0, container 2 with disk 1, container 3 with disk 2, container 4 with disk 3, and the merge computation container with disk 4.
It can be seen from the above update process that only one spare disk needs to be kept free during the entire update, while the storage space resources of the other disks can be fully used. As shown in figs. 6 to 9, the utilization rate of the storage space resources of the entire server can reach five sixths, about 83%.
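The rotation across figs. 6 to 9 can be summarized in a short sketch under the same illustrative assumptions: one spare disk circulates through the containers, so with n query containers only one of the n + 1 disks is ever idle.

```python
# Sketch of the disk rotation in figs. 6-9: each container in turn
# switches to the spare disk (where its merged file was written), and
# its vacated disk becomes the next spare.
def rotate(assignments, spare):
    """assignments[i] is container i's disk; returns the new layout."""
    for i in range(len(assignments)):
        assignments[i], spare = spare, assignments[i]
    return assignments, spare

disks, spare = [0, 1, 2, 3, 4], 5
disks, spare = rotate(disks, spare)
print(disks, spare)  # [5, 0, 1, 2, 3] 4 -- the fig. 9 layout
n = len(disks)
print(f"storage utilization: {n}/{n + 1} = {n / (n + 1):.0%}")  # 5/6 = 83%
```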
Method embodiments of the present disclosure are described in detail above in conjunction with fig. 1-9, and apparatus embodiments of the present disclosure are described in detail below in conjunction with fig. 10-11. It is to be understood that the description of the method embodiments corresponds to the description of the apparatus embodiments, and therefore reference may be made to the preceding method embodiments for parts not described in detail.
Fig. 10 is a schematic structural diagram of a data processing apparatus provided in an embodiment of the present disclosure. The apparatus 1000 shown in fig. 10 is applicable to a server including a first container and a second container, the first container and the second container being respectively associated with a first data partition and a second data partition of the server, the first container being a query service container, and the second container being a merged computing container. The apparatus 1000 of fig. 10 includes a first reading unit 1010, a first merging unit 1020, a first storing unit 1030, and a first switching unit 1040.
A first reading unit 1010, configured to read the first data file stored in the first data partition by using the second container.
A first merging unit 1020, configured to merge the first data file and the incremental data file of the first data file by using the second container to obtain a second data file.
A first storing unit 1030, configured to store the second data file to the second data partition by using the second container.
A first switching unit 1040, configured to switch, by using the first container, the data partition associated with the first container from the first data partition to the second data partition.
Optionally, the apparatus 1000 further comprises: a second switching unit, configured to switch, by using the second container, the data partition associated with the second container from the second data partition to the first data partition; a deleting unit configured to delete the first data file stored in the first data partition using the second container.
Optionally, the server further includes a third container, where the third container is a query service container, and the third container is associated with a third data partition of the server, and the apparatus further includes: a second reading unit configured to read a third data file stored in the third data partition using the second container; the second merging unit is used for merging the third data file and the incremental data file of the third data file by using the second container to obtain a fourth data file; a second storage unit, configured to store the fourth data file to the first data partition by using the second container; a third switching unit, configured to switch, by using the third container, the data partition associated with the third container from the third data partition to the first data partition.
Optionally, the apparatus 1000 further comprises: a receiving unit, configured to receive a query request using the first container; an obtaining unit, configured to obtain, by using the first container, data corresponding to the query request from the second data partition.
Optionally, the first container and the second container are Docker-based containers.
Optionally, the server is a batch system server.
Fig. 11 is a schematic structural diagram of a data processing apparatus according to yet another embodiment of the present disclosure. The apparatus 1100 may be any of the servers described above. The apparatus 1100 may include a memory 1110 and a processor 1120. Memory 1110 may be used to store executable code. The processor 1120 may be configured to execute the executable code stored in the memory 1110 to implement the steps of the various methods described above. In some embodiments, the apparatus 1100 may further include a network interface 1130, and data exchange between the processor 1120 and an external device may be implemented through the network interface 1130.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the embodiments of the disclosure are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Video Disk (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
The above description covers only specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope of the present disclosure shall fall within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

1. A data processing method, applied to a server, wherein the server comprises a first container and a second container, the first container and the second container are respectively associated with a first data partition and a second data partition of the server, the first container is a query service container, and the second container is a merged computing container,
the method comprises the following steps:
the second container reads a first data file stored in the first data partition;
the second container merges the first data file and the incremental data file of the first data file to obtain a second data file;
the second container stores the second data file to the second data partition;
the first container switches the data partition associated with the first container from the first data partition to the second data partition.
2. The method of claim 1, further comprising:
the second container switches the data partition associated with the second container from the second data partition to the first data partition;
the second container deletes the first data file stored in the first data partition.
3. The method of claim 2, the server further comprising a third container, the third container being a query service container, the third container being associated with a third data partition of the server,
the method further comprises the following steps:
the second container reads a third data file stored in the third data partition;
the second container merges the third data file and the incremental data file of the third data file to obtain a fourth data file;
the second container stores the fourth data file to the first data partition;
the third container switches the data partition associated with the third container from the third data partition to the first data partition.
4. The method of claim 1, further comprising:
the first container receiving a query request;
the first container obtains data corresponding to the query request from the second data partition.
5. The method of claim 1, the first container and the second container being Docker-based containers.
6. The method of claim 1, the server being a batch system server.
7. A data processing apparatus, applied to a server, wherein the server comprises a first container and a second container, the first container and the second container are respectively associated with a first data partition and a second data partition of the server, the first container is a query service container, and the second container is a merged computing container,
the apparatus comprises:
a first reading unit configured to read a first data file stored in the first data partition using the second container;
a first merging unit configured to merge, using the second container, the first data file and the incremental data file of the first data file to obtain a second data file;
a first storage unit configured to store the second data file to the second data partition using the second container;
a first switching unit, configured to switch the data partition associated with the first container from the first data partition to the second data partition by using the first container.
8. The apparatus of claim 7, further comprising:
a second switching unit, configured to switch, by using the second container, the data partition associated with the second container from the second data partition to the first data partition;
a deleting unit configured to delete the first data file stored in the first data partition using the second container.
9. The apparatus of claim 8, the server further comprising a third container, the third container being a query service container, the third container associated with a third data partition of the server,
the apparatus further comprises:
a second reading unit configured to read a third data file stored in the third data partition using the second container;
a second merging unit configured to merge, using the second container, the third data file and the incremental data file of the third data file to obtain a fourth data file;
a second storage unit, configured to store the fourth data file to the first data partition by using the second container;
a third switching unit, configured to switch, by using the third container, the data partition associated with the third container from the third data partition to the first data partition.
10. The apparatus of claim 7, further comprising:
a receiving unit, configured to receive a query request using the first container;
an obtaining unit, configured to obtain, by using the first container, data corresponding to the query request from the second data partition.
11. The apparatus of claim 7, the first and second containers being Docker-based containers.
12. The apparatus of claim 7, the server being a batch system server.
13. A data processing apparatus comprising a memory having executable code stored therein and a processor configured to execute the executable code to implement the method of any one of claims 1 to 6.
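By way of illustration, the method of claim 1 amounts to an out-of-place compaction followed by an atomic partition switch. The Python sketch below shows one way the four steps could fit together; the Container class, the file names data.tsv and delta.tsv, and the concatenation-based merge are assumptions made for the example, not details fixed by the claims.

```python
import os
import shutil


class Container:
    """A container associated with exactly one data partition (a directory)."""

    def __init__(self, name: str, partition: str):
        self.name = name
        self.partition = partition  # path of the currently associated partition


def merge_files(base_path: str, delta_path: str, out_path: str) -> None:
    # Placeholder merge: append the incremental data file to the base data
    # file. A real merged computing container would typically sort records,
    # deduplicate keys, and apply deletions while rewriting the file.
    with open(out_path, "wb") as out:
        for path in (base_path, delta_path):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def compact_and_switch(query: Container, merge: Container,
                       data_name: str = "data.tsv",
                       delta_name: str = "delta.tsv") -> str:
    """Run one round of claim 1; return the partition the query container vacated."""
    # Step 1: the second (merge) container reads the first data file stored
    # in the first data partition, i.e. the partition currently served by
    # the query container.
    first_file = os.path.join(query.partition, data_name)
    delta_file = os.path.join(query.partition, delta_name)

    # Steps 2 and 3: merge the first data file with its incremental data
    # file to obtain the second data file, and store the result in the
    # second data partition.
    second_file = os.path.join(merge.partition, data_name)
    merge_files(first_file, delta_file, second_file)

    # Step 4: the first (query) container switches its associated data
    # partition from the first data partition to the second data partition.
    # Repointing a single reference keeps the switch effectively atomic.
    vacated = query.partition
    query.partition = merge.partition
    return vacated
```

Because the query container repoints its partition reference only after the merged file is fully written, in-flight queries never observe a half-merged file, and the merge workload stays isolated in its own container.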
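Claim 2 completes the rotation: once the query container has moved away, the merge container adopts the vacated first partition and discards the stale data file. Continuing the sketch above (same Container class and assumed file names):

```python
def reclaim_old_partition(merge: Container, vacated: str,
                          data_name: str = "data.tsv") -> None:
    # The second container switches its associated partition to the first
    # data partition, which the query container has just vacated, then
    # deletes the stale first data file so the partition is empty and
    # ready to receive the output of the next merge round.
    merge.partition = vacated
    os.remove(os.path.join(vacated, data_name))
```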
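Claim 3 extends the rotation to further query service containers: the partition vacated in one round becomes the merge container's output partition for the next, so a single spare partition can serve any number of query containers. A sketch under the same assumptions:

```python
def compact_round(queries: list, merge: Container) -> None:
    # Visit each query container in turn. After a container switches to the
    # freshly merged partition, the partition it vacated is reclaimed and
    # becomes the merge container's output partition for the next container
    # in line.
    for query in queries:
        vacated = compact_and_switch(query, merge)
        reclaim_old_partition(merge, vacated)
```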
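Claim 4 requires only that the first container answers queries from whichever partition it is currently associated with, which after the switch of claim 1 is the second data partition. A minimal lookup sketch, again assuming a tab-separated key-value layout in place of a real index:

```python
from typing import Optional


def handle_query(query: Container, key: bytes,
                 data_name: str = "data.tsv") -> Optional[bytes]:
    # The file path is resolved through the container's current partition
    # reference, so a completed partition switch is transparent to callers:
    # the next query simply reads the merged file in the new partition.
    path = os.path.join(query.partition, data_name)
    with open(path, "rb") as f:
        for line in f:
            k, _, v = line.rstrip(b"\r\n").partition(b"\t")
            if k == key:
                return v
    return None
```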
CN202111557964.2A 2021-12-20 2021-12-20 Data processing method and device Active CN113950145B (en)

Priority Applications (1)

Application Number    Priority Date    Filing Date    Title
CN202111557964.2A     2021-12-20       2021-12-20     Data processing method and device

Publications (2)

Publication Number    Publication Date
CN113950145A (en)     2022-01-18
CN113950145B (en)     2022-03-08

Family

ID=79339279

Family Applications (1)

Application Number    Title                                Priority Date    Filing Date
CN202111557964.2A     Data processing method and device    2021-12-20       2021-12-20

Country Status (1)

Country    Link
CN         CN113950145B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107786638A (en) * 2017-09-27 2018-03-09 华为技术有限公司 A kind of data processing method, apparatus and system
CN108958881A (en) * 2018-05-31 2018-12-07 平安科技(深圳)有限公司 Data processing method, device and computer readable storage medium
EP3859626A1 (en) * 2018-09-26 2021-08-04 Beijing Geekplus Technology Co., Ltd. Warehouse management system and method
CN112711564A (en) * 2019-10-24 2021-04-27 华为技术有限公司 Merging processing method and related equipment
CN113778347A (en) * 2021-11-15 2021-12-10 广东睿江云计算股份有限公司 Read-write quality optimization method for ceph system and server

Also Published As

Publication number Publication date
CN113950145A (en) 2022-01-18

Similar Documents

Publication Publication Date Title
CN109254733B (en) Method, device and system for storing data
US11182356B2 (en) Indexing for evolving large-scale datasets in multi-master hybrid transactional and analytical processing systems
US9679003B2 (en) Rendezvous-based optimistic concurrency control
CN109716324B (en) Direct table association in-memory database
CN103023982B (en) Low-latency metadata access method of cloud storage client
CN104160381A (en) Managing tenant-specific data sets in a multi-tenant environment
CN113672627B (en) Method and device for constructing index of elastic search engine
CN111258978A (en) Data storage method
CN104423982A (en) Request processing method and device
US10747773B2 (en) Database management system, computer, and database management method
CN109726264A (en) Method, apparatus, equipment and the medium updated for index information
US20170270147A1 (en) Method and apparatus for storing data
CN114741335A (en) Cache management method, device, medium and equipment
CN102724301B (en) Cloud database system and method and equipment for reading and writing cloud data
CN114610680A (en) Method, device and equipment for managing metadata of distributed file system and storage medium
CN110457307B (en) Metadata management system, user cluster creation method, device, equipment and medium
CN111414356A (en) Data storage method and device, non-relational database system and storage medium
US10942912B1 (en) Chain logging using key-value data storage
CN113032349A (en) Data storage method and device, electronic equipment and computer readable medium
CN113950145B (en) Data processing method and device
CN113778975B (en) Data processing method and device based on distributed database
CN114328466A (en) Data cold and hot storage method and device and electronic equipment
US20140040578A1 (en) Managing data set volume table of contents
US11789971B1 (en) Adding replicas to a multi-leader replica group for a data set
CN108595488B (en) Data migration method and device

Legal Events

Code    Title
PB01    Publication
SE01    Entry into force of request for substantive examination
GR01    Patent grant